Conversational AI: From RAG Prototypes to Domain-Specific Supervision

December 12, 2024
The Hidden Cost of Embedding
OpenAI's embedding API charges per token across ingestion, re-ingestion, and every query. Switching to a local Ollama model eliminated the recurring cost with comparable retrieval quality.
Tags: rag, embeddings, llm, ollama
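A back-of-envelope sketch of why the hosted-API cost recurs: every ingestion, re-ingestion, and query pays again. All figures below are illustrative assumptions, not numbers from the article.

```python
# Rough cost model for per-token hosted embeddings.
# Prices and token counts are hypothetical, for illustration only.
def embedding_cost(total_tokens: int, price_per_million: float) -> float:
    """Dollar cost of embedding `total_tokens` tokens."""
    return total_tokens / 1_000_000 * price_per_million

# Hypothetical corpus: 5M tokens re-ingested monthly plus ~100k query
# tokens per month, at an assumed $0.02 per 1M tokens.
monthly = embedding_cost(5_000_000 + 100_000, price_per_million=0.02)
print(f"${monthly:.2f} per month, every month")
```

With a local Ollama model the same arithmetic goes to zero recurring cost; the trade is upfront hardware and retrieval-quality validation instead.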
November 28, 2024
Tool-Forced RAG: Stopping the LLM From Making Up Clinical Guidelines
LLMs confidently generate plausible clinical advice that doesn't match published standards. Forcing document retrieval for professional questions prevents this.
Tags: rag, llm, safety
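The "tool-forced" idea can be sketched as a hard gate: questions classified as clinical never reach the model without a retrieval step, and an empty retrieval produces a refusal rather than a guess. The classifier, function names, and keyword list below are all hypothetical placeholders.

```python
# Minimal sketch of tool-forced retrieval. In a real system the
# classifier would be an LLM call or trained model, not keywords.
CLINICAL_KEYWORDS = {"dosage", "contraindication", "guideline", "treatment"}

def needs_retrieval(question: str) -> bool:
    """Crude stand-in for a professional-question classifier."""
    q = question.lower()
    return any(keyword in q for keyword in CLINICAL_KEYWORDS)

def answer(question: str, retrieve, generate):
    """Route clinical questions through mandatory document retrieval."""
    if needs_retrieval(question):
        docs = retrieve(question)  # retrieval is not optional here
        if not docs:
            # Refusing beats a fluent, unsupported clinical answer.
            return "No matching guideline found."
        return generate(question, context=docs)
    return generate(question, context=None)
```

The key design choice is that the generation call for clinical questions only ever sees retrieved context; the model's parametric memory is never the sole source.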
October 24, 2024
From Flat Vectors to Graph RAG: When Similarity Search Isn't Enough
Vector search finds similar chunks. Graph RAG finds related concepts. The difference matters when questions span multiple topics.
Tags: rag, graph, llm
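The similar-versus-related distinction can be shown in a few lines: similarity search returns the seed hits, while a graph layer expands those hits along "related-to" edges to concepts a multi-topic question also needs. The concept graph below is a toy assumption, not the article's data.

```python
# Toy graph expansion step of a Graph RAG pipeline. Seed concepts come
# from vector search; the graph adds related concepts it would miss.
from collections import deque

EDGES = {  # hypothetical "related-to" links between concepts
    "hypertension": ["beta-blockers", "stroke"],
    "beta-blockers": ["contraindications"],
    "stroke": ["rehabilitation"],
}

def expand(seed_concepts, hops=1):
    """Breadth-first expansion of seed hits up to `hops` edges away."""
    seen = set(seed_concepts)
    frontier = deque((concept, 0) for concept in seed_concepts)
    while frontier:
        concept, depth = frontier.popleft()
        if depth == hops:
            continue
        for neighbour in EDGES.get(concept, []):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen

print(sorted(expand(["hypertension"])))
```

Plain vector search would stop at the seed; one hop already pulls in related material, which is exactly the gain when a question spans multiple topics.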
October 15, 2024
Chunk-Then-Summarise: The Embedding Pipeline That Worked
Raw PDF chunks make terrible vectors. Summarising each chunk before embedding produced cleaner searches and more relevant retrieval.
Tags: rag, embeddings, llm
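The pipeline shape is simple to sketch: chunk the document, summarise each chunk, embed the summary, but keep the raw text for answer generation. The chunker, summariser, and embedder below are hypothetical stand-ins, not the article's implementation.

```python
# Chunk-then-summarise indexing sketch. `summarise` and `embed` are
# injected so the example stays self-contained; in practice they would
# be LLM and embedding-model calls.
def chunk(text: str, size: int = 500) -> list[str]:
    """Naive fixed-size chunking; real pipelines often split on structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(document: str, summarise, embed) -> list[dict]:
    """Embed a summary of each chunk, keep the raw chunk for generation."""
    index = []
    for raw in chunk(document):
        summary = summarise(raw)  # cleaner text yields a cleaner vector
        index.append({"vector": embed(summary), "raw": raw})
    return index
```

The point of keeping `raw` alongside the summary vector is that search runs over the clean summaries while the LLM still answers from the full original text.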
October 7, 2024
Pinecone to pgvector: Why I Ditched the Managed Vector DB
Pinecone worked, but it was another service to manage, another bill, and another point of failure. pgvector kept vectors next to my relational data in the same Postgres instance.
Tags: rag, postgres, llm
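With pgvector, nearest-neighbour search is just SQL against the same Postgres instance that holds the relational data, using the extension's `<=>` cosine-distance operator. The table and column names below are hypothetical; this only sketches the query shape, not a full connection setup.

```python
# Build a parameterised pgvector k-NN query. `embedding <=> %(qvec)s`
# is pgvector's cosine-distance operator; `%(qvec)s` is a placeholder
# to be bound by a Postgres driver such as psycopg.
def knn_query(table: str, k: int = 5) -> str:
    """SQL for the k chunks nearest to a bound query vector."""
    return (
        f"SELECT id, content, embedding <=> %(qvec)s::vector AS distance "
        f"FROM {table} ORDER BY distance LIMIT {k}"
    )

print(knn_query("doc_chunks", k=3))
```

Because the chunks live in an ordinary table, the same query can join against relational metadata (owner, document date, permissions) without a second service in the path.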