Tagged: rag
5 articles
The Hidden Cost of Embedding
OpenAI's embedding API charges per token across ingestion, re-ingestion, and every query. Switching to a local Ollama model eliminated the recurring cost with comparable retrieval quality.

Tool-Forced RAG: Stopping the LLM From Making Up Clinical Guidelines
LLMs confidently generate plausible clinical advice that doesn't match published standards. Forcing document retrieval for professional questions prevents this.

From Flat Vectors to Graph RAG: When Similarity Search Isn't Enough
Vector search finds similar chunks. Graph RAG finds related concepts. The difference matters when questions span multiple topics.

Chunk-Then-Summarise: The Embedding Pipeline That Worked
Raw PDF chunks make terrible vectors. Summarising each chunk before embedding produced cleaner searches and more relevant retrieval.

Pinecone to pgvector: Why I Ditched the Managed Vector DB
Pinecone worked, but it was another service to manage, another bill, and another point of failure. pgvector kept vectors next to my relational data in the same Postgres instance.