Writing tagged: ollama | Daniel John Morris

Conversational AI: From RAG Prototypes to Domain-Specific SupervisionDecember 12, 2024 The Hidden Cost of Embedding OpenAI's embedding API charges per token across ingestion, re-ingestion, and every query. Switching to a local Ollama model eliminated the recurring cost with comparable retrieval quality.

ragembeddingsllmollama