When I started building RAG-powered chat, Pinecone was the obvious choice. Managed vector database, good docs, quick to get running. Upload embeddings, query by similarity, get results. It worked.
After a few months, though, the bills and the operational footprint made me look again.
What Pinecone gave me
Pinecone’s developer experience is solid. Create an index, upsert vectors with metadata, query with a vector and get back ranked results. The SDK handles batching, retries, and pagination.
The search quality was fine. OpenAI’s text-embedding-3-small at 768 dimensions, cosine similarity, top-K retrieval. Documents came back in a reasonable order. The metadata filtering let me scope searches by document type, source, and access level.
Why I moved
Three things pushed me to migrate.
Operational complexity. Pinecone is another service. Another API key in secrets management, another dashboard to monitor, another vendor’s status page to check when things break. For an internal tool, that overhead wasn’t justified.
Data locality. My documents, users, conversations, and message logs all lived in Postgres. The vectors lived in Pinecone. Every search required a round trip to a different service, and joining vector results with relational data meant two queries and application-level merging.
Cost. Pinecone’s pricing is per-vector-per-month. For a growing knowledge base on an internal tool, that adds up. pgvector is free - it’s a Postgres extension.
The migration
pgvector adds a vector column type to Postgres. You store embeddings alongside your other columns, create an index, and query with distance operators.
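As a sketch of what that schema can look like (the table and column names here are illustrative, not my actual setup; assumes pgvector 0.5+ for HNSW support):

```sql
-- Enable the extension once per database.
CREATE EXTENSION IF NOT EXISTS vector;

-- Chunks live next to their relational metadata.
CREATE TABLE chunks (
    id          bigserial PRIMARY KEY,
    document_id bigint,
    doc_type    text,
    content     text NOT NULL,
    embedding   vector(768)   -- matches text-embedding-3-small at 768 dims
);

-- Approximate nearest-neighbour index over cosine distance.
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
```

The embedding is just another column, so it participates in the same inserts, transactions, and backups as everything else.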
The search query is a single SQL statement: embed the query, find the nearest vectors using <=> (cosine distance), filter by metadata, return results with similarity scores. No SDK, no API calls, no separate service.
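A hedged sketch of that query, with `$1` standing in for the query embedding (passed as a `vector` parameter) and the table and column names assumed from an illustrative schema:

```sql
SELECT id,
       content,
       1 - (embedding <=> $1) AS similarity   -- <=> is cosine distance
FROM chunks
WHERE doc_type = 'policy'                     -- metadata filter, same table
ORDER BY embedding <=> $1                     -- nearest first
LIMIT 5;
```

Because `<=>` returns cosine *distance*, `1 - distance` recovers the familiar similarity score, and the `ORDER BY` can use the index directly.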
I kept the same embedding model (text-embedding-3-small, 768 dimensions) and the same chunking strategy. The only change was where the vectors lived. (The embedding model itself later moved from OpenAI to a local Ollama model - a separate migration driven by cost.)
What I lost
Pinecone has better tooling for inspecting vectors, testing queries, and monitoring index health. pgvector gives you EXPLAIN ANALYZE and not much else. For debugging retrieval quality, I missed Pinecone’s dashboard.
Pinecone also handles index optimisation automatically. With pgvector, you need to think about index types (IVFFlat vs HNSW), list sizes, and maintenance. For my scale (thousands of vectors, not millions), this didn’t matter. At larger scale, it would.
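For reference, the two index types differ mainly in build cost and tuning. A sketch of both (the parameters shown are common starting points from pgvector's docs, not tuned recommendations):

```sql
-- HNSW: slower to build, better recall/latency trade-off, no training step.
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- IVFFlat: fast to build, but create it AFTER loading data and tune
-- `lists` to the row count (roughly rows / 1000 for smaller tables).
CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
```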
What I gained
One database. One connection string. One backup strategy. Vectors and metadata in the same transaction. Joins between vector results and relational data in a single query.
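That single-query join might look like this (schema names illustrative, `$1` the query embedding, `$2` the user's access level):

```sql
SELECT d.title,
       d.source,
       c.content,
       1 - (c.embedding <=> $1) AS similarity
FROM chunks AS c
JOIN documents AS d ON d.id = c.document_id
WHERE d.access_level <= $2        -- relational filter in the same query
ORDER BY c.embedding <=> $1
LIMIT 5;
```

With Pinecone, the access-level filter either had to be duplicated into vector metadata or applied in application code after fetching results.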
For an internal tool with a knowledge base of thousands of documents, pgvector is enough. The retrieval quality is identical - same embeddings, same similarity metric, same results. The operational simplicity is worth more than Pinecone’s better tooling.