Flat vector search works well for simple questions. “What are BACP’s requirements for supervision hours?” - embed the query, find the most similar chunks, done. The answer is in one place and similarity search finds it.
Multi-topic questions break that pattern.
“How do CBT supervision techniques differ from person-centred approaches under BACP guidelines?” spans three topics: CBT supervision, person-centred supervision, and BACP guidelines. The answer isn’t in one chunk. It’s in the relationship between chunks from different documents.
The limits of similarity
Vector similarity measures how close two pieces of text are in embedding space. It’s a proxy for topical relevance. But topical relevance isn’t the same as informational completeness.
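In practice, "similarity" here is usually cosine similarity between embedding vectors. A minimal sketch, using toy three-dimensional vectors (real embedding models produce hundreds or thousands of dimensions):

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: how close two embeddings are in vector space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- purely illustrative values.
query = [0.9, 0.8, 0.1]
chunk_cbt = [0.95, 0.7, 0.05]       # topically close to the query
chunk_unrelated = [0.05, 0.1, 0.9]  # topically distant

cosine_similarity(query, chunk_cbt)        # high, close to 1.0
cosine_similarity(query, chunk_unrelated)  # low
```

A high score says only that two texts sit near each other in embedding space; it says nothing about whether the chunk contains the relationship the query asks about.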
A flat search for the CBT/person-centred question returns chunks that are individually similar to the query. Some mention CBT supervision. Some mention person-centred approaches. Some mention BACP. None of them contain the comparison the user is asking for, because no single document makes that comparison.
The LLM then has to synthesise an answer from fragments that happen to be nearby in embedding space but aren’t connected by any explicit relationship.
What graph RAG adds
Graph RAG introduces structure on top of vectors. Documents are nodes. Relationships between documents are edges. Semantic tags categorise documents, and tag co-occurrence reveals topic relationships.
The implementation uses four structures:
Document-to-document links. Explicit connections between documents with typed relationships. A CBT supervision manual links to the BACP supervision framework with a relationship type of “governed by.”
Semantic tags. Each document gets tagged with concepts, scored by confidence. A document about person-centred supervision might be tagged “person-centred” (0.95), “supervision” (0.9), “Rogers” (0.7).
Tag co-occurrence. Tags that frequently appear together on the same documents form their own connections. “CBT” and “supervision” co-occur often, so the graph knows they’re related concepts.
Tag embeddings. Tags themselves have vector embeddings, so you can find semantically similar tags. A query mentioning “client-centred” finds documents tagged “person-centred” even though the exact term doesn’t match.
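The four structures can be sketched as plain data. This is an illustrative shape only, not the actual schema; the document ids, tag names, and relationship strings follow the examples above:

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class Document:
    doc_id: str
    tags: dict[str, float] = field(default_factory=dict)  # tag -> confidence

@dataclass
class Link:
    source: str    # document id
    target: str    # document id
    relation: str  # typed relationship, e.g. "governed by"

docs = {
    "cbt-manual": Document("cbt-manual", {"CBT": 0.95, "supervision": 0.9}),
    "pc-guide": Document("pc-guide", {"person-centred": 0.95, "supervision": 0.9, "Rogers": 0.7}),
    "bacp-framework": Document("bacp-framework", {"BACP": 0.95, "supervision": 0.85}),
}

links = [
    Link("cbt-manual", "bacp-framework", "governed by"),
    Link("pc-guide", "bacp-framework", "governed by"),
]

# Tag co-occurrence: count how often two tags appear on the same document.
cooccurrence: Counter = Counter()
for doc in docs.values():
    for a, b in combinations(sorted(doc.tags), 2):
        cooccurrence[(a, b)] += 1
```

Tag embeddings would sit alongside this as a `tag -> vector` mapping, queried with the same cosine similarity used for chunks, so "client-centred" can resolve to the "person-centred" tag.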
Search flow
A query now follows multiple paths:
- Vector search finds the most similar chunks (same as before)
- Tag matching identifies relevant tags from the query
- Graph traversal follows document links from initial results to related documents
- Tag expansion uses tag embeddings and co-occurrence to find conceptually related documents that vector search missed
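The combined flow can be sketched end to end. Everything here is a toy stand-in for a real vector store and graph database, and tag-embedding matching is collapsed into a simple tag-set overlap for brevity:

```python
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

# Toy corpus: embeddings, document links, and tags (illustrative values).
doc_embeddings = {
    "cbt-manual": [1.0, 0.0, 0.0],
    "pc-guide": [0.0, 1.0, 0.0],
    "bacp-framework": [0.0, 0.0, 1.0],
    "methods-comparison": [0.1, 0.1, 0.9],
}
links = {"cbt-manual": ["bacp-framework"], "pc-guide": ["bacp-framework"]}
doc_tags = {
    "cbt-manual": {"CBT", "supervision"},
    "pc-guide": {"person-centred", "supervision"},
    "methods-comparison": {"CBT", "person-centred"},
}

def search(query_emb: list[float], query_tags: set[str], k: int = 2) -> set[str]:
    # 1. Vector search: top-k most similar documents.
    ranked = sorted(doc_embeddings,
                    key=lambda d: cosine(query_emb, doc_embeddings[d]),
                    reverse=True)
    results = set(ranked[:k])
    # 2. Graph traversal: follow typed links one hop out from the hits.
    for doc in list(results):
        results.update(links.get(doc, []))
    # 3. Tag expansion: pull in documents sharing the query's tags.
    for doc, tags in doc_tags.items():
        if tags & query_tags:
            results.add(doc)
    return results

search([0.7, 0.7, 0.1], {"CBT", "person-centred"})
```

With this toy data, vector search alone returns only the CBT and person-centred documents; the BACP framework arrives via the link hop, and the comparison document via tag overlap.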
The CBT/person-centred question now returns:
- CBT supervision chunks (vector search)
- person-centred supervision chunks (vector search)
- BACP framework documents (graph traversal via "governed by" links)
- comparative methodology documents (tag co-occurrence between "CBT" and "person-centred" tags)
When it matters
Graph RAG’s advantage is most visible on multi-hop questions - queries where the answer requires connecting information from different sources. For single-topic lookups, flat vector search is just as good and simpler.
The admin dashboard includes a graph visualisation that shows how documents, tags, and relationships connect. This turned out to be useful not just for debugging but for understanding what the knowledge base contains and where the gaps are.
Complexity cost
Graph RAG is more complex to build and maintain. You need to define relationship types, assign tags (manually or with LLM assistance), maintain tag embeddings, and tune the balance between vector results and graph results.
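One common shape for that last tuning step is a weighted blend of per-document scores from the two paths. A sketch only; the weights and the linear combination are arbitrary starting points, not the tuning the system actually uses:

```python
def blend_scores(vector_hits: dict[str, float],
                 graph_hits: dict[str, float],
                 vector_weight: float = 0.7,
                 graph_weight: float = 0.3) -> list[tuple[str, float]]:
    """Combine per-document scores from both retrieval paths, best first."""
    docs = set(vector_hits) | set(graph_hits)
    scored = {
        d: vector_weight * vector_hits.get(d, 0.0)
           + graph_weight * graph_hits.get(d, 0.0)
        for d in docs
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores: a doc found by both paths outranks single-path hits.
blend_scores(
    {"cbt-manual": 0.92, "pc-guide": 0.88},
    {"bacp-framework": 1.0, "cbt-manual": 0.5},
)
```

Shifting the weights towards the graph favours structurally connected documents over raw similarity, which is exactly the trade-off that needs tuning per corpus.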
For a domain-specific knowledge base with clear relationships between documents - clinical frameworks, supervision standards, therapeutic methodologies - the structure is worth the complexity. For a general-purpose chatbot, flat vectors usually cover the questions users ask.