Microsoft’s GraphRAG system improved answer comprehensiveness by 26% and diversity by 57% compared to standard vector retrieval in their own benchmarks. Those numbers sound like a no-brainer. Then you look at the indexing bill: the same corpus that costs under $5 to embed into a vector database runs $50-200 through GraphRAG’s entity extraction and community summarization pipeline. For a 10,000-document knowledge base, you are looking at a four-figure indexing cost before a single query runs.
That cost gap is why most teams still run vector RAG in production and why the ones that switched to Graph RAG did so for very specific reasons. This post covers what those reasons are, how the four types of Graph RAG differ, and how to adopt graph-based retrieval without torching your budget.
What Graph RAG Is (and What It Is Not)
Standard RAG embeds your documents as vectors and retrieves chunks by cosine similarity. You ask a question, the system finds the five most similar text chunks, and the LLM generates an answer from those chunks. It works well for direct questions over structured knowledge bases. It fails when the answer requires connecting information scattered across multiple documents.
Graph RAG adds a knowledge graph layer between your documents and the retrieval step. Instead of searching for similar text, the system traverses relationships between entities: people, companies, concepts, events. A query about “Which team leads our DACH compliance efforts?” does not depend on whether those exact words appear in a document. The graph knows that Alice manages the Berlin office, the Berlin office handles DACH compliance, and Alice reports to the legal department. Three hops, one answer.
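The hop-chain above can be sketched as a tiny breadth-first traversal over an adjacency list. This is a toy illustration only, using the article's Alice/Berlin example; a production system would store the graph in a database such as Neo4j rather than a Python dict.

```python
from collections import deque

# Toy knowledge graph as an adjacency list of (relation, target) edges.
# Entities and relations mirror the article's example.
GRAPH = {
    "Alice": [("manages", "Berlin office"), ("reports_to", "Legal department")],
    "Berlin office": [("handles", "DACH compliance")],
}

def find_path(start, goal, max_hops=3):
    """Breadth-first search returning the relation path from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue
        for relation, target in GRAPH.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None

path = find_path("Alice", "DACH compliance")
# Each hop is an (entity, relation, entity) triple the LLM can cite as evidence.
```

The point is that the answer falls out of edge traversal, not keyword overlap: "DACH compliance" never has to appear in the same chunk as "Alice".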
The critical distinction: Graph RAG is not just “RAG with a graph database bolted on.” Microsoft Research’s GraphRAG paper defines it as a system that extracts entities and relationships from source documents, builds a knowledge graph from those extractions, detects communities of related entities, and generates hierarchical summaries of those communities. Queries then route to the appropriate community level rather than scanning raw document chunks.
That architecture enables two things vector RAG cannot do: global queries (“Summarize the main themes across all customer complaints this quarter”) and multi-hop reasoning (“Which suppliers in our network have been flagged for compliance issues by partners who also supply our competitors?”).
Where Vector RAG Still Wins
Not every retrieval problem needs a graph. Vector RAG handles direct factual lookups efficiently: “What is our return policy?” “When does the contract expire?” These queries map cleanly to single chunks. Adding a knowledge graph to this use case adds cost and complexity without improving accuracy. The rule of thumb: if your questions are single-hop and your corpus is well-structured, vector RAG is the right tool.
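That rule of thumb can be enforced mechanically with a query router. The sketch below is a crude keyword heuristic, and the cue list is entirely an assumption for illustration; real routers often use a small classifier model instead.

```python
# Crude heuristic router: send likely single-hop lookups to vector RAG
# and relationship-heavy questions to the graph path.
# The cue phrases below are made-up examples, not a tested list.
MULTI_HOP_CUES = ("also", "connected", "related to", "across",
                  "who reports", "which of our", "themes")

def route(query: str) -> str:
    q = query.lower()
    if any(cue in q for cue in MULTI_HOP_CUES):
        return "graph"
    return "vector"

assert route("What is our return policy?") == "vector"
assert route("Which of our suppliers also work with Initech?") == "graph"
```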
The Four Types of Graph RAG
Neo4j’s taxonomy identifies four distinct patterns, each with different complexity, cost, and capability trade-offs. Understanding which type you need prevents the common mistake of implementing Microsoft’s full GraphRAG pipeline when a simpler approach would suffice.
Type 1: Graph-Enhanced Vector Search
The simplest form. You keep your existing vector RAG pipeline and add graph metadata as filters or re-rankers. A product search might use vector similarity to find relevant items, then use a product taxonomy graph to boost results from the same category or filter by supplier relationships.
Implementation effort is minimal: add a graph database alongside your vector store, tag your chunks with entity IDs, and use graph-based filters in your retrieval query. No LLM-powered entity extraction required. Tools like LlamaIndex’s Knowledge Graph Index support this pattern out of the box.
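A minimal sketch of the re-ranking half of this pattern: vector hits come back scored, and a taxonomy lookup boosts items in the query's category. The products, scores, and the 0.1 boost are invented illustration values, not tuned parameters.

```python
# Type 1 sketch: re-rank vector search hits with a simple taxonomy graph.
CATEGORY = {  # product -> taxonomy node, e.g. loaded from a product database
    "p1": "laptops", "p2": "laptops", "p3": "monitors", "p4": "cables",
}

def rerank(vector_hits, query_category, boost=0.1):
    """Boost hits whose taxonomy node matches the query's category."""
    rescored = [
        (pid, score + (boost if CATEGORY.get(pid) == query_category else 0.0))
        for pid, score in vector_hits
    ]
    return sorted(rescored, key=lambda t: t[1], reverse=True)

hits = [("p3", 0.82), ("p1", 0.80), ("p4", 0.79)]
ranked = rerank(hits, query_category="laptops")
# p1 overtakes p3 because it sits in the query's category.
```

The same structure works as a hard filter: drop the boost and keep only matching hits.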
Type 2: Graph-Guided Retrieval
The retrieval step itself uses graph traversal. Instead of computing vector similarity, the system starts from a query entity, walks the graph to find related entities and their associated text chunks, and feeds those chunks to the LLM. This handles multi-hop questions that vector search misses entirely.
Neo4j’s implementation combines Cypher graph queries with text chunk retrieval: the agent identifies key entities in the user’s question, queries the graph for connected entities within N hops, retrieves the text chunks associated with those entities, and passes the combined context to the LLM.
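In pure Python, the expand-then-gather step looks roughly like this. The entities, edges, and chunks are a made-up supplier example; a real implementation would issue a Cypher query instead of walking dicts.

```python
# Type 2 sketch: expand N hops from the seed entities, then gather the
# text chunks attached to every entity reached. All data is illustrative.
EDGES = {
    "AcmeCorp": ["BetaSupplies"],
    "BetaSupplies": ["compliance_flag_2024"],
}
CHUNKS = {
    "AcmeCorp": ["Acme signed a supply agreement in 2023."],
    "BetaSupplies": ["BetaSupplies ships components to Acme."],
    "compliance_flag_2024": ["BetaSupplies was flagged in a 2024 audit."],
}

def n_hop_context(seed_entities, hops=2):
    frontier, reached = set(seed_entities), set(seed_entities)
    for _ in range(hops):
        frontier = {nbr for e in frontier for nbr in EDGES.get(e, [])} - reached
        reached |= frontier
    # Concatenate chunks from every reached entity for the LLM prompt.
    return [chunk for e in sorted(reached) for chunk in CHUNKS.get(e, [])]

context = n_hop_context(["AcmeCorp"], hops=2)
```

Note the hop limit: unbounded traversal on a dense graph pulls in most of the corpus, which defeats the purpose.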
Type 3: Graph-Based Summarization (Microsoft GraphRAG)
This is what most people mean when they say “GraphRAG.” Microsoft’s system runs an LLM over your entire corpus to extract entities and relationships, builds a knowledge graph, applies the Leiden community detection algorithm to group related entities, and pre-generates summaries at multiple hierarchy levels. Queries get routed to the appropriate community level: local queries hit specific entity neighborhoods, global queries aggregate across community summaries.
The January 2025 update introduced Dynamic Community Selection, which reduced token usage by 79% while maintaining answer quality. That brought global search costs down significantly, but indexing still requires processing every document through an LLM for entity extraction, which consumes 58% of total indexing tokens.
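A back-of-the-envelope estimator makes the indexing bill concrete. Only the 58% extraction share comes from the article; the tokens-per-page figure, the pipeline overhead multiplier, and the per-token price are loudly illustrative assumptions, not quotes from any pricing page.

```python
# Rough indexing-cost estimator. tokens_per_page, overhead, and
# usd_per_1k_tokens are ASSUMED illustration values; only the
# extraction_share figure (58% of tokens) comes from the article.
def estimate_index_cost(pages, tokens_per_page=500, overhead=20,
                        usd_per_1k_tokens=0.03, extraction_share=0.58):
    """Return (total cost, slice attributable to entity extraction)."""
    # overhead captures repeated LLM passes: extraction gleanings,
    # community summaries, hierarchy levels.
    total_tokens = pages * tokens_per_page * overhead
    total_cost = total_tokens / 1000 * usd_per_1k_tokens
    return total_cost, total_cost * extraction_share

total, extraction = estimate_index_cost(pages=500)
# With these assumed inputs, a 500-page corpus lands at $150 total,
# inside the article's $50-200 range, with ~$87 spent on extraction.
```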
Type 4: Temporal Knowledge Graphs (Agent Memory)
This is Graph RAG applied to agent memory rather than document retrieval. Systems like Zep’s Graphiti framework and Mem0 build temporal knowledge graphs from agent interactions: every fact gets a timestamp, contradicting facts create explicit supersedes relationships, and retrieval prioritizes the most current and relevant knowledge.
Graphiti, built on Neo4j, uses a hybrid approach: it maintains both a knowledge graph of entities and relationships and an embedding-based similarity index. Queries can combine graph traversal (“What did this customer say about pricing?”) with temporal filtering (“Only facts from the last 30 days”) and semantic similarity (“Topics related to budget concerns”). Mem0 raised $10.9 million specifically to build this type of infrastructure for production agent deployments.
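The core Type 4 mechanics, timestamped facts with explicit supersession, fit in a few lines. This store and its API are invented for illustration; Graphiti's real interface and data model differ.

```python
# Minimal sketch of a temporal fact store in the spirit of Type 4.
# The class and method names are hypothetical, not Graphiti's API.
from dataclasses import dataclass

@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    ts: int                 # e.g. a Unix timestamp
    superseded: bool = False

class TemporalStore:
    def __init__(self):
        self.facts = []

    def add(self, subject, predicate, value, ts):
        # A new fact about the same (subject, predicate) supersedes old ones.
        for f in self.facts:
            if f.subject == subject and f.predicate == predicate:
                f.superseded = True
        self.facts.append(Fact(subject, predicate, value, ts))

    def current(self, subject, predicate):
        live = [f for f in self.facts
                if f.subject == subject and f.predicate == predicate
                and not f.superseded]
        return live[-1].value if live else None

store = TemporalStore()
store.add("customer42", "pricing_opinion", "too expensive", ts=100)
store.add("customer42", "pricing_opinion", "acceptable after discount", ts=200)
```

Superseded facts stay in the store, so an agent can still answer "what did this customer think before the discount?" by filtering on timestamps.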
Microsoft GraphRAG vs LightRAG vs Graphiti: The Real Trade-offs
Three frameworks dominate the Graph RAG space, each targeting a different point on the complexity/cost/capability curve.
Microsoft GraphRAG
The heavyweight. Full entity extraction, community detection, hierarchical summarization. With ~14,000 GitHub stars, it is the most researched and benchmarked option.
Strengths: Best-in-class global query performance. The community hierarchy handles “summarize everything about X” queries that no other approach matches. DRIFT search (released late 2024) combines local and global search for 40-60% cost reduction on complex queries.
Weaknesses: Indexing cost is brutal. A 500-page corpus takes roughly 45 minutes and $50-200 at GPT-4 pricing. Re-indexing when documents change means re-running the full pipeline or maintaining incremental update logic that Microsoft is still refining. Production deployments require careful cost monitoring.
LightRAG
The pragmatist’s choice. LightRAG (14,100+ GitHub stars) strips GraphRAG down to its essentials: simpler entity extraction, flat graph structure without community detection, and dual-mode retrieval that combines graph traversal with vector similarity.
Strengths: Indexing the same 500-page corpus takes about 3 minutes and costs roughly $0.50. Quality benchmarks show 70-90% of GraphRAG’s performance at 1/100th the cost. The flat graph structure makes incremental updates straightforward.
Weaknesses: No hierarchical community summarization means global queries (“What are the main themes across all documents?”) perform significantly worse. For narrowly scoped retrieval over well-defined domains, the quality gap is minimal. For broad analytical queries, the gap is real.
Zep Graphiti
The agent memory specialist. Graphiti is purpose-built for temporal knowledge graphs that power agent memory, not document Q&A. It integrates natively with Neo4j and supports hybrid graph+vector retrieval.
Strengths: Best temporal reasoning of the three. Explicit fact versioning, relationship invalidation, and time-aware retrieval make it the strongest choice for agents that need to track how knowledge evolves. The Neo4j ecosystem provides mature tooling for graph management, visualization, and querying.
Weaknesses: Requires a Neo4j instance. Not designed for large-scale document ingestion. Best for agent interaction histories and structured knowledge that changes over time, not for static document corpora.
| Feature | Microsoft GraphRAG | LightRAG | Graphiti |
|---|---|---|---|
| Primary use case | Document analysis & summarization | Document Q&A | Agent memory |
| Indexing cost (500 pages) | $50-200 | ~$0.50 | Varies by interaction volume |
| Indexing time (500 pages) | ~45 min | ~3 min | Real-time per interaction |
| Global query support | Excellent | Limited | Not designed for this |
| Multi-hop reasoning | Strong | Moderate | Strong (within agent context) |
| Temporal awareness | None | None | Core feature |
| Incremental updates | Complex | Simple | Native |
The Production Cost Nobody Talks About
The indexing cost comparisons above tell only half the story. Production Graph RAG systems have three cost layers that most evaluations ignore.
Schema Design
The hardest part of Graph RAG is not the framework. It is designing the ontology: which entity types matter, what relationship types to extract, how granular to go. A legal knowledge base might need `Contract -> contains -> Clause -> references -> Statute -> interpreted_by -> Precedent`. Get the schema wrong and your graph either misses critical connections or drowns in noise.
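One way to pin the ontology down before any extraction runs is to encode the allowed types explicitly and validate extracted triples against them. The chain here mirrors the legal example above; the type and function names are illustrative.

```python
# Encode the ontology as data so extracted triples can be validated
# against it. Type names follow the article's legal example.
ENTITY_TYPES = {"Contract", "Clause", "Statute", "Precedent"}
RELATION_SCHEMA = {                      # relation -> (source type, target type)
    "contains":       ("Contract", "Clause"),
    "references":     ("Clause", "Statute"),
    "interpreted_by": ("Statute", "Precedent"),
}

def validate_triple(src_type, relation, dst_type):
    """Reject extracted triples that fall outside the ontology."""
    expected = RELATION_SCHEMA.get(relation)
    return expected == (src_type, dst_type)

ok = validate_triple("Clause", "references", "Statute")       # fits the schema
bad = validate_triple("Contract", "references", "Precedent")  # noise: reject
```

A validation gate like this is what keeps open-ended LLM extraction from flooding the graph with one-off relation types.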
Microsoft GraphRAG sidesteps this by using LLM-powered open-ended extraction: the model decides what entities and relationships exist. This works for exploration but produces noisy graphs at scale. LightRAG takes a similar approach with lighter-weight extraction. Production teams that have deployed Graph RAG successfully, at companies like Goldman Sachs and Deloitte, invested weeks in ontology design before writing any retrieval code.
Graph Maintenance
Documents change. Entities merge, split, or become irrelevant. A vector database handles updates trivially: delete the old vectors, embed the new document, insert. A knowledge graph update requires re-extracting entities from the changed document, reconciling those entities with existing graph nodes (is “Microsoft Corp” the same as “Microsoft”?), updating affected relationships, and re-running community detection if you use GraphRAG’s summarization.
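The "Microsoft Corp" vs "Microsoft" problem is entity reconciliation, and even the simplest version helps. The sketch below normalizes corporate suffixes and merges aliases into one canonical node; the suffix list is a small assumption, and real pipelines add fuzzy matching and embedding similarity on top.

```python
import re

# Toy entity reconciliation for graph maintenance: normalize names and
# merge aliases before inserting nodes. The suffix list is illustrative.
CORPORATE_SUFFIXES = r"\b(corp|corporation|inc|ltd|llc|gmbh)\.?$"

def canonical(name, alias_table):
    """Map a raw name to one canonical node ID shared by its aliases."""
    key = re.sub(CORPORATE_SUFFIXES, "", name.strip().lower()).strip()
    # alias_table maps normalized keys to the first canonical form seen.
    return alias_table.setdefault(key, name.strip())

aliases = {}
a = canonical("Microsoft Corp", aliases)
b = canonical("microsoft", aliases)
# Both resolve to the same canonical node, so the graph gets one entity.
```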
FalkorDB claims sub-millisecond query latency and 600K queries per second, positioning itself as a faster graph database alternative to Neo4j for Graph RAG workloads. But query speed is rarely the bottleneck. Keeping the graph accurate as your corpus evolves is where production teams spend most of their engineering time.
Retrieval Quality Monitoring
Vector RAG failures are relatively uniform: bad results look like wrong chunks. Graph RAG failures are more varied: missing entity extraction, wrong relationship types, stale community summaries, or graph traversal paths that miss relevant nodes. Monitoring requires tracking not just retrieval precision/recall but also entity extraction accuracy, relationship completeness, and graph coverage.
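The extraction-accuracy slice of that monitoring is just precision/recall against a small hand-labeled gold set, computed per document or per batch. The entity names below are invented for illustration.

```python
# One monitoring slice: entity-extraction precision/recall against a
# hand-labeled gold set. Entities reuse the article's toy example.
def precision_recall(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

gold = {"Alice", "Berlin office", "DACH compliance", "Legal department"}
predicted = {"Alice", "Berlin office", "Legal department", "Germany"}
p, r = precision_recall(predicted, gold)  # one spurious entity, one missed
```

The same function applied to (source, relation, target) triples instead of entity names gives a relationship-completeness metric.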
How to Adopt Graph RAG Without Burning Your Budget
The mistake most teams make is jumping straight to Type 3 (Microsoft GraphRAG) because it has the best benchmarks. A smarter path follows the four-type progression.
Start With Type 1: Metadata Enrichment
Take your existing vector RAG pipeline and add graph-based metadata. If you have a product catalog, build a simple taxonomy graph and use it to filter or re-rank vector search results. If you have a customer knowledge base, create an entity graph of customers, products, and interactions, and use it to scope retrieval to the relevant customer context.
This requires no LLM-powered extraction. You can build the graph from structured data you already have: CRM records, product databases, org charts. The cost is the graph database itself (Neo4j Community Edition is free, or use a managed service starting around $65/month).
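Building that graph from structured data can be as plain as turning rows into edges. The CRM rows and field names here are invented; the point is that no LLM touches this step.

```python
# Type 1 from data you already have: a few CRM-style rows become
# (customer)-[owns]->(product) edges used to scope retrieval.
# Row shapes and names are illustrative.
crm_rows = [
    {"customer": "Acme", "product": "Analytics Suite"},
    {"customer": "Acme", "product": "API Gateway"},
    {"customer": "Globex", "product": "Analytics Suite"},
]

edges = {}
for row in crm_rows:
    edges.setdefault(row["customer"], set()).add(row["product"])

def scope_to_customer(chunks, customer):
    """Keep only chunks tagged with products this customer actually owns."""
    owned = edges.get(customer, set())
    return [c for c in chunks if c["product"] in owned]

chunks = [
    {"text": "Gateway rate limits...", "product": "API Gateway"},
    {"text": "Billing for CRM Plus...", "product": "CRM Plus"},
]
scoped = scope_to_customer(chunks, "Acme")  # drops the irrelevant product
```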
Graduate to Type 2 When Multi-Hop Matters
Once you have queries that consistently require connecting information across documents, implement graph-guided retrieval. The trigger is specific: you notice that users ask questions like “Which of our suppliers also work with [competitor]?” or “What regulations affect the products we sell in [region]?” that vector search handles poorly.
At this stage, you do need entity extraction from your documents, but you can use smaller models (GPT-4o-mini or Claude Haiku) and focus extraction on the specific entity types your use cases require. Targeted extraction costs a fraction of open-ended extraction.
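Targeted extraction mostly comes down to a narrow prompt. The wording below is a sketch, not a benchmarked template, and the actual model call (to a small model such as GPT-4o-mini) is deliberately left out.

```python
# Targeted extraction keeps the prompt narrow: only the entity and
# relation types the use case needs. Prompt wording is illustrative.
def extraction_prompt(text, entity_types, relation_types):
    return (
        "Extract only the following from the text below.\n"
        f"Entity types: {', '.join(entity_types)}\n"
        f"Relation types: {', '.join(relation_types)}\n"
        "Return JSON triples as (source, relation, target). "
        "Ignore every other entity or relation.\n\n"
        f"Text: {text}"
    )

prompt = extraction_prompt(
    "BetaSupplies also ships to Initech, a competitor.",
    entity_types=["Supplier", "Competitor"],
    relation_types=["supplies"],
)
```

Constraining the type lists is what makes this cheaper than open-ended extraction: shorter outputs, fewer gleaning passes, and far less post-hoc cleanup.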
Consider Type 3 Only for Global Analytics
Microsoft GraphRAG’s community summarization shines for one specific use case: queries that require synthesizing information across your entire corpus. “What are the top five risk factors across all our supplier contracts?” “Summarize the sentiment trends in customer feedback this quarter.” If your users do not ask these kinds of questions, Type 3’s costs are not justified.
If they do, evaluate LightRAG first. At 70-90% of GraphRAG’s quality for 1/100th the cost, it is the right starting point unless your benchmarks specifically show the quality gap matters for your use case.
Use Type 4 for Agent Memory (Not Document RAG)
If your primary need is agent memory, not document retrieval, skip Types 1-3 entirely and implement Graphiti or Mem0. These systems are designed for real-time knowledge accumulation from agent interactions, not batch document processing. They solve a fundamentally different problem: helping agents remember and reason over their own history.
Frequently Asked Questions
What is Graph RAG and how does it differ from vector RAG?
Graph RAG adds a knowledge graph layer to retrieval-augmented generation. Instead of finding similar text chunks by vector similarity, it traverses relationships between entities (people, companies, concepts) to retrieve contextually connected information. This enables multi-hop reasoning and global summarization queries that vector RAG cannot handle, but costs 10-40x more to index.
How much does Graph RAG cost compared to vector RAG?
For a 500-page corpus, Microsoft GraphRAG indexing costs $50-200 and takes about 45 minutes. LightRAG indexes the same corpus for roughly $0.50 in 3 minutes. Standard vector RAG embedding costs under $5. The cost difference comes from LLM-powered entity extraction, which consumes about 58% of total GraphRAG indexing tokens.
When should I use Graph RAG instead of vector RAG?
Use Graph RAG when your queries require connecting information across multiple documents (multi-hop reasoning), when you need global summarization across large corpora, or when your domain has complex entity relationships (legal, healthcare, supply chains). For simple factual lookups over well-structured content, vector RAG is faster, cheaper, and sufficient.
What is the difference between Microsoft GraphRAG and LightRAG?
Microsoft GraphRAG uses full entity extraction, community detection, and hierarchical summarization for best-in-class global query performance. LightRAG uses simpler extraction with a flat graph structure, delivering 70-90% of GraphRAG’s quality at roughly 1/100th the indexing cost. GraphRAG excels at broad analytical queries; LightRAG is better for domain-specific Q&A where cost matters.
Can I combine Graph RAG with vector RAG?
Yes, and most production deployments do exactly this. The hybrid approach routes simple factual queries to vector RAG and complex multi-hop or analytical queries to Graph RAG. Zep's Graphiti framework, built on Neo4j, natively supports hybrid graph+vector retrieval. Starting with graph-enhanced vector search (Type 1) and gradually adding graph-guided retrieval (Type 2) is the recommended adoption path.
