AI coding agents break on large codebases because they treat source code like a bag of text chunks. They embed files, retrieve the most similar ones, and pray that cosine similarity picks the right function out of 4,000 candidates. It usually doesn’t. GitNexus, an open-source project that hit over 10,000 GitHub stars in early 2026, takes the opposite approach: it indexes your entire repository into a property graph, precomputing every dependency, call chain, cluster, and execution flow at index time. When your AI agent asks “what breaks if I change this function?”, the answer comes back in one call, not ten retrieval rounds.
GitNexus runs in two modes. As a CLI, it builds a KuzuDB graph database stored locally in a .gitnexus/ directory, then serves it through an MCP server that connects directly to Claude Code, Cursor, and Windsurf. As a web app, it runs the entire stack in WebAssembly: Tree-sitter WASM for parsing, KuzuDB WASM for the database, and transformers.js for embeddings. No server, no data leaving your machine.
Why AI Coding Agents Lose Context
The core problem is well understood by anyone who has used Cursor, Claude Code, or GitHub Copilot on a codebase with more than a few hundred files. These tools use some form of retrieval to find relevant code before generating a response. The retrieval is usually semantic search: embed the user’s question, embed code chunks, find the nearest neighbors.
This works for simple queries like “where is the login function?” It falls apart for structural questions: “What services depend on this database schema?” or “If I refactor this interface, which downstream consumers break?” Answering these questions requires understanding the graph structure of the code, not the textual similarity of individual files.
A 2025 analysis by Zep AI documented three failure modes of vanilla RAG: it cannot trace multi-hop relationships, it misses temporal changes in data, and it retrieves context that is semantically close but functionally irrelevant. All three problems apply directly to code retrieval. A function’s signature may look similar to five others, but only one is actually called by the code you are modifying.
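The "semantically close but functionally irrelevant" failure can be shown with a toy example. This is a minimal sketch, not how any particular agent retrieves code: it uses bag-of-words cosine similarity as a stand-in for real embeddings, and every function name and description in it is hypothetical.

```python
import math
from collections import Counter

def bag(text: str) -> Counter:
    """Tokenize into a bag of lowercase words (a crude stand-in for an embedding)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical code chunks, summarized by the words they contain.
chunks = {
    "billing.validate_invoice": "validate invoice total amount",
    "orders.validate_order": "validate order total amount",
    "auth.validate_token": "validate token expiry",
}

# Hypothetical call graph: the function being edited calls only one of them.
calls = {"checkout.submit": ["orders.validate_order"]}

query = bag("validate total amount")
scores = {name: cosine(query, bag(text)) for name, text in chunks.items()}

# Similarity cannot separate the two look-alikes; the call edge can.
print(scores)
print(calls["checkout.submit"])
```

In this sketch the two look-alike functions receive the same similarity score, so ranking alone cannot tell which one matters. The call edge names the relevant function directly.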
How GitNexus Builds Its Knowledge Graph
GitNexus sidesteps the retrieval problem by not depending on similarity-based retrieval to discover structure. Instead, it precomputes the structural relationships at index time through a multi-phase pipeline.
Phase 1: Parsing with Tree-sitter
The first step is parsing every file using Tree-sitter grammars. GitNexus supports 12 languages out of the box: TypeScript, JavaScript, Python, Java, Go, Rust, C, C++, C#, Ruby, PHP, and Swift. Tree-sitter produces concrete syntax trees that GitNexus walks to extract symbols (functions, classes, interfaces, types), imports/exports, call sites, and inheritance relationships.
Unlike regex-based indexers that miss edge cases, Tree-sitter handles nested generics, decorator patterns, and complex type annotations correctly. The parser runs in WebAssembly when used in the browser, making it portable without sacrificing accuracy.
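To make the extraction step concrete, here is a minimal sketch of pulling function definitions and call sites out of a syntax tree. It uses Python's built-in ast module as a stand-in for Tree-sitter, so it only handles Python source and is not GitNexus's actual extractor. The sample source and the extract_calls helper are hypothetical.

```python
import ast

# Hypothetical source file to index.
SOURCE = '''
def load_user(user_id):
    return db_fetch(user_id)

def handle_request(user_id):
    user = load_user(user_id)
    return render(user)
'''

def extract_calls(source: str) -> dict[str, list[str]]:
    """Map each top-level function to the plain names it calls."""
    tree = ast.parse(source)
    result: dict[str, list[str]] = {}
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue
        callees = set()
        for child in ast.walk(node):
            # Only direct calls like foo(...); method calls are ignored here.
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                callees.add(child.func.id)
        result[node.name] = sorted(callees)
    return result

symbols = extract_calls(SOURCE)
print(symbols)
```

A real indexer would also need to resolve each name to a specific definition across files and imports, which this sketch skips.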
Phase 2: Graph Construction in KuzuDB
The extracted symbols and relationships feed into a KuzuDB property graph. KuzuDB is an embedded graph database (think SQLite but for graphs) that supports Cypher queries, HNSW vector indexes for semantic search, and full-text search. Each node in the graph represents a code entity: a function, class, module, or file. Edges represent relationships: “calls,” “imports,” “extends,” “implements.”
The graph also stores metadata: file paths, line numbers, complexity scores, and semantic embeddings generated by transformers.js. This hybrid structure means you can query by both graph traversal (“show me everything two hops from this function”) and semantic similarity (“find functions related to authentication”).
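The "two hops from this function" query can be sketched without a database. The example below is an assumption-laden illustration: it keeps the graph as plain (source, relationship, target) triples in memory instead of in KuzuDB, and the node names and the neighbors_within helper are hypothetical.

```python
from collections import deque

# Hypothetical edges in the code graph.
EDGES = [
    ("handle_request", "calls", "load_user"),
    ("load_user", "calls", "db_fetch"),
    ("db_fetch", "calls", "open_connection"),
    ("UserService", "extends", "BaseService"),
]

def neighbors_within(start: str, max_hops: int, edges=EDGES) -> dict[str, int]:
    """Return every node reachable from start in at most max_hops edges,
    mapped to its hop distance (breadth-first search)."""
    adjacency: dict[str, list[str]] = {}
    for source, _relationship, target in edges:
        adjacency.setdefault(source, []).append(target)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in adjacency.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    del distance[start]
    return distance

two_hops = neighbors_within("handle_request", 2)
print(two_hops)
```

A graph database would express the same bounded traversal declaratively, but the underlying idea is this breadth-first walk over stored edges.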
Phase 3: Clustering and Flow Analysis
After building the raw graph, GitNexus runs clustering algorithms to identify cohesive modules, detects execution flows (entry point to database call, for example), and scores edge confidence. The result is a graph where queries can return not just raw connections but an assessment of how strongly coupled two components are.
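As a rough illustration of this phase, the sketch below traces execution flows and groups functions into clusters over a tiny call graph. The source does not name the algorithms GitNexus uses, so depth-first path enumeration and connected components are swapped in here as the simplest stand-ins. All names are hypothetical.

```python
# Hypothetical call graph: caller -> callees.
CALLS = {
    "api.login": ["auth.check_password", "auth.issue_token"],
    "auth.check_password": ["db.find_user"],
    "auth.issue_token": [],
    "db.find_user": [],
    "report.build": ["report.format"],
    "report.format": [],
}

def execution_flows(entry, sink, calls=CALLS, path=None):
    """Enumerate call paths from entry to sink, skipping cycles."""
    path = (path or []) + [entry]
    if entry == sink:
        return [path]
    flows = []
    for callee in calls.get(entry, []):
        if callee not in path:
            flows.extend(execution_flows(callee, sink, calls, path))
    return flows

def clusters(calls=CALLS):
    """Group functions into connected components, ignoring edge direction."""
    adjacency = {name: set() for name in calls}
    for caller, callees in calls.items():
        for callee in callees:
            adjacency[caller].add(callee)
            adjacency.setdefault(callee, set()).add(caller)
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

flows = execution_flows("api.login", "db.find_user")
groups = clusters()
print(flows)
print(groups)
```

Connected components only separate code that never interacts. A production clusterer would more likely use a community-detection method that can split densely connected code into smaller modules.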
This precomputation is the key architectural decision. Traditional RAG systems do their reasoning at query time, which means the LLM has to discover relationships itself across multiple retrieval rounds. GitNexus front-loads that work, so the agent gets decision-ready context immediately.
The MCP Server: Connecting to Your Coding Agent
The CLI mode is where GitNexus becomes most practical for day-to-day development. After you index a repository with gitnexus index, running gitnexus serve starts an MCP (Model Context Protocol) server that any compatible coding agent can connect to.
GitNexus exposes seven specialized MCP tools:
Hybrid Search: Combines BM25 keyword matching with semantic vector search and merges the two result lists using Reciprocal Rank Fusion (RRF). This is usually more accurate than pure semantic search because it weights exact token matches alongside meaning.
Symbol Context Lookup: Given a function or class name, returns its full definition, callers, callees, and the cluster it belongs to. This is what makes “show me everything that calls this function” a single-tool-call operation.
Blast Radius Analysis: The standout feature. Given a symbol, it computes every downstream consumer, scores the risk of changing it, and returns a prioritized list of files that would need updating. Without a precomputed dependency graph, this is a manual grep-and-trace exercise.
Git-Diff Impact Mapping: Takes a diff (staged changes, a branch comparison, a specific commit) and maps it against the knowledge graph to show which parts of the codebase are affected by those changes.
Multi-File Refactoring: Coordinates changes across multiple files based on the graph structure. If you rename an interface, it knows which implementations, tests, and consumers need updates.
Grep: Pattern-based code search across the indexed repository.
Read/Highlight: Reads specific files or highlights relevant sections within a file.
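The fusion step behind the Hybrid Search tool can be sketched in a few lines. Reciprocal Rank Fusion scores each result as the sum of 1 / (k + rank) across the input rankings. The constant k = 60 below is the value commonly used in the literature and is an assumption here, not a confirmed GitNexus setting. The two rankings are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one.

    Each item scores 1 / (k + rank) per list it appears in (rank starts at 1).
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical results for the query "login token".
bm25_ranking = ["parse_token", "login", "refresh_session"]
vector_ranking = ["login", "authenticate", "parse_token"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Because RRF uses only rank positions, it can combine BM25 scores and vector distances without having to normalize two incompatible score scales.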
A global registry means one MCP server can serve multiple indexed repositories without per-project configuration. You set it up once and it works everywhere.
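For the Blast Radius Analysis tool listed above, the core idea is a reverse traversal of the call graph: start at the changed symbol and walk backwards through its callers. The sketch below is an illustration under stated assumptions. It uses hop distance as a crude priority, which is not necessarily how GitNexus scores risk, and the call graph is hypothetical.

```python
from collections import deque

# Hypothetical call graph: caller -> callees.
CALLS = {
    "api.checkout": ["orders.create"],
    "api.refund": ["orders.cancel"],
    "orders.create": ["pricing.total"],
    "orders.cancel": ["pricing.total"],
    "pricing.total": ["pricing.tax"],
    "pricing.tax": [],
}

def blast_radius(symbol, calls=CALLS):
    """Return (dependent, hops) pairs for everything that directly or
    transitively calls symbol, nearest dependents first."""
    # Invert the graph so we can walk from callee to caller.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)

    distance = {symbol: 0}
    queue = deque([symbol])
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, []):
            if caller not in distance:
                distance[caller] = distance[node] + 1
                queue.append(caller)
    del distance[symbol]
    return sorted(distance.items(), key=lambda item: (item[1], item[0]))

impact = blast_radius("pricing.tax")
print(impact)
```

Direct callers appear first because they are the most likely to break. A fuller risk score might also weigh test coverage or edge confidence.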
Browser Mode: Zero-Server Code Intelligence
Not every use case requires a CLI. GitNexus also runs entirely in the browser using WebAssembly. You paste a GitHub URL or upload a ZIP file, and the entire indexing pipeline runs client-side: Tree-sitter WASM parses the code, KuzuDB WASM stores the graph, and in-browser embeddings (via transformers.js) enable semantic search.
The browser UI visualizes the knowledge graph using Sigma.js and Graphology, rendering interactive node-link diagrams where you can click through dependencies and call chains. A built-in Graph RAG Agent powered by a LangChain ReAct loop lets you ask natural language questions about the codebase. The agent has five tools at its disposal: search, Cypher queries, grep, file reading, and code highlighting.
The privacy story is straightforward: nothing leaves your browser. The entire stack runs locally, which matters for proprietary codebases where uploading code to a third-party service is not an option.
Where GitNexus Fits in the Code Intelligence Stack
GitNexus occupies a specific niche. It is not a replacement for your IDE’s language server (which handles real-time type checking, auto-completion, and inline diagnostics). It is not a replacement for your coding agent (Cursor, Claude Code, or Windsurf still handle the actual code generation). It is the structural context layer that sits between your codebase and your agent, ensuring the agent understands the graph structure of your code rather than treating it as a pile of text files.
The closest comparison is to tools like Sourcegraph’s code intelligence features or JetBrains’ structural search. But those are tied to a server deployment or a specific IDE, and they are not primarily designed to feed context into LLM-based agents. GitNexus is open-source (MIT license), runs locally, and outputs context in the format that MCP-compatible agents expect.
For teams working on large codebases (50,000+ lines), the difference between an agent that understands your dependency graph and one that uses vanilla retrieval is the difference between a useful coding partner and an expensive autocomplete. GitNexus closes that gap by making structural code intelligence available as a standard MCP tool.
Getting Started
Installation is straightforward:
npm install -g gitnexus
Index a repository:
cd your-project
gitnexus index
Start the MCP server:
gitnexus serve
Then add the MCP server to your coding agent’s configuration. For Claude Code, add it to your MCP settings. For Cursor, configure it in the MCP panel. The global registry means subsequent repositories just need gitnexus index without reconfiguring the server.
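As a rough guide to what that configuration might look like, the sketch below prints a JSON entry in the mcpServers shape that several MCP clients accept. Treat it as an assumption: the entry name "gitnexus" is arbitrary, it assumes the client launches gitnexus serve as a subprocess, and the exact file name and location depend on your agent, so check its documentation.

```python
import json

# Assumption: the client reads a JSON document with an "mcpServers" map
# and starts each listed command as a subprocess.
config = {
    "mcpServers": {
        "gitnexus": {               # arbitrary label for this server
            "command": "gitnexus",  # the CLI installed via npm above
            "args": ["serve"],      # the serve command shown earlier
        }
    }
}

config_json = json.dumps(config, indent=2)
print(config_json)
```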
For the browser version, visit gitnexus.vercel.app, paste a GitHub URL, and start exploring.
Frequently Asked Questions
What is GitNexus and how does it help AI coding agents?
GitNexus is an open-source tool that indexes codebases into knowledge graphs, mapping every dependency, call chain, and execution flow. It serves this structural context to AI coding agents like Claude Code, Cursor, and Windsurf through an MCP server, giving them accurate codebase understanding instead of relying on basic text retrieval.
How does GitNexus differ from traditional RAG for code?
Traditional RAG embeds code as text chunks and retrieves by semantic similarity, which misses structural relationships like call chains and dependencies. GitNexus precomputes these relationships into a property graph at index time, so queries about code structure return accurate results in a single call instead of requiring multiple retrieval rounds.
What programming languages does GitNexus support?
GitNexus supports 12 programming languages: TypeScript, JavaScript, Python, Java, Go, Rust, C, C++, C#, Ruby, PHP, and Swift. It uses Tree-sitter grammars for parsing, which ensures accurate extraction of symbols, call sites, and type relationships.
Does GitNexus send my code to external servers?
No. GitNexus runs entirely locally. The CLI stores the knowledge graph in a .gitnexus/ directory on your machine, and the browser version runs everything in WebAssembly, meaning no code leaves your browser. This makes it suitable for proprietary codebases.
What is blast radius analysis in GitNexus?
Blast radius analysis is a GitNexus feature that, given a code symbol like a function or class, computes every downstream consumer, scores the risk of changing it, and returns a prioritized list of files that would need updating. This turns manual grep-and-trace work into a single-command operation.
