graphify-mcp
graphify-mcp exposes a codebase knowledge graph as MCP tools, prompts, and resources, enabling AI assistants to explore, query, and maintain structural code insights with token-efficient queries.
Build & Maintain
- graphify_build — Build or incrementally update a knowledge graph from source code (Python via ast; JS/TS/Go/Java/Rust/C++ via tree-sitter)
- graphify_add — Incorporate external URLs (arXiv papers, tweets) into the graph
- graphify_freshness — Detect if the graph is stale vs. git HEAD; distinguishes cosmetic (comment/format) changes from structural ones and recommends fresh/update/rebuild
- graphify_validate — Lint for dangling/duplicate edges and orphan nodes
Graph Exploration & Navigation
- graphify_overview — One-shot orientation: graph size, god nodes, communities, surprise edges, suggested next steps
- graphify_query — Natural-language queries with optional DFS tracing and token budget capping
- graphify_path — Find the exact path between two named nodes
- graphify_explain / graphify_node_details — Full metadata for a node (type, file, line, docstring, community)
- graphify_subgraph — BFS subgraph around a node capped at a token budget (core cheap exploration tool)
- graphify_neighbors — List 1-hop neighbors with relation types
- graphify_search — Search nodes by name/label text
Structural Analysis
- graphify_god_nodes — List highest-degree (most connected) nodes
- graphify_communities — Summarize Leiden communities with sizes and sample members
- graphify_surprises — Surface unexpected cross-file/cross-domain connections
Semantic Naming
- graphify_label_communities — Assign human-readable names to communities via host-LLM sampling, backend API key, or placeholders
- graphify_set_labels — Persist assistant-provided community names into the graph
- graphify_sampling_status — Capability test for which naming method is available
Semantic Bridge (optional [semble] extra)
- graphify_locate — Joins semantic search and graph structure, returning token-budgeted subgraphs plus hidden_links (semantically similar but structurally disconnected code)
LLM-Friendly Features
- Tool annotations (readOnlyHint, destructiveHint), as_json structured output, token budgeting
- Reusable prompts (onboard, trace_bug, explain_flow) that orchestrate tools for common workflows
- Resources exposing graph report, raw graph JSON, and per-community wikis
- Full or lean toolset mode via GRAPHIFY_TOOLSET env var
- stdio (default) or HTTP transport with authentication; deployable locally or as a shared team server
graphify-mcp
A Python MCP server that exposes the Graphify knowledge graph as MCP tools, prompts and resources — so an AI assistant can explore your codebase through the graph during development, cheaply (token-budgeted) and structurally.
Note: Graphify ships its own embedded MCP server (graphify ./raw --mcp). This project adds analysis tools, token-budgeted subgraph extraction, git freshness checks, per-community resources, reusable prompts, and LLM-friendly tool annotations + structured (JSON) output on top.
Why graphify_locate
One MCP call turns a natural-language question into a navigational map, not a wall of code:
🔎 Semantic + structural, one call — semble finds the relevant code, the graph gives its neighborhood. ~235 tokens to orient vs ~61k for grep+read (263× fewer on httpx).
🔗 hidden_links — semantically similar code that is structurally disconnected (duplication / missing-abstraction / sync-async-twin candidates) that neither search nor the graph surfaces alone.
🌍 Multi-language, zero config — Python via stdlib ast; JS/TS · Go · Java · Rust · C++ · 165+ more via tree-sitter with automatic language detection. Span-join precision 69–91% on real HTTP-client repos in six languages (benchmark).
🕒 Cosmetic-aware freshness — graphify_freshness ignores comment/format-only edits (in every language) so a reformat never triggers a needless rebuild.
One call beats running semble and graphify separately
semble finds what's relevant; graphify gives how it connects. They're complementary — but stitching them by hand means four calls, ~2.7k tokens, and manually aligning semble's line ranges to graph nodes. graphify-mcp does that join for you, in one call:
per query | semble alone | graphify alone | both, by hand | graphify_locate |
Semantic search | ✓ | — | ✓ | ✓ |
Graph structure | — | ✓ | ✓ | ✓ |
Chunk → symbol join | — | — | you wire it | ✓ automatic |
hidden_links | — | — | — | ✓ only here |
Calls | 1 | 1 | 4 | 1 |
Tokens to orient | 1,613 | 1,107 | 2,721 | 235 |
→ 11.6× fewer tokens than running the two separately — in a single call, and hidden_links (semantically similar code that is structurally disconnected) is a signal neither tool produces alone. So the combined tool isn't just convenience: it's cheaper, and it surfaces something the parts can't. (full benchmark ↓)
Installation
# graphify-mcp itself
pip install graphify-mcp
# plus the Graphify CLI it wraps (needed for build/query/path/explain/add)
pip install graphifyy && graphify install
From source:
git clone https://github.com/yasinyaman/graphify-mcp
cd graphify-mcp
pip install -e ".[dev]"
Running
GRAPHIFY_PROJECT_DIR=/path/to/repo graphify-mcp-server
# equivalently, collision-proof:
GRAPHIFY_PROJECT_DIR=/path/to/repo python -m graphify_mcp
Heads-up: graphifyy ships its own graphify-mcp console script (its embedded server). To avoid a silent collision, this package deliberately doesn't define a bare graphify-mcp of its own — use graphify-mcp-server or python -m graphify_mcp to always launch this server. The boot banner on stderr (graphify-mcp vX.Y.Z | transport=… | project=…) confirms which server and project dir you're actually running.
Claude Code
Copy mcp.json to a .mcp.json at your project root. GRAPHIFY_PROJECT_DIR: "." uses the project root.
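As a rough sketch of what such a .mcp.json might contain, the snippet below generates one. The repository's own mcp.json remains the authoritative version. The server key "graphify" is a placeholder chosen for illustration, and the mcpServers / command / env layout is the general Claude Code project-config shape, assumed here rather than copied from this repo. Only the graphify-mcp-server command and the GRAPHIFY_PROJECT_DIR variable come from this README.

```python
import json


def build_mcp_config(project_dir: str = ".") -> dict:
    """Build a project-level MCP config dict (layout is an assumption)."""
    return {
        "mcpServers": {
            # "graphify" is a hypothetical key; any name works as the label.
            "graphify": {
                # Console script named in the Running section of this README.
                "command": "graphify-mcp-server",
                # "." resolves to the project root, as described above.
                "env": {"GRAPHIFY_PROJECT_DIR": project_dir},
            }
        }
    }


config_text = json.dumps(build_mcp_config(), indent=2)
print(config_text)
```

Writing config_text to .mcp.json at the project root would give the same effect as copying the example file, under the assumptions stated above.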
Claude Desktop / Cowork
Add the contents of claude_desktop_config.json to your Claude Desktop config:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Transport (stdio default, optional HTTP)
stdio is the default and the right choice for a per-developer local server. To serve over HTTP instead (e.g. a shared graph for a team or a web MCP client):
GRAPHIFY_TRANSPORT=streamable-http GRAPHIFY_HOST=127.0.0.1 GRAPHIFY_PORT=8000 \
GRAPHIFY_PROJECT_DIR=/path/to/repo graphify-mcp-server
Any HTTP transport force-enables path containment (GRAPHIFY_RESTRICT_PATHS)
so a network client can't drive graphify_build to extract arbitrary filesystem
paths. HTTP binds 127.0.0.1 by default. To expose it beyond localhost, set
GRAPHIFY_API_KEY — every request must then send Authorization: Bearer <key>
(constant-time checked, 401 otherwise); binding a non-loopback host without a key
prints a warning.
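To illustrate what a constant-time bearer check can look like, here is a minimal sketch using the standard library's hmac.compare_digest. It is not the server's actual code: the function names is_authorized and status_for are hypothetical, and only the Authorization: Bearer <key> format and the 401 response come from the description above.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_key: str) -> bool:
    """Return True only for 'Bearer <key>' where <key> matches expected_key."""
    if not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking, through timing, how many leading
    # characters of the presented key were correct.
    return hmac.compare_digest(presented.encode(), expected_key.encode())


def status_for(authorization_header: Optional[str], expected_key: str) -> int:
    """Map the check to an HTTP status: 200 when authorized, 401 otherwise."""
    return 200 if is_authorized(authorization_header, expected_key) else 401


print(status_for("Bearer s3cret", "s3cret"))
print(status_for("Bearer wrong", "s3cret"))
```

A plain == comparison would return the same results, but it can stop at the first differing character, which is the timing signal a constant-time comparison is meant to remove.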
The CLI is always invoked as an argument list with no shell (subprocess.run
with shell=False), so a build path or query string can't inject shell commands.
For a shared/network deployment, also consider lowering GRAPHIFY_TIMEOUT (default
600s) so a single slow graphify_build can't tie up a worker for ten minutes.
GRAPHIFY_TRANSPORT=streamable-http GRAPHIFY_HOST=0.0.0.0 GRAPHIFY_API_KEY=$(openssl rand -hex 16) \
GRAPHIFY_PROJECT_DIR=/path/to/repo graphify-mcp-server
For a smaller tool surface (helps some models pick the right tool), set
GRAPHIFY_TOOLSET=lean to expose only the core exploration tools.
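The no-shell invocation mentioned in this section can be demonstrated with a short sketch. The helper name run_cli is hypothetical, and sys.executable stands in for the real graphify CLI so the example runs anywhere. The point it shows is that with an argument list and shell=False, a string containing shell metacharacters arrives as one literal argument.

```python
import subprocess
import sys


def run_cli(argv, timeout=600.0):
    """Run a command as an argument list, never through a shell.

    The 600-second default mirrors the GRAPHIFY_TIMEOUT default quoted above.
    """
    return subprocess.run(
        argv,
        shell=False,          # no shell, so ';', '&&', '$(...)' are not interpreted
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# A hostile-looking "query": with shell=False it is just one argv entry.
hostile = "auth flow; echo INJECTED"

# The child process simply echoes back the single argument it received.
result = run_cli([sys.executable, "-c", "import sys; print(sys.argv[1])", hostile])
print(result.stdout.strip())
```

The child prints the whole string back unchanged, which shows the `; echo INJECTED` part was never executed as a separate command.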
Environment variables
Variable | Default | Description |
|
| Project root to extract the graph from |
|
| Output folder name |
|
| CLI path |
|
| CLI timeout (seconds) |
|
| Confine |
|
|
|
|
| Bind host for HTTP transports |
|
| Bind port for HTTP transports |
| (unset) | Require |
|
|
|
| (heuristic) |
|
Keeping the graph fresh
The analysis tools surface staleness for you: graphify_overview and
graphify_subgraph carry a lightweight graph_age ("built 3 commits ago"), and
graphify_freshness gives a full recommended_action (fresh / update / rebuild).
To stop thinking about it, regenerate on every commit with a git post-commit
hook — the recommended first-class auto-update flow:
#!/bin/sh
# .git/hooks/post-commit (then: chmod +x .git/hooks/post-commit)
# incremental, viz-free, backgrounded so the commit returns immediately
graphify . --update --no-viz >/dev/null 2>&1 &
Incremental --update only re-extracts changed files. It can't drop nodes for
deleted/renamed code, so after those graphify_freshness still recommends a full
graphify . rebuild — run that occasionally (or from a post-merge hook). An
agent can also just call graphify_build(update=True) when graph_age /
graphify_freshness says the graph drifted.
Tools
CLI-backed (the first two write state; the rest are read-only):
Tool | Purpose |
graphify_build | Build/update the graph |
graphify_add | Add a source by URL (arXiv, tweet) |
graphify_query | Natural-language query |
graphify_path | Exact path between two nodes |
graphify_explain | Everything about a node |
graph.json analysis (read-only, no CLI needed, as_json=True for structured output):
Tool | Purpose |
graphify_overview | Call first — size, god nodes, communities, surprises, suggested next steps |
graphify_god_nodes | Most connected nodes |
graphify_communities | Leiden community summaries |
graphify_surprises | Unexpected cross-domain connections |
graphify_search | Node search |
graphify_neighbors | 1-hop neighbors of a node |
graphify_subgraph | Token-budgeted BFS subgraph around a node — the cheap way to feed the model just the relevant slice |
graphify_node_details | Node metadata: type, source file/line, docstring, community |
graphify_freshness | Is the graph stale vs. git HEAD? Returns recommended_action (fresh / update / rebuild) |
graphify_validate | Lint the graph for dangling/duplicate/self-loop edges and orphan nodes (read-only) |
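The idea behind a token-budgeted BFS subgraph can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's implementation: the adjacency-dict graph shape, the per-node cost (label length divided by four, borrowing the chars/4 heuristic from the benchmark section), and all node names are hypothetical.

```python
from collections import deque


def estimate_tokens(text):
    """Rough token estimate: about 4 characters per token, at least 1."""
    return max(1, len(text) // 4)


def budgeted_subgraph(adjacency, start, token_budget):
    """Breadth-first walk from `start`, stopping before the budget is exceeded."""
    if start not in adjacency:
        return []
    selected, spent = [], 0
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        cost = estimate_tokens(node)
        if spent + cost > token_budget:
            break  # the next node would not fit, so stop expanding
        selected.append(node)
        spent += cost
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return selected


# Toy graph with made-up symbol names.
toy_graph = {
    "login": ["check_password", "create_session"],
    "check_password": ["hash_password"],
    "create_session": ["store_session"],
    "hash_password": [],
    "store_session": [],
}

print(budgeted_subgraph(toy_graph, "login", token_budget=8))
```

Because BFS visits nodes in order of hop distance, a tight budget keeps the closest neighbors and drops the farther ones, which is why this kind of slice stays relevant as it shrinks.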
Semantic naming (uses the host model via MCP sampling — no API key — or a backend key):
Tool | Purpose |
graphify_sampling_status | Capability test: reports whether the client supports host-LLM sampling, whether a backend key is set, and which method will be used |
graphify_label_communities | Give Leiden communities human-readable names. |
graphify_set_labels | Persist assistant-provided community names (sampling-free fallback) to .graphify_labels.json |
Semantic bridge (optional [semble] extra — semantic search joined to graph structure):
Tool | Purpose |
graphify_locate | NL query → enclosing graph node → token-budgeted subgraph, plus hidden_links |
Naming communities without an API key (MCP sampling)
The Leiden clustering is keyless, but turning Community 7 into Authentication
needs a model. Three ways, in graphify_label_communities's preference order:
1. Host-LLM sampling — the server asks the connected client to run the completion via MCP sampling/createMessage. The model the user already uses (e.g. Claude in a sampling-capable client) does the naming; the server holds no API key. Subject to client support — call graphify_sampling_status first; it degrades gracefully when unsupported.
2. Backend API key (method="cli") — set GEMINI_API_KEY / OPENAI_API_KEY / ANTHROPIC_API_KEY / … (or run a local ollama) and graphify's own backend names them. This option always remains available.
3. Placeholders — no model anywhere: names stay Community N.
If the client can't sample and you have no backend (e.g. Claude Code, which
doesn't support sampling), use the assistant-driven fallback: the assistant
is already a capable model in the loop, so it reads graphify_communities and
pushes names back via graphify_set_labels({"0": "Authentication", ...}) —
no key, no sampling, works in any client. The names persist to
.graphify_labels.json and are patched into graph.html.
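The preference order and the assistant-driven fallback described above can be sketched as follows. This is an illustration only: choose_naming_method, placeholder_names, and apply_labels are hypothetical helpers, the return values "sampling" and "placeholder" are invented for the sketch (only method="cli" appears in this README), and detection of a local ollama backend is left out.

```python
# Backend key variables named in this README (the list there is open-ended).
BACKEND_KEY_VARS = ("GEMINI_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def choose_naming_method(client_supports_sampling, env):
    """Pick a naming method in the documented preference order."""
    if client_supports_sampling:
        return "sampling"
    if any(env.get(name) for name in BACKEND_KEY_VARS):
        return "cli"
    return "placeholder"


def placeholder_names(community_ids):
    """Default names when no model is available: 'Community N'."""
    return {str(cid): f"Community {cid}" for cid in community_ids}


def apply_labels(community_ids, assistant_labels):
    """Overlay assistant-provided names on the placeholders."""
    names = placeholder_names(community_ids)
    names.update({k: v for k, v in assistant_labels.items() if k in names})
    return names


print(choose_naming_method(False, {}))
print(apply_labels([0, 7], {"0": "Authentication"}))
```

The last call mirrors the fallback path: with no sampling and no key, the assistant supplies a name for community 0 and the remaining community keeps its placeholder.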
Semantic bridge (optional [semble])
pip install "graphify-mcp[semble]" adds graphify_locate, which joins
semble's semantic code search to the graph
in one call. Graphify gives structure (how code connects); semble gives
retrieval (which code is semantically relevant) — they're complementary.
graphify_locate("how does retry backoff work"):
1. semble finds the most relevant code and resolves the top hit to its enclosing graph node (better than label matching).
2. returns the token-budgeted subgraph around it (structure).
3. runs semble find_related and cross-checks: a cousin that is semantically similar but not within the seed's structural neighborhood is flagged as a hidden_link (with its hop distance) — a duplication / missing-abstraction / implicit-coupling candidate that neither tool surfaces alone.
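The cross-check in the last step can be sketched as a hop-distance filter. This is a simplified illustration, not the tool's code: the undirected adjacency structure, the max_hops=2 neighborhood radius, the output dict shape, and every node name are assumptions made for the example. The list of semantically similar nodes is passed in directly, standing in for whatever semble returns.

```python
from collections import deque


def undirected(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def hop_distances(adjacency, seed):
    """Shortest hop count from `seed` to every reachable node (plain BFS)."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def find_hidden_links(adjacency, seed, similar_nodes, max_hops=2):
    """Flag similar nodes that lie outside the seed's structural neighborhood."""
    dist = hop_distances(adjacency, seed)
    hidden = []
    for node in similar_nodes:
        if node == seed:
            continue
        distance = dist.get(node)  # None means structurally unreachable
        if distance is None or distance > max_hops:
            hidden.append({"node": node, "distance": distance})
    return hidden


graph = undirected([
    ("send", "build_request"),
    ("build_request", "encode_headers"),
    ("encode_headers", "header_utils"),
    ("header_utils", "send_async"),
])
similar = ["build_request", "send_async", "retry_helper"]
print(find_hidden_links(graph, "send", similar))
```

In this toy graph, build_request is one hop away and is not flagged, send_async is four hops away and is flagged with its distance, and retry_helper is unreachable, so its distance is None.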
The extra is optional: without it the core tools are unchanged and graphify_locate
returns an install hint. It also pairs well with running semble's own MCP server
alongside graphify-mcp.
The chunk→node join and the freshness cosmetic-vs-structural check work
across languages: Python uses the stdlib ast (no extra deps), and every
other language (JS/TS, Go, Rust, Java, Ruby, C/C++, …) is handled by an optional
tree-sitter backend — pip install "graphify-mcp[treesitter]", also pulled in
by graphify. Without it, non-Python files fall back to nearest-line matching.
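For the Python side, one way to tell a cosmetic edit from a structural one with the stdlib ast module is to compare the parsed trees of the old and new source. The sketch below is a minimal illustration of that idea, not the project's implementation, and classify_change is a hypothetical name. One caveat of this simple version: a docstring edit changes the tree and so counts as structural, and how the real check treats docstrings is not stated here.

```python
import ast


def classify_change(old_source, new_source):
    """Return 'cosmetic' if both versions parse to the same AST, else 'structural'."""
    # ast.dump omits line/column attributes by default, and comments and
    # whitespace never reach the AST, so formatting-only edits compare equal.
    old_tree = ast.dump(ast.parse(old_source))
    new_tree = ast.dump(ast.parse(new_source))
    return "cosmetic" if old_tree == new_tree else "structural"


before = "def add(a, b):\n    return a + b\n"
reformatted = "def add(a,b):  # sum two numbers\n\n    return a+b\n"
changed_operator = "def add(a, b):\n    return a - b\n"
renamed = "def plus(a, b):\n    return a + b\n"

print(classify_change(before, reformatted))
print(classify_change(before, changed_operator))
print(classify_change(before, renamed))
```

The three calls line up with the behavior described in the benchmark notes: a comment or reformat is cosmetic, while an operator change or a rename is structural.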
Benchmark
Averaged over 6 queries spanning httpx subsystems (send path, digest auth, redirects, content decoding, cookies, timeouts) on the 2,101-node graph. Each query orients an agent to a code area; tokens = what reaches the model's context (≈ chars/4).
Approach | Tokens (avg) | Calls | Structure | Semantic | Hidden links |
Naive grep + read | 61,836 | ~14 | — | — | 0 |
semble alone | 1,613 | 1 | — | ✓ | 0 |
graphify alone | 1,107 | 1 | ✓ | — | 0 |
semble + graphify (separately) | 2,721 | 4 | ✓ | ✓ | 0 |
graphify_locate | 235 | 1 | ✓ | ✓ | 7 |
graphify_locate averages 263× fewer tokens than grep+read and 11.6× fewer
than running semble and graphify separately (one call instead of four) — and it's
the only approach that surfaces hidden_links (semantically similar but structurally
disconnected code), 5–10 per query.
Those ~235 tokens are a navigational map (seed file:line + structural
neighborhood + hidden links), not raw code — you fetch the specific code only where
needed. That's the trade graphify-mcp optimizes: cheapest orientation plus the
cross-check signal, then drill in precisely.
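The headline ratios follow directly from the averages in the table, and the chars/4 heuristic is simple to reproduce. The short sketch below recomputes both. The dictionary keys and helper names are made up for the example, while the numbers are the table's own.

```python
def estimate_tokens(text):
    """Token estimate used in the benchmark: roughly characters divided by 4."""
    return len(text) // 4


# Average tokens per approach, copied from the benchmark table.
avg_tokens = {
    "grep_read": 61836,
    "semble": 1613,
    "graphify": 1107,
    "separately": 2721,
    "locate": 235,
}


def times_fewer(baseline, candidate):
    """How many times fewer tokens `candidate` uses than `baseline`."""
    return round(avg_tokens[baseline] / avg_tokens[candidate], 1)


print(times_fewer("grep_read", "locate"))   # 263.1
print(times_fewer("separately", "locate"))  # 11.6
print(estimate_tokens("x" * 940))           # 235
```

A 940-character response is therefore about the size of one average locate result under this heuristic.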
Case study — the hidden links are real. Asked "does httpx duplicate
request-sending across sync and async?", graphify_locate returned the seed
Client._send_single_request and flagged hidden links. Checking the source
confirmed every production flag is a genuine sync/async twin:
Client._send_single_request (_client.py:1001) ↔ AsyncClient._send_single_request
(:1717); BaseTransport.handle_request ↔ handle_async_request (in every
transport); __enter__ ↔ __aenter__. ~500 tokens (one locate + a targeted read)
surfaced a real architectural pattern that naively reading _client.py (~16k tokens)
would likely miss. The unreachable bucket also held test files (related, not refactor targets);
the distance field separates production parallels (3–4) from that noise.
Across languages — real HTTP-client repos. The span join and freshness check aren't Python-only. I built AST-only graphs for an HTTP client in five more languages and ran the same kind of queries (send · redirects · timeout/retry · headers/auth · transport):
Language | Repo | Span-join precision | Qualname | Hidden / q | locate vs grep |
Python (ast) | httpx | 91% (49/54) | 67% | 4.0 | 232× |
JavaScript / TS | got | 89% (48/54) | 67% | 3.2 | 494× |
Go | resty | 80% (43/54) | 67% | 4.7 | 748× |
Java | retrofit | 83% (45/54) | 50% | 5.5 | 208× |
Rust | ureq | 69% (37/54) | 83% | 5.3 | 477× |
C++ | cpr | 69% (37/54) | 100% | 4.2 | 223× |
Python uses the stdlib ast; JS/TS · Go · Java · Rust · C++ go through tree-sitter with
automatic language detection — one tool, zero per-language config. Span-join precision =
share of semantic hits whose resolved node's real span actually contains the chunk. It's
69–91% across six 347–2,101-node graphs, hidden-links keep surfacing 3–6/query, and locate
stays 200–750× cheaper than grep+read. Rust and C++ trail at 69% — their misses are mostly
impl-block / file-top chunks where the resolution is still correct (they recover qualified
names at 83–100%). graphify_freshness's cosmetic-vs-structural check is correct in every
language too (comment/reformat → cosmetic; operator/rename → structural). Reproduce with
benchmarks/multilang.py.
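Span-join precision as defined above can be expressed in a few lines. The sketch assumes that "contains" means the node's line span fully encloses the chunk's line span, and it represents each hit as a hypothetical (node_span, chunk_span) pair of inclusive line ranges. benchmarks/multilang.py remains the reference for how the reported numbers were actually measured.

```python
def span_contains(node_span, chunk_span):
    """True if the node's inclusive line range fully encloses the chunk's."""
    node_start, node_end = node_span
    chunk_start, chunk_end = chunk_span
    return node_start <= chunk_start and chunk_end <= node_end


def span_join_precision(hits):
    """Share of hits whose resolved node span contains the semantic chunk."""
    if not hits:
        return 0.0
    correct = sum(1 for node_span, chunk_span in hits if span_contains(node_span, chunk_span))
    return correct / len(hits)


# Four made-up hits: the second chunk runs past the end of its node.
hits = [
    ((10, 40), (12, 20)),
    ((10, 40), (35, 45)),
    ((50, 60), (50, 60)),
    ((1, 5), (3, 4)),
]
print(span_join_precision(hits))  # 0.75

# The table's percentages are the hit counts out of 54, rounded.
reported_hits = {"Python": 49, "JavaScript / TS": 48, "Go": 43, "Java": 45, "Rust": 37, "C++": 37}
for language, correct in reported_hits.items():
    print(language, f"{round(100 * correct / 54)}%")
```

Running the loop reproduces the 91 / 89 / 80 / 83 / 69 / 69 percent column from the counts in the table.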
→ Full benchmark report (interactive HTML, per-query breakdown + the cross-language tables) — or open docs/benchmark.html locally. (Türkçe)
Measured 2026-06 with semble 0.3.4 + graphify (tree-sitter backend). httpx headline = 6
queries (per-query locate 189–286 tokens); cross-language = 6 queries × 54 hits each on
got / resty / retrofit / ureq / cpr. Sample bias: every repo benchmarked here is
an HTTP-client library — a deliberately uniform family chosen for cross-language comparability.
Token savings and span-join precision will differ on other architectures (data pipelines, GUI
apps, sprawling monorepos), so treat these as indicative, not guarantees. Numbers vary by
codebase and query.
Resources
- graphify://report — GRAPH_REPORT.md
- graphify://graph — graph.json (raw)
- graphify://community/{id} — per-community wiki (members + internal/boundary edges)
Prompts
Reusable templates that orchestrate the tools for the assistant:
- onboard — orient to the codebase (overview → communities → subgraphs → surprises → summary)
- trace_bug(symptom) — find likely root-cause locations through the graph
- explain_flow(flow) — end-to-end walkthrough of a named flow with file:line refs
LLM-friendliness
- Tool annotations (readOnlyHint, destructiveHint, titles) tell the model which tools are safe to call freely vs. which mutate state.
- Server instructions describe the recommended flow (overview → targeted subgraph/query → build update).
- as_json output on every analysis tool returns structured data the model can chain on instead of re-parsing prose.
- Token budgeting (graphify_subgraph) keeps context small on large graphs — the core of Graphify's ~71× compression.
- Host-LLM sampling (graphify_label_communities) lets the server borrow the client's model via MCP sampling/createMessage, so semantic naming works with no server-side API key — with a capability test (graphify_sampling_status) and a backend-key fallback.
Typical workflow
1. graphify_overview() — orientation
2. graphify_communities() — subsystems
3. graphify_subgraph("SomeNode") — token-cheap targeted exploration
4. graphify_query("how does the auth flow work?") — questions
5. After code changes: graphify_freshness() → graphify_build(".", update=True)
Project layout
graphify-mcp/
├── src/graphify_mcp/ # package (server.py, __init__.py)
├── tests/ # pytest suite + fixture graph.json
├── .github/workflows/ # CI (ruff + pytest, py 3.10–3.12)
├── pyproject.toml # packaging + console script
├── mcp.json # Claude Code example config
└── claude_desktop_config.json
Development
pip install -e ".[dev]"
ruff check .
pytest -q
See CONTRIBUTING.md. Licensed under MIT.