mcp-reposkein
Live demo: RepoSkein's own graph, rendered as an interactive 3D constellation in your browser.
Introduction
RepoSkein gives your AI coding agent a map of your codebase, so it navigates structure instead of grepping and guessing.
It uses Tree-sitter to build a deterministic Code Property Graph of your repo (files, classes, functions, imports, and call edges) and serves it to any MCP-capable agent (Claude Code, Cursor, Codex, …). As the agent works, it writes short natural-language summaries onto graph nodes. Those summaries are versioned in git alongside the code, so an agent's understanding becomes shared team memory that the next agent, or teammate, starts from.
Who it's for: developers using AI coding agents on real, large, or nested/polyglot codebases, who are tired of the agent burning its context window on grep; and teams who want that hard-won understanding to persist and be shared rather than re-derived every session.
Zero-infra: no database, no Docker. The graph lives in committed .reposkein/*.jsonl files.
Deterministic: same code → byte-identical graph. No LLM in the construction path.
7 languages: Python, TypeScript, JavaScript, Rust, Go, Java, C#.
Local-first and git-native: the graph and its summaries travel with your code.
Your agent asks | RepoSkein answers, directly from the graph |
"Who calls this function?" | the exact callers, with one-line summaries |
"What breaks if I change this?" | the impacted callers + the tests that cover them |
"Where do I even start?" | ranked entry-point functions by meaning, not filename |
"What usually changes with this file?" | co-change history from git |
In a deterministic, no-LLM benchmark, RepoSkein surfaces the right functions using on average ~8.4× fewer context tokens than a grep-based agent on structural queries.
Prerequisites
Node.js 18+ to run npx @reposkein/mcp (the indexer binary is fetched automatically).
An MCP-capable agent: Claude Code, Cursor, Codex, Zed, etc.
A git repository to index (RepoSkein installs git hooks and commits the graph).
Optional: Docker (only for the embeddings server or the Neo4j backend); Rust (only to build from source).
Installation
In the repo you want your agent to understand:
npx @reposkein/mcp init
This downloads the indexer for your platform, installs git hooks + the navigation skill, builds the initial code graph, and prints an MCP config block. Then:
Add the printed config to your agent (e.g. Claude Code's .mcp.json):
{
  "mcpServers": {
    "reposkein": {
      "command": "reposkein-mcp",
      "env": { "REPOSKEIN_REPO_PATH": "/path/to/your/repo" }
    }
  }
}
Verify and commit the graph (init already built it):
reposkein-mcp doctor .   # ✓ binary ✓ indexed (N nodes) ✓ ready
git add .reposkein && git commit -m "add RepoSkein code graph"
Re-index after big changes with reposkein-mcp index . (or the agent's reindex_file tool).
Ask your agent "what calls this function?" or "what breaks if I change X?" and it answers from the graph.
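If your .mcp.json already lists other servers, add the reposkein entry next to them rather than replacing the file. A minimal Python sketch of that merge, assuming the config shape printed by init; add_reposkein_server is a hypothetical helper, not something RepoSkein ships:

```python
import json
from pathlib import Path


def add_reposkein_server(config_path: Path, repo_path: str) -> dict:
    """Add or update the reposkein entry in an MCP config file, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Only the "reposkein" key is (re)written; every other server entry is left as-is.
    servers["reposkein"] = {
        "command": "reposkein-mcp",
        "env": {"REPOSKEIN_REPO_PATH": repo_path},
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Because it only overwrites its own key, running it twice leaves the file unchanged.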
Prefer to let your agent set it up? Install the skills and tell it to run the reposkein-setup skill; it installs, indexes, and verifies everything:
npx skills add reposkein/reposkein --all
Platforms: prebuilt binaries for macOS (Apple Silicon), Linux (x64/arm64), and Windows (x64). Elsewhere, point REPOSKEIN_INDEXER_BIN at a from-source build.
Usage: working with your agent
You ask in plain language; the bundled reposkein-graph-rag skill drives the tools. The natural loop:
Find where to start: semantic_find("jwt auth validation") ranks the right functions by meaning, no symbol name needed. Try: "where's the rate limiter?"
Understand it: get_context_profile returns the node's callers + callees as ready-to-read prose (hops: 2 widens the neighborhood; federated: true spans nested repos).
Before you change it: impact lists transitive callers (what could break) separately from the tests that cover it (what to run). Try: "what breaks if I change charge()?"
What moves with it: get_temporal_context surfaces files that historically change together, plus churn and ownership. Try: "what usually changes with the auth config?"
Record what you learned: write_semantic_summary attaches a 1 to 3 sentence note to the node, committed to git for the next agent or teammate.
After editing: reindex_file refreshes the graph for the changed file.
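The conf value in the walkthrough that follows reads naturally as association-rule confidence. A minimal sketch, assuming get_temporal_context computes it as (commits touching both files) / (commits touching the target), which this README does not confirm, and assuming each commit's changed files were already extracted (for example from git log --name-only). co_change is a hypothetical name:

```python
from collections import Counter


def co_change(commits: list[set[str]], target: str) -> list[tuple[str, float]]:
    """Rank files by confidence(target -> other) = commits touching both / commits touching target."""
    touching = [files for files in commits if target in files]
    if not touching:
        return []
    # Count how often each other file appears in the commits that touched the target.
    together = Counter(f for files in touching for f in files if f != target)
    ranked = [(path, n / len(touching)) for path, n in together.items()]
    # Highest confidence first; ties broken by path so the output is deterministic.
    return sorted(ranked, key=lambda pc: (-pc[1], pc[0]))
```

With five commits touching src/auth/jwt.py, four of which also touch config/keys.py, this yields a confidence of 0.8 for that pair.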
You: Refactor how we validate auth tokens. What's involved?
Agent: → semantic_find("validate auth token")      → resolves auth.validate_jwt
       → get_context_profile(auth.validate_jwt)    → 3 callers, 2 callees (+ summaries)
       → impact(auth.validate_jwt)                 → 6 impacted functions, 4 covering tests
       → get_temporal_context("src/auth/jwt.py")   → co-changes with config/keys.py (conf 0.8)
Agent: "validate_jwt is called by the login, refresh, and middleware paths;
       changing its signature touches 6 functions and 4 tests (test_jwt.py,
       test_middleware.py, …). Heads up: jwt.py historically changes together
       with config/keys.py, so you'll likely need to update both."
A short screen recording is on the roadmap; see Documentation.
Agent skills
RepoSkein ships two cross-agent Agent Skills; npx skills add reposkein/reposkein --all installs both into Claude Code, Cursor, Codex, and 70+ agents:
reposkein-setup: installs RepoSkein in a repo and verifies it's running (binary → index → MCP reachability). Ask your agent to run it.
reposkein-graph-rag: teaches your agent when to use each tool (the loop above). reposkein-mcp init installs it automatically for Claude Code.
Supported languages
Language | Definitions | Imports → edges | Cross-file calls |
Python | functions, classes, methods, nested defs, vars | ✓ relative / absolute / aliased | import-resolved |
TypeScript / TSX | classes, interfaces, enums, methods, arrows | ✓ named / default / aliased | import-resolved |
JavaScript / JSX | (via the TS grammar) | ✓ ES imports (no CommonJS yet) | import-resolved |
Rust | fns, structs, traits, enums | ✓ | import-resolved |
Go | funcs, methods | not yet (cross-package planned) | same-package (same-dir); cross-package by name |
Java | classes, records, interfaces, enums, methods, constructors, fields | ✓ package-path (no wildcard/static yet) | import-resolved |
C# | classes, structs, records, interfaces, enums, methods, properties | not yet (cross-namespace planned) | same-dir; cross-namespace by name |
What resolves, honestly. Every edge carries a resolution (exact / name_match / ambiguous) plus a confidence, so your agent knows what to trust.
Same-file calls, self/this methods, and import-followed free-function calls resolve exact. Python module-alias calls (import foo as f; f.bar()) resolve exact to the target module's function.
Cross-file INHERITS/IMPLEMENTS edges are resolved repo-wide: import-followed bases resolve exact (confidence 1.0); a unique same-directory or repo-wide base resolves name_match (0.8 and 0.7 respectively); ambiguous bases are skipped to avoid false hierarchy edges. Bases that live in a federated child repo are stitched into cross-repo heritage edges at load time. Go's struct/interface embedding (type Dog struct { Animal }) is captured as INHERITS.
Constructors emit a distinct INSTANTIATES edge (new Foo() in TS/Java/C#; Foo { .. } and Foo::new() in Rust; Foo{} / &Foo{} composite literals in Go; Python Foo() whose name resolves to a class), so an agent can ask "who creates instances of this type?". These edges are resolved against the type index and skipped when ambiguous.
The graph is type-free by design (deterministic, no compiler in the loop), but it does track types where it can do so soundly from source alone: when a local is assigned a constructor (x = Foo(); x.bar()), that x.bar() resolves exact to Foo.bar (intraprocedural receiver typing). Method calls on receivers it can't trace that way (parameters, fields, return values) resolve by name (≤ name_match), and overloaded calls are flagged ambiguous. Go and C# don't emit import edges yet, so their cross-package/cross-namespace calls resolve by name (same-package/same-directory calls do resolve).
These limits are inherent to the zero-infra, type-free design; a deeper, optional type-aware layer (SCIP) is gated on benchmark evidence. Adding a language is a well-trodden path, and contributions are welcome.
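The intraprocedural receiver typing described above can be illustrated with Python's standard ast module. This is a simplified sketch, not RepoSkein's Tree-sitter extractor: resolve_method_calls is hypothetical, it only looks at top-level classes and functions, trusts a local only when it is bound exactly once (to a known class's constructor), and, as an assumption, labels a by-name match with several candidate classes ambiguous:

```python
import ast
from collections import Counter


def resolve_method_calls(source: str) -> list[tuple[str, str, str]]:
    """Return (call, target, resolution) for receiver.method() calls in top-level functions."""
    tree = ast.parse(source)
    # Class name -> names of methods defined directly in its body.
    classes = {
        node.name: {m.name for m in node.body if isinstance(m, ast.FunctionDef)}
        for node in tree.body
        if isinstance(node, ast.ClassDef)
    }
    results = []
    for fn in (n for n in tree.body if isinstance(n, ast.FunctionDef)):
        # A local rebound anywhere in the function could change type, so only single bindings count.
        stores = Counter(
            n.id for n in ast.walk(fn) if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
        )
        local_types: dict[str, str] = {}
        for stmt in fn.body:
            for node in ast.walk(stmt):
                if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                        and isinstance(node.func.value, ast.Name)):
                    continue
                receiver, method = node.func.value.id, node.func.attr
                call = f"{receiver}.{method}"
                cls = local_types.get(receiver)
                if cls is not None and method in classes[cls]:
                    results.append((call, f"{cls}.{method}", "exact"))
                    continue
                # Untraced receiver (e.g. a parameter): fall back to matching the method name.
                candidates = sorted(c for c, methods in classes.items() if method in methods)
                if len(candidates) == 1:
                    results.append((call, f"{candidates[0]}.{method}", "name_match"))
                elif candidates:
                    results.append((call, f"*.{method}", "ambiguous"))
            # Record `x = Foo()` after scanning the statement, so it applies to later statements.
            if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                    and isinstance(stmt.targets[0], ast.Name)
                    and isinstance(stmt.value, ast.Call)
                    and isinstance(stmt.value.func, ast.Name)
                    and stmt.value.func.id in classes
                    and stores[stmt.targets[0].id] == 1):
                local_types[stmt.targets[0].id] = stmt.value.func.id
    return results
```

For def main(p): x = Foo(); x.bar(); p.bar(), the first call resolves exact to Foo.bar, while p.bar() can only be matched by name because p is a parameter.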
How it works
Your agent (Claude Code / Cursor / …)   (guided by the reposkein skill)
        │ MCP
        ▼
@reposkein/mcp      semantic_find · get_context_profile · impact · get_temporal_context
(TypeScript)        read_cypher · write_semantic_summary · init_cpg_skeleton · reindex_file
                    CLI: init · doctor · index · view
        │ reads
        ▼
.reposkein/*.jsonl  ← the code graph, committed to git (zero-infra, in-memory store)
        ▲ writes
        │
reposkein-indexer   Tree-sitter parse → stable IDs → canonical JSONL
(Rust)              + git hooks & a 3-way merge driver for conflict-free summaries
Structure is static. The skeleton comes only from parsing: identical code produces a byte-identical graph (a CI-tested invariant), no matter who runs it.
Meaning is just-in-time. Summaries are written as the agent visits nodes; each is stamped with a hash of the node's content (so it is flagged as stale when the code changes) and committed to git.
Local-first. The committed JSONL is the source of truth; the optional Neo4j backend is a reconstructable projection most users never need.
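The determinism and staleness properties above can be sketched in a few lines of Python. Everything here is hypothetical (the ID scheme, field names, and hash choice); RepoSkein's actual JSONL schema isn't described in this README. IDs depend only on what a symbol is and where it lives, output is sorted with fixed JSON formatting, and each summary stores a hash of the source it describes:

```python
import hashlib
import json


def node_id(path: str, qualified_name: str, kind: str) -> str:
    # Derived only from the symbol's identity, never from run order or timestamps.
    return hashlib.sha256(f"{kind}:{path}:{qualified_name}".encode()).hexdigest()[:16]


def canonical_jsonl(nodes: list[dict]) -> str:
    # Sorted records, sorted keys, fixed separators: same nodes -> byte-identical text.
    lines = (
        json.dumps(n, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        for n in sorted(nodes, key=lambda n: n["id"])
    )
    return "".join(line + "\n" for line in lines)


def content_hash(source_text: str) -> str:
    return hashlib.sha256(source_text.encode()).hexdigest()


def stamp_summary(text: str, source_text: str) -> dict:
    return {"summary": text, "content_hash": content_hash(source_text)}


def is_stale(summary: dict, current_source: str) -> bool:
    # The summary no longer describes the code once the stamped hash stops matching.
    return summary["content_hash"] != content_hash(current_source)
```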
Cross-repo federation
Got nested repositories (a monorepo of indexed repos)? RepoSkein discovers them, links them with FEDERATES_TO, and stitches cross-repo call, import, and heritage edges (INHERITS/IMPLEMENTS to a base in a child repo) at load time. Pass federated: true to traverse across repo boundaries. Federation edges are derived at load (never committed), so each repo stays independently deterministic.
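A sketch of load-time stitching under simplifying assumptions: each repo exposes the qualified names it defines and the references it could not resolve on its own, and a reference is stitched only when exactly one direct child repo defines the target. Repo, stitch, and the edge tuples are hypothetical; the derived edges live only in memory, mirroring the "never committed" rule:

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    name: str
    symbols: set[str]  # qualified names defined in this repo
    unresolved: list[tuple[str, str, str]]  # (source, edge type, target name) left open at index time
    children: list["Repo"] = field(default_factory=list)


def stitch(root: Repo) -> list[tuple[str, str, str]]:
    """Derive federation edges in memory at load time; nothing here is written to disk."""
    derived = []
    stack = [root]
    while stack:
        repo = stack.pop()
        for child in repo.children:
            derived.append((repo.name, "FEDERATES_TO", child.name))
            stack.append(child)
        for src, edge_type, target in repo.unresolved:
            owners = [c for c in repo.children if target in c.symbols]
            if len(owners) == 1:  # skip ambiguous or missing targets rather than guess
                derived.append((f"{repo.name}:{src}", edge_type, f"{owners[0].name}:{target}"))
    return derived
```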
Visualize the graph: the constellation viewer
reposkein-mcp view .   # opens http://127.0.0.1:<port> in your browser
view starts a local, read-only, zero-infra web app (React + three.js, bound to 127.0.0.1) that renders the committed .reposkein graph as an interactive 3D, astronomy-style constellation. There's no Neo4j and no external service: it reads the committed JSONL directly and never mutates it. Try the live demo (RepoSkein viewing its own multi-language graph).
The map is deterministic: a seeded force layout means the same graph always lays out the same way (cached in IndexedDB for instant reloads), and the layout is render-time only; it never touches the committed JSONL. Levels of detail map onto an astronomy metaphor: Repository → Directory → File → Symbol become galaxy → constellation → solar system → star. Zoom or click to expand a cluster (with a brief supernova animation) and click a star to inspect it. Federation galaxies and agent-written summaries render when present.
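The viewer itself is React + three.js; the Python sketch below only illustrates why a seeded layout is reproducible. It is a simplified spring-electrical model with a fixed seed, a fixed node order, and a fixed iteration count, so the same graph always produces the same coordinates. The layout function and its constants are assumptions, not the viewer's actual algorithm:

```python
import math
import random


def layout(nodes, edges, seed=42, iterations=100):
    """Deterministic 2-D spring layout: same nodes, edges, and seed -> same positions."""
    order = sorted(set(nodes))  # fixed iteration order, independent of input order
    rng = random.Random(seed)  # private RNG, so global random state can't interfere
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in order}
    springs = sorted({tuple(sorted(e)) for e in edges})  # undirected, deduplicated
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in order}
        # Every pair of nodes repels (force magnitude ~ 1/distance).
        for i, a in enumerate(order):
            for b in order[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d2 = dx * dx + dy * dy + 1e-9
                disp[a][0] += 0.01 * dx / d2
                disp[a][1] += 0.01 * dy / d2
                disp[b][0] -= 0.01 * dx / d2
                disp[b][1] -= 0.01 * dy / d2
        # Connected nodes attract like springs.
        for a, b in springs:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            disp[a][0] += 0.05 * dx
            disp[a][1] += 0.05 * dy
            disp[b][0] -= 0.05 * dx
            disp[b][1] -= 0.05 * dy
        # Cap each step so the simulation stays stable.
        for n in order:
            step = math.hypot(disp[n][0], disp[n][1])
            scale = min(1.0, 0.1 / step) if step > 0 else 0.0
            pos[n][0] += disp[n][0] * scale
            pos[n][1] += disp[n][1] * scale
    return {n: (pos[n][0], pos[n][1]) for n in order}
```

Sorting nodes and edges before simulating is what makes the result independent of the order in which the graph was loaded.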
Legible: per-edge-type colors + legend, importance-sized stars, adaptive labels, a breadcrumb, per-language galaxy coloring, depth fog / bloom / nebula halos.
Edges encode resolution: color = edge type (CALLS/IMPORTS/INHERITS/IMPLEMENTS/INSTANTIATES), opacity = confidence (exact/name_match/ambiguous), and flow particles show call direction.
Analytical: one-click lenses (call graph / type hierarchy / imports / tests), an impact overlay (transitive callers + covering tests), a confidence-audit mode (see where the type-free resolver guesses), and a temporal-coupling overlay (git co-change).
Explorable: ranked search-to-fly, N-hop neighborhood focus, source peek in the detail panel (a path-guarded, read-only file slice plus an "Open in editor" vscode:// link), keyboard navigation (/ to search, f to frame all, arrows to hop between neighbors, Esc to go back), a minimap, and PNG screenshot export.
Guided tour: a cinematic, deterministically derived flythrough (overview → largest modules → busiest hub → type hierarchy → entry point) with captions.
reposkein-mcp view --export ./site .   # write a self-contained static site
--export bakes the graph into graph-data.js (as window.__REPOSKEIN_GRAPH__) and emits a self-contained static site. It works from file:// or any static host with no server, which is exactly how the live demo above is published. Handy for sharing a snapshot, embedding in docs, or a project landing page.
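A sketch of how such a bake step can work, assuming the graph is already a JSON-serializable dict; write_graph_data is hypothetical. Emitting a script that assigns a global, instead of a .json file loaded with fetch(), is what lets a page open from file://: browsers generally refuse fetch()/XHR requests for local files, while a plain script tag still loads:

```python
import json
from pathlib import Path


def write_graph_data(graph: dict, out_dir: Path) -> Path:
    """Write graph-data.js so a static page can read the graph via a <script src> tag."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Sorted keys and fixed separators keep the export byte-stable for the same graph.
    payload = json.dumps(graph, sort_keys=True, separators=(",", ":"))
    target = out_dir / "graph-data.js"
    target.write_text(f"window.__REPOSKEIN_GRAPH__ = {payload};\n", encoding="utf-8")
    return target
```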
MCP tools
Tool | What it does |
semantic_find | find where to start: rank functions/classes by meaning (lexical BM25F; optional embeddings) |
get_context_profile | resolve a function/class to its caller/callee neighborhood as ready-to-read prose |
impact | transitive callers, split into impacted code vs covering tests |
get_temporal_context | git-derived co-change, churn, and ownership for a file |
read_cypher | read-only graph queries (writes rejected, results capped) |
write_semantic_summary | attach a hash-stamped summary to a node |
init_cpg_skeleton | build/rebuild the graph |
reindex_file | refresh after editing a file |
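A minimal sketch of what impact computes: a breadth-first walk up reverse CALLS edges from the target, with the results split by a test predicate. The edge-list format and the test-naming rule (last name segment starting with test_) are assumptions for illustration; RepoSkein's own test detection may differ:

```python
from collections import deque


def is_test(qualified_name: str) -> bool:
    # Assumption: tests are functions whose last name segment starts with "test_".
    return qualified_name.rsplit(".", 1)[-1].startswith("test_")


def impact(target: str, calls: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Return (impacted code, covering tests): every transitive caller of `target`."""
    callers: dict[str, set[str]] = {}
    for caller, callee in calls:
        callers.setdefault(callee, set()).add(caller)
    seen = {target}
    queue = deque([target])
    while queue:  # breadth-first walk up the call graph; `seen` also guards against cycles
        node = queue.popleft()
        for caller in sorted(callers.get(node, ())):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    seen.discard(target)
    impacted = sorted(n for n in seen if not is_test(n))
    tests = sorted(n for n in seen if is_test(n))
    return impacted, tests
```

Because the walk is transitive, a test that only calls a caller of the target (say, login rather than validate_jwt) is still reported as covering it.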
The reposkein-mcp CLI adds init (set up a repo), doctor (health check), index (rebuild the graph), and view (the constellation viewer; --export <dir> writes a self-contained static site).
Optional: semantic embeddings
By default semantic_find is deterministic and lexical (BM25F: zero-infra, no keys). You can opt into a hybrid tier (lexical + embedding cosine, fused via Reciprocal Rank Fusion, RRF) for fuzzier queries. It's off by default, vectors are cached in .reposkein/local/embeddings/ (gitignored, never committed), and it falls back to lexical automatically on any error. Set env vars on the MCP server and pick one:
A) Voyage AI: cloud, easiest, best for code
Get a key, then:
REPOSKEIN_EMBED_PROVIDER=voyage
VOYAGE_API_KEY=pa-...
# optional: REPOSKEIN_EMBED_MODEL=voyage-code-3   # the default; code-specialized
This sends document strings (qualified names, signatures, summaries) to Voyage's API. Use B or C if code must not leave your machine.
B) Ollama: local, off-the-shelf, no key
ollama pull nomic-embed-text   # 768-dim (or mxbai-embed-large=1024, bge-m3=1024)
REPOSKEIN_EMBED_PROVIDER=http
REPOSKEIN_EMBED_URL=http://127.0.0.1:11434/v1/embeddings
REPOSKEIN_EMBED_MODEL=nomic-embed-text
REPOSKEIN_EMBED_DIMS=768   # must match the model
C) Voyage's open model, self-hosted: offline + Voyage quality
voyage-4-nano (Apache-2.0) is a custom Qwen3-based model that Ollama can't run, so RepoSkein ships a prebuilt server. The image is published to GHCR (public, multi-arch amd64/arm64), so there's nothing to build:
docker run -p 8080:8080 -v reposkein-hf:/root/.cache/huggingface \
ghcr.io/reposkein/reposkein-embed   # auto-picks your architecture; the first run downloads the model
REPOSKEIN_EMBED_PROVIDER=http
REPOSKEIN_EMBED_URL=http://127.0.0.1:8080/v1/embeddings
REPOSKEIN_EMBED_MODEL=voyage-4-nano
REPOSKEIN_EMBED_DIMS=1024   # must equal the server's EMBED_DIMS
Everything stays on your machine. The image is CPU-only and needs no NVIDIA GPU: it runs on Apple Silicon / ARM unified-memory machines, x64 Linux, and Windows (CI builds and smoke-tests both architectures). Docker can't use Apple's Metal/MPS; for that, run the server natively with EMBED_DEVICE=mps. Full details (root docker compose up, GPU, other models): embed-server/README.md.
REPOSKEIN_EMBED_DIMS on the client must match the model's output dimension, or cosine scoring is skipped.
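A sketch of the hybrid tier's fusion step, assuming both rankers return ordered lists of node IDs. Reciprocal Rank Fusion scores each item as the sum of 1 / (k + rank) over the rankings it appears in; k = 60 is the constant commonly used in the RRF literature, and RepoSkein's actual k is not stated here. The dimension check mirrors the rule above: on a mismatch, cosine scoring is skipped and the lexical order is returned unchanged. hybrid_rank and its parameters are hypothetical:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_rank(lexical: list[str], query_vec: list[float],
                doc_vecs: dict[str, list[float]], dims: int) -> list[str]:
    if any(len(v) != dims for v in [query_vec, *doc_vecs.values()]):
        return lexical  # dimension mismatch: skip cosine, keep the lexical ranking
    semantic = sorted(doc_vecs, key=lambda d: (-cosine(query_vec, doc_vecs[d]), d))
    return rrf([lexical, semantic])
```

RRF only looks at ranks, not raw scores, so BM25F scores and cosine similarities never need to be put on a common scale.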
Optional: Neo4j backend
The zero-infra JSONL store is the default. Neo4j is an optional projection for very large graphs and raw Cypher at scale:
docker compose --profile neo4j up -d # from the repo root
NEO4J_PASSWORD=reposkeintest reposkein-indexer load .
Then set REPOSKEIN_STORE=neo4j plus the NEO4J_* env vars on the MCP server. (REPOSKEIN_STORE=auto, the default, uses JSONL when present and falls back to Neo4j only if it is configured.)
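A sketch of the auto selection rule stated above. choose_store is hypothetical; the "jsonl" value for REPOSKEIN_STORE and the use of NEO4J_URI as the "Neo4j is configured" signal are assumptions, and the real server may check more than this:

```python
from pathlib import Path
from typing import Mapping


def choose_store(repo: Path, env: Mapping[str, str]) -> str:
    mode = env.get("REPOSKEIN_STORE", "auto")
    if mode in ("jsonl", "neo4j"):  # an explicit choice always wins
        return mode
    if mode != "auto":
        raise ValueError(f"unknown REPOSKEIN_STORE value: {mode!r}")
    graph_dir = repo / ".reposkein"
    if graph_dir.is_dir() and any(graph_dir.glob("*.jsonl")):
        return "jsonl"  # committed JSONL present: the zero-infra default
    if env.get("NEO4J_URI"):  # assumption: this variable signals a configured Neo4j
        return "neo4j"
    raise RuntimeError("no .reposkein/*.jsonl graph and no Neo4j configured")
```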
Benchmarks
Two tracks, both under mcp/bench/:
Track 1, retrieval efficiency (deterministic, no LLM): RepoSkein vs a grep agent on hand-labeled tasks. RepoSkein uses on average ~8.4× fewer context tokens on structural queries, at F0.5 = 1.00 vs grep's 0.11 to 0.71. Details.
Track 2, end-task (SWE-bench Verified): a minimal agent loop where the only difference is the navigation toolset (RepoSkein vs grep), graded on resolve rate, tokens, and turns. The harness is built and unit-tested; the API + Docker run is opt-in.
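For reference, F0.5 is the F-beta score with beta = 0.5, which weights precision more heavily than recall, so returning many irrelevant hits (as a grep agent tends to) costs more than missing a few. A minimal sketch over sets of function IDs; f_beta is a hypothetical helper, not the harness code under mcp/bench/:

```python
def f_beta(retrieved: set[str], relevant: set[str], beta: float = 0.5) -> float:
    """F-beta score of a retrieved set against a hand-labeled relevant set."""
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Finding both relevant functions among 10 results gives precision 0.2 and recall 1.0, so F0.5 = 1.25 × 0.2 / (0.25 × 0.2 + 1.0) ≈ 0.238, even though nothing relevant was missed.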
Build from source
Requirements: Rust (stable), Node 24. Docker only for the optional Neo4j backend.
cd indexer && cargo build --release   # → indexer/target/release/reposkein-indexer
cd ../mcp && npm install && npm run build
Wire it into your agent with command: node, args: [".../mcp/dist/index.js"], and env REPOSKEIN_REPO_PATH + REPOSKEIN_INDEXER_BIN. Tests: cd indexer && cargo test && cargo clippy --all-targets -- -D warnings; cd mcp && npm test.
Repository layout
indexer/        Rust workspace: core, lang-{python,ts,rust,go,java,csharp}, lang-common, neo4j-io, cli
mcp/            @reposkein/mcp: the TypeScript MCP server (tools + graph-store backends)
mcp/bench/      benchmarks: retrieval efficiency (Track 1) + end-task SWE-bench harness (Track 2)
skills/         reposkein-graph-rag + reposkein-setup: cross-agent skills (skills.sh)
embed-server/   one-command local embedding server (voyage-4-nano) for hybrid semantic_find
viz/            @reposkein/viz: the 3D constellation viewer SPA (served by `reposkein-mcp view`)
Documentation
The local embedding server: Docker/GHCR, platforms, GPU (embed-server/README.md).
Track 1 retrieval benchmark: method + results.
Track 2 end-task (SWE-bench) harness.
Release history (Keep a Changelog).
The two cross-agent skills.
Contributing
Contributions are welcome: bug fixes, new languages, docs. See CONTRIBUTING.md for the dev setup, the determinism invariants you must preserve, and the step-by-step recipe for adding a new language (it's a well-trodden path: Go, Java, and C# were each added the same way). RepoSkein uses Conventional Commits and keeps CI green (determinism gates + clippy + tests).
Acknowledgements
Tree-sitter: the parsers behind every language extractor.
Model Context Protocol: the agent integration standard.
Voyage AI: voyage-code-3 and the open-weight voyage-4-nano powering the optional embeddings tier.
Discovery via Glama, skills.sh, mcpservers.org, and the awesome-mcp community lists.
README header by capsule-render + readme-typing-svg.
Contact
Bugs / features: open an issue
Questions / ideas: GitHub Discussions
License