PruvaGraph

Codebase knowledge graphs with 95%+ LLM cost reduction.

Turn any repository into a queryable knowledge graph. One command, any language, any size. Built for developers who love Claude Code but not the bill.

Made by PRUVALEX — open source, MIT licensed.


Why PruvaGraph?

Standard code-to-graph tools send every file to an LLM on every run. PruvaGraph doesn't.

| | Other tools | PruvaGraph |
|---|---|---|
| 10,000-file repo, daily CI | ~3,300,000 LLM calls/month | ~3,140 calls/month |
| Cost (Claude Sonnet) | ~$313/month | ~$0.30/month |
| First run | Full LLM scan | Full LLM scan |
| Re-run (unchanged files) | Full LLM scan again | Instant cache hit |
| Changed files only | Re-scans everything | Re-scans changed files only |
| Semantic duplicate detection | None | Groups similar files → 1 LLM call |

How: Three layers working together — SHA-256 hash cache + semantic MinHash dedup + smart batch packing. Code files use tree-sitter locally (zero cost). LLM is reserved for docs, PDFs, and images that actually need it.



Quick start

# Install
pip install pruvagraph

# Or run it with uv, no separate install step (faster)
uvx pruvagraph .

# Build graph for current repo
pruvagraph .

# Query it
pruvagraph query "how does authentication connect to the database?"
pruvagraph query "which modules have the most dependencies?"

# Watch mode — auto-update on file changes
pruvagraph watch .

Output in pruvagraph-out/:

  • graph.json — queryable knowledge graph

  • graph.html — interactive visualizer (opens in browser)

  • GRAPH_REPORT.md — god nodes, surprising connections, architecture summary

  • cost_report.json — exactly how much you saved vs naive scanning


VS Code / Cursor

Install the extension from the Marketplace:

ext install pruvalex.pruvagraph

Or press Ctrl+Shift+P and run PruvaGraph: Build Graph.

The extension adds:

  • Sidebar panel — live graph viewer, cost meter, god-node list

  • Inline hovers — hover any function to see its connections

  • Status bar — graph freshness indicator + total cost saved

  • Auto-rebuild — watches for file changes, rebuilds incrementally


Claude Code

PruvaGraph installs as an MCP server so Claude Code can query your codebase graph directly:

pruvagraph install --claude-code

This adds the MCP server to your Claude Code config. Then in Claude Code:

/graph "how does UserService connect to the database?"
/graph "what are the top 5 god nodes in this repo?"
/graph "show me all callers of processPayment()"
/graph "what would break if I deleted AuthMiddleware?"

Claude Code reads the compact graph.json instead of opening files one by one — 5x–71x fewer tokens per query depending on repo size.


Languages

PruvaGraph uses tree-sitter for local AST extraction (no LLM, no cost):

| Category | Languages |
|---|---|
| Web | TypeScript, TSX, JavaScript, JSX, Vue, Svelte, Astro, CSS, HTML |
| Backend | Python, Go, Rust, Java, C#, PHP, Ruby, Elixir, Scala |
| Mobile | Kotlin, KTS, Swift, Dart (Flutter), Objective-C |
| Systems | C, C++, Zig |
| Data/Infra | SQL, YAML, Terraform/HCL, Dockerfile, Bash |
| Other | Lua, Julia, Haskell, OCaml, R, Fortran |

Docs (.md, .pdf, .docx) and images use LLM extraction — this is the only part that costs money, and PruvaGraph minimizes it aggressively.


Cost reduction — how it works

Layer 1: SHA-256 hash cache

Every extracted file is fingerprinted. Re-runs skip unchanged files entirely. Zero API calls for files you haven't touched.
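
A minimal sketch of how a content-hash cache like this can work, using only the Python standard library. The function names and the JSON cache layout are hypothetical illustrations, not PruvaGraph's actual internals.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def files_needing_extraction(paths, cache_file: Path):
    """Return paths whose content changed since the last run, then update the cache.

    The cache is a hypothetical JSON map of {path: digest}.
    """
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    changed = []
    for path in paths:
        digest = file_digest(path)
        if cache.get(str(path)) != digest:
            changed.append(path)
            cache[str(path)] = digest
    cache_file.write_text(json.dumps(cache, indent=2))
    return changed
```

Only the paths returned here would go on to extraction. A real implementation would probably record a digest only after extraction succeeds, so that a failed run is retried next time.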

Layer 2: Semantic MinHash dedup

Before sending a batch to the LLM, PruvaGraph computes MinHash signatures and groups similar files (Jaccard ≥ 0.82). Only one representative per group gets extracted — results are projected back to the rest. Useful when you have 40 similar React components or 20 similar API route handlers.
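
To illustrate the idea, here is a small pure-Python MinHash sketch. The 0.82 threshold comes from the text above; the shingle size, the number of hash functions, the greedy grouping strategy, and all function names are assumptions made for this example and may differ from what PruvaGraph does.

```python
import hashlib
import re

NUM_PERM = 128     # hash functions per signature (assumed value)
THRESHOLD = 0.82   # Jaccard cutoff quoted above


def shingles(text: str, k: int = 3) -> set:
    """Split text into overlapping k-word shingles."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash(items: set, num_perm: int = NUM_PERM) -> list:
    """For each salted hash function, keep the minimum hash value over all items."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "big"
            )
            for s in items
        ))
    return signature


def estimated_jaccard(a: list, b: list) -> float:
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_similar(docs: dict, threshold: float = THRESHOLD) -> list:
    """Greedy grouping: a doc joins the first group whose representative is similar enough."""
    sigs = {name: minhash(shingles(text)) for name, text in docs.items()}
    groups = []  # groups[i][0] is the representative that would be sent to the LLM
    for name in docs:
        for group in groups:
            if estimated_jaccard(sigs[name], sigs[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups
```

In this sketch, only the first member of each group would be extracted, and its result would be reused for the remaining members.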

Layer 3: Smart batch packing

Files destined for LLM extraction are packed into batches by token count (default: 12,000 tokens/batch). One LLM call handles multiple files. Graphify sends each doc as a separate call; PruvaGraph fits as many as possible into each call.
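
Token-aware packing can be sketched as a bin-packing problem. The example below uses first-fit decreasing with the 12,000-token default mentioned above. The 4-characters-per-token estimate and the function names are assumptions for illustration only; a real implementation would likely use the backend's tokenizer and may pack differently.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def pack_batches(docs: dict, max_tokens: int = 12_000) -> list:
    """Pack documents into token-bounded batches using first-fit decreasing."""
    sizes = {name: estimate_tokens(text) for name, text in docs.items()}
    batches = []  # each entry: {"names": [...], "tokens": int}
    # Place the largest documents first, each into the first batch with room.
    for name in sorted(sizes, key=sizes.get, reverse=True):
        for batch in batches:
            if batch["tokens"] + sizes[name] <= max_tokens:
                batch["names"].append(name)
                batch["tokens"] += sizes[name]
                break
        else:
            # No batch has room; an oversized document ends up alone in its own batch.
            batches.append({"names": [name], "tokens": sizes[name]})
    return batches
```

Each returned batch would correspond to one LLM call, so four documents of roughly 8,000, 5,000, 4,000, and 3,000 tokens fit into two calls instead of four.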

Layer 4: 3-tier LLM cascade (optional)

Set --cascade to route files through:

  1. Local (Ollama) — for simple docs, zero cost

  2. Cheap cloud (Gemini Flash, Kimi K2, GPT-4.1-mini) — for medium complexity

  3. Premium cloud (Claude Sonnet) — only when confidence is low

Most files resolve at tier 1 or 2. Claude Sonnet is reserved for complex cases.
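
The routing logic can be pictured as a loop that escalates until a tier is confident enough. This is a sketch under stated assumptions: the `Extraction` type, the backend call signature, and the 0.8 confidence cutoff are all hypothetical, since the text above does not specify how confidence is measured.

```python
from typing import Callable, List, NamedTuple, Tuple


class Extraction(NamedTuple):
    tier: str
    result: str
    confidence: float


# A backend takes document text and returns (result, confidence in [0, 1]).
Backend = Callable[[str], Tuple[str, float]]


def cascade(text: str, tiers: List[Tuple[str, Backend]],
            min_confidence: float = 0.8) -> Extraction:
    """Try each tier in order and stop at the first sufficiently confident result."""
    if not tiers:
        raise ValueError("at least one tier is required")
    last = None
    for name, backend in tiers:
        result, confidence = backend(text)
        last = Extraction(name, result, confidence)
        if confidence >= min_confidence:
            return last
    # No tier was confident enough: keep the answer from the final (premium) tier.
    return last
```

The tiers would be passed in order of cost, for example local, then cheap cloud, then premium, so the expensive backend is called only when the earlier ones report low confidence.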

Cost budget

Set a hard spending cap:

pruvagraph . --budget 2.00   # stop at $2.00 of LLM spend
pruvagraph . --dry-run       # estimate cost before spending anything
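
Conceptually, a hard cap means checking the running total before each LLM call is paid for. The sketch below shows one way to do that. The class and method names are hypothetical, and per-token prices are passed in as parameters because real prices vary by backend and change over time.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the next call would push spend past the cap."""


class BudgetTracker:
    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
        """Record the cost of one call, or raise if it would exceed the cap."""
        cost = (input_tokens * usd_per_mtok_in
                + output_tokens * usd_per_mtok_out) / 1_000_000
        if self.spent_usd + cost > self.limit_usd:
            raise BudgetExceeded(
                f"call costing ${cost:.2f} would exceed the ${self.limit_usd:.2f} cap"
            )
        self.spent_usd += cost
        return cost
```

A dry run can reuse the same arithmetic: sum the estimated cost of every planned batch without making any calls, and report the total.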

Backends

# Default: Claude (ANTHROPIC_API_KEY)
pruvagraph .

# Cheaper alternatives (same graph quality)
pruvagraph . --backend gemini    # GEMINI_API_KEY
pruvagraph . --backend kimi      # MOONSHOT_API_KEY  
pruvagraph . --backend openai    # OPENAI_API_KEY

# Free (local Ollama — needs ollama running)
pruvagraph . --backend ollama

# Cascade mode (local → cheap → premium)
pruvagraph . --cascade

Claude Code / Cursor / VS Code integration files

PruvaGraph writes integration files automatically:

pruvagraph install              # auto-detect IDE
pruvagraph install --vscode     # CLAUDE.md + .vscode/settings.json
pruvagraph install --cursor     # .cursor/rules/pruvagraph.mdc
pruvagraph install --claude-code # MCP server config

Advanced

# Incremental update (changed files only)
pruvagraph . --update

# Focus on specific directory
pruvagraph ./src

# Export formats
pruvagraph export --format cypher   # Neo4j import
pruvagraph export --format obsidian # Obsidian vault
pruvagraph export --format graphml  # yEd / Gephi

# Cost report
pruvagraph cost-report

# Benchmark: tokens saved vs reading raw files
pruvagraph benchmark

# Git hook (auto-update on commit)
pruvagraph hook install

How it compares to Graphify

PruvaGraph started as a fork of the ideas in Graphify and rebuilt the cost layer from scratch.

| Feature | Graphify | PruvaGraph |
|---|---|---|
| Core pipeline | ✅ 3-pass | ✅ 3-pass + cascade |
| SHA-256 cache | | |
| Semantic MinHash dedup | ✅ basic | ✅ improved (Jaccard threshold tunable) |
| Smart batch packing | ❌ fixed batches | ✅ token-aware packing |
| 3-tier LLM cascade | | ✅ local → cheap → premium |
| Cost budget cap | | --budget 2.00 |
| Cost analytics | | cost_report.json |
| VS Code extension | | ✅ Marketplace |
| MCP server (rich) | ✅ basic | ✅ 8 tools |
| Dry-run cost estimate | | --dry-run |


Contributing

PruvaGraph is MIT licensed and welcomes contributions. The best ways to contribute:

  • Add a language extractor: see packages/core/pruvagraph/extract/ and CONTRIBUTING.md

  • Improve dedup thresholds: see packages/core/pruvagraph/dedup.py

  • Add an LLM backend: see packages/core/pruvagraph/backends/

  • VS Code extension features: see packages/vscode/

See CONTRIBUTING.md for setup instructions.


About PRUVALEX

PruvaGraph is built and maintained by PRUVALEX.

PRUVALEX builds compliance tools for enterprise software teams. PruvaGraph is our open-source contribution to the developer community — a tool we built because we needed it ourselves and thought others would too.


License

MIT © 2026 PRUVALEX
