knowing
Extracts routes, handlers, and dependencies from Actix web applications, enabling querying of callers and blast radius.
Extracts selectors, custom properties, and var() dependencies from CSS/SCSS files for dependency analysis.
Extracts routes, views, and relationships from Django projects, enabling call graph and dependency queries.
Extracts services, ports, networks, and depends_on links from Docker Compose files for infrastructure graph.
Extracts routes and handlers from Express.js applications, supporting route-to-handler mapping and blast radius.
Extracts routes, dependencies, and handlers from FastAPI applications for API-level dependency analysis.
Extracts routes and handlers from Fastify applications, enabling route and dependency queries.
Extracts routes and views from Flask applications, providing route-to-function mapping and dependency graphs.
Extracts routes and handlers from Gin web applications for route-level dependency and blast radius analysis.
Watches Git repositories for changes, enabling incremental re-extraction and snapshot history of code relationships.
Extracts workflows, jobs, steps, and action references from GitHub Actions YAML for CI/CD dependency analysis.
Extracts resources, data sources, modules, and variables from Terraform HCL files for infrastructure dependency analysis.
Extracts routes and handlers from Hono applications, enabling route-level dependency queries.
Extracts symbols, function calls, imports, and references from JavaScript code for semantic dependency graphs.
Extracts deployments, services, configmaps, and label-selector edges from Kubernetes YAML for infrastructure graph.
Extracts controllers, providers, and modules from NestJS applications for module-level dependency analysis.
Extracts controllers, routes, and dependencies from ASP.NET Core applications for .NET codebase analysis.
Extracts routes, pages, and API handlers from Next.js applications for full-stack dependency analysis.
Ingests OpenTelemetry runtime traces to add runtime-observed edges to the graph, enabling static-vs-runtime comparison.
Extracts symbols, function calls, imports, and class hierarchies from Python code for dependency graphs.
Extracts publish/subscribe relationships from codebases using RabbitMQ for message-level dependency analysis.
Extracts routes and handlers from Rocket web applications for Rust codebase dependency analysis.
Extracts symbols, function calls, imports, and trait implementations from Rust code for semantic dependency graphs.
Extracts functions, events, and resource references from Serverless Framework YAML for serverless infrastructure analysis.
Extracts controllers, endpoints, and dependencies from Spring Boot applications using annotation scanning.
Extracts resources, data sources, modules, and variables from Terraform configurations for infrastructure dependency graphs.
Extracts symbols, types, imports, and dependencies from TypeScript code for semantic graphs with type information.
Extracts structure and cross-references from various YAML formats (K8s, CloudFormation, Docker Compose, etc.) for infrastructure graphs.
Research paper: Content-Addressing as a Computation Primitive for Software Relationship Intelligence (DOI: 10.5281/zenodo.20342255)
Your architecture diagram says service A calls service B. Can you prove it?
knowing can. It builds a content-addressed graph of extracted code relationships, snapshots it as a Merkle tree tied to a git commit, and generates cryptographic proofs that verify offline. Agents use it for ranked context. Security teams use it for audit. Platform teams use it to compare code against production traces.
It gets better every time you use it. When code changes, stale knowledge expires automatically.
brew install blackwell-systems/tap/knowing
{ "mcpServers": { "knowing": { "command": "knowing", "args": ["mcp", "--watch"] } } }
That's it. The MCP server auto-indexes your repo on first launch. Your agent now has ranked context (one call replaces grep-read loops), blast radius, test scope, and memory that compounds.
Three Things, One Architecture
knowing is three products built on one foundation (content-addressed graph with hierarchical Merkle trees):
1. Context engine for AI agents
One call returns the most relevant symbols for a task, ranked by graph centrality, recency, and learned usefulness, packed to fit your token budget. 47% fewer tool calls. 84% fewer tokens. Results improve with feedback.
2. Audit primitive for compliance
Every graph state is a Merkle root tied to a git commit. knowing prove generates a cryptographic proof that a relationship existed. knowing verify checks it offline. knowing fsck verifies the entire graph in 98ms.
3. Memory layer that learns
Feedback from agents compounds across sessions. When code changes, feedback expires automatically (verified via package Merkle roots). The system gets smarter over time, not noisier. That is the property knowing is built around.
These aren't separate features. They're structural consequences of content-addressing: the same hash that makes context cacheable also makes it provable, and the same Merkle root that detects staleness also expires stale feedback.
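A minimal sketch of how root-keyed feedback can expire without any cleanup job, assuming a simplified in-memory model. The names `package_root` and `FeedbackStore`, and the flat hash over sorted edge strings, are hypothetical illustrations, not knowing's actual schema or hashing scheme.

```python
import hashlib


def package_root(edges):
    """Hash a package's edges into one root (a flat stand-in for a Merkle root)."""
    h = hashlib.sha256()
    for edge in sorted(edges):
        h.update(hashlib.sha256(edge.encode()).digest())
    return h.hexdigest()


class FeedbackStore:
    """Hypothetical store: feedback is keyed by (symbol, package root)."""

    def __init__(self):
        self._scores = {}

    def record(self, symbol, root, score):
        self._scores[(symbol, root)] = score

    def lookup(self, symbol, current_root):
        # Feedback recorded under a different root never matches, so it is invisible.
        return self._scores.get((symbol, current_root))


edges_v1 = ["AuthService->SessionStore", "AuthService->TokenCache"]
edges_v2 = ["AuthService->SessionStore"]  # code changed: one call removed

store = FeedbackStore()
root_v1 = package_root(edges_v1)
store.record("AuthService", root_v1, 0.9)

root_v2 = package_root(edges_v2)
fresh = store.lookup("AuthService", root_v1)  # same code, feedback still applies
stale = store.lookup("AuthService", root_v2)  # changed code, feedback is gone
```

Nothing is deleted in this sketch: the old feedback stops matching because the key it was stored under no longer describes the current code.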
What It Answers
For your agent:
"I'm changing this function. What breaks?" (blast radius across callers, tests, routes, repos)
"Give me 5,000 tokens of context for this task." (graph-ranked, not grep-searched)
"Which tests should run?" (call-graph traversal, 98% precision)
For your platform team:
"Is this route used in production?" (static analysis + OTel runtime traces)
"What did the service graph look like at a specific snapshot?" (snapshot chain, each root tied to a git commit)
For your security team:
"Prove service A calls service B at this commit." (Merkle proof, verifiable offline)
"Prove this dependency does NOT exist." (absence proof via sorted leaves)
"Generate a compliance report." (
knowing audit -proofs, one command)
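The "absence proof via sorted leaves" idea can be sketched as follows, assuming leaves are plain strings kept in sorted order. The function names are hypothetical, and a complete proof would also carry Merkle inclusion proofs for both neighbours and show they are adjacent; that part is omitted here.

```python
import bisect


def absence_witness(sorted_leaves, target):
    """Return (index, left, right) neighbours that bracket target, or None if it is present."""
    i = bisect.bisect_left(sorted_leaves, target)
    if i < len(sorted_leaves) and sorted_leaves[i] == target:
        return None  # the target exists, so no absence proof is possible
    left = sorted_leaves[i - 1] if i > 0 else None
    right = sorted_leaves[i] if i < len(sorted_leaves) else None
    return (i, left, right)


def check_absence(witness, target):
    """Verifier side: the two adjacent leaves must strictly bracket the target."""
    _, left, right = witness
    return (left is None or left < target) and (right is None or target < right)


leaves = sorted([
    "AuthService->SessionStore",
    "AuthService->TokenCache",
    "Billing->Ledger",
])
witness = absence_witness(leaves, "AuthService->Tracer")
present = absence_witness(leaves, "Billing->Ledger")
absent_ok = check_absence(witness, "AuthService->Tracer")
```

Because the leaves are sorted, two adjacent leaves with nothing between them leave no position where the missing edge could sit.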
Numbers
What | Result |
Agent context precision | +20pp after 1 round, +34pp after 5 |
Tool calls saved | 47% fewer (one context call replaces repeated grep+read) |
Token savings | 84% fewer tokens (GCF wire format) |
Repeat query speed | 93x faster (Merkle-keyed subgraph cache) |
Merkle diff | 517x faster than full edge scan at 100K edges |
Test scope | 98% precision, 82% recall |
Graph integrity check | 98ms (24,936 edges) |
Proof generation | 72us generate, 1.2us verify |
Feedback expiration | 100% expire on code change, 11% overhead |
Cross-repo retrieval | 46.7% R@10 on foreign codebase, zero config |
Cross-system retrieval | P@10=0.226 vs grep P@10=0.020 (11.3x, p<0.0001, d=0.92) |
Indexing throughput | 5 repos (47,150 files) in ~52s |
All benchmarks are reproducible: GOWORK=off go test ./bench/... -timeout 5m
Quick Start
# Install
brew install blackwell-systems/tap/knowing
# Or: go install github.com/blackwell-systems/knowing/cmd/knowing@latest
# Or: npm install -g @blackwell-systems/knowing
# Or: pip install knowing
# That's it. Add the MCP config and start a session.
# The server auto-indexes your repo on first launch.
# Or index manually for CLI usage:
knowing add .
# Remove a repo (evicts all data: nodes, edges, snapshots, feedback)
knowing remove ./path/to/repo
# Get context for a task
knowing context -task "refactor auth middleware" -format gcf
# Find affected tests
knowing test-scope -files internal/auth/middleware.go
# Explain why a symbol ranked where it did
knowing why -task "refactor auth" -symbol "SessionHandler"
# Prove a relationship exists (cryptographic Merkle proof)
knowing prove -source "AuthService" -target "SessionStore"
# Verify offline (no database needed)
knowing verify proof.json
# Check graph integrity
knowing fsck
# Check if the graph is stale (CI gate: exits 1 if stale)
knowing stale
MCP Integration
{
"mcpServers": {
"knowing": {
"command": "knowing",
"args": ["mcp", "--watch"],
"transport": "stdio"
}
}
}
The --watch flag re-indexes on file changes. Your agent always queries fresh data. No manual knowing index or database path needed: the MCP server auto-indexes the git repository on first launch and registers it in the roster for future sessions.
For HTTP transport (multi-agent, daemon mode):
knowing serve -addr :8100 .
{
"mcpServers": {
"knowing": {
"url": "http://localhost:8100",
"transport": "streamable-http"
}
}
}
Why This Works
Git versions files. knowing versions the understanding of code.
The entire system is built on one idea: content-addressed identity. Every symbol, relationship, and snapshot is SHA-256 hashed. This single choice gives you:
Staleness detection for free. Changed file = new hash = stale edges are known without scanning.
Caching for free. Same package root = same results. 93x speedup on unchanged queries.
Integrity for free. Verify all stored hashes and snapshot chain continuity. 98ms.
History for free. Each snapshot is a Merkle root tied to a git commit. Walk the chain.
Feedback expiration for free. Feedback stores the package Merkle root. Code changes = root changes = old feedback is invisible.
Proofs for free. Merkle path from leaf to root is a self-contained cryptographic proof.
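To make "Merkle path from leaf to root" concrete, here is a minimal sketch of a binary Merkle inclusion proof. It is an illustration under simplifying assumptions, not knowing's implementation: it pairs an odd node with itself and omits the hash domain prefixes the project mentions, and the function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, leaf hashes first; an odd node is paired with itself."""
    level = [h(leaf.encode()) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect sibling hashes from leaf to root as (sibling, sibling_is_left) pairs."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify(leaf, path, root):
    """Recompute the root from the leaf and its path; no graph database is needed."""
    node = h(leaf.encode())
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


edges = ["AuthService->SessionStore", "AuthService->TokenCache", "Billing->Ledger"]
levels = build_levels(edges)
root = levels[-1][0]
path = prove(levels, 2)
ok = verify(edges[2], path, root)
tampered = verify("AuthService->Ledger", path, root)
```

The proof is just the leaf plus a handful of sibling hashes, which is why a verifier needs only the published root to check it offline.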
Aspect | Git | knowing |
What it versions | File contents | Code relationships and their meaning |
Unit of storage | blob | node + edge + provenance + confidence |
Identity | | |
Snapshot | tree of blobs | Hierarchical Merkle: repo -> package -> edge-type -> leaf |
Diff | Which lines changed | Which packages changed, what broke, what's new |
History | What code looked like | What the codebase understood about itself |
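The hierarchical layout in the Snapshot row (repo -> package -> edge-type -> leaf) is what lets a diff skip unchanged packages. A rough sketch, assuming a nested dict as the graph and a flat SHA-256 over children at each level; `snapshot` and `changed_packages` are hypothetical names, not knowing's API.

```python
import hashlib


def digest(parts):
    """Hash an ordered list of strings into one hex digest."""
    hsh = hashlib.sha256()
    for part in parts:
        hsh.update(hashlib.sha256(part.encode()).digest())
    return hsh.hexdigest()


def snapshot(graph):
    """graph: {package: {edge_type: [leaf, ...]}} -> (repo_root, {package: package_root})."""
    package_roots = {}
    for package, by_type in graph.items():
        type_roots = [
            edge_type + ":" + digest(sorted(leaves))
            for edge_type, leaves in sorted(by_type.items())
        ]
        package_roots[package] = digest(type_roots)
    repo_root = digest([p + ":" + r for p, r in sorted(package_roots.items())])
    return repo_root, package_roots


def changed_packages(old, new):
    """Compare roots top-down: equal repo roots mean nothing below needs scanning."""
    old_root, old_pkgs = old
    new_root, new_pkgs = new
    if old_root == new_root:
        return []
    return sorted(
        p for p in old_pkgs.keys() | new_pkgs.keys()
        if old_pkgs.get(p) != new_pkgs.get(p)
    )


before = {
    "auth": {"calls": ["Login->CheckToken"], "imports": ["auth->crypto"]},
    "billing": {"calls": ["Charge->Ledger"]},
}
after = {
    "auth": {"calls": ["Login->CheckToken", "Login->AuditLog"], "imports": ["auth->crypto"]},
    "billing": {"calls": ["Charge->Ledger"]},
}
changed = changed_packages(snapshot(before), snapshot(after))
unchanged = changed_packages(snapshot(before), snapshot(before))
```

Only the roots are compared here; the edges under `billing` are never inspected because its package root did not move.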
How It Works
+------------------------------------------------------------------+
| knowing daemon |
+----------------+------------------------+--------------------------+
| Indexer | Graph Store | MCP Server |
| | | |
| 26 extractors | Content-addressed | 28 tools + 8 resources |
| tree-sitter | SQLite + Merkle tree | stdio / HTTP (1.8s index)|
| LSP + SCIP | Hierarchical snapshots | GCF / GCB / JSON |
| OTel traces | Subgraph cache (93x) | PackRoot dedup (99%) |
| | Community detection | |
+----------------+------------------------+--------------------------+
Two planes:
Execution: indexes repos, extracts symbols and relationships, ingests traces, stores snapshots.
Intelligence: computes blast radius, context packs, test scope, feedback, communities from the stored graph.
The boundary matters: intelligence features read the graph and produce derived results. They cannot corrupt graph facts. A bad ranking produces a bad recommendation; it cannot invalidate a proof.
Capabilities
Languages And Formats
Language/Format | Extractor | Framework/Pattern Detection |
Go | tree-sitter + go/packages | net/http, gin, echo, chi, gorilla/mux |
TypeScript/JavaScript | tree-sitter | Express.js, Fastify, Hono, NestJS, Next.js |
Python | tree-sitter | Flask, FastAPI, Django |
Rust | tree-sitter | Actix, Axum, Rocket |
Java | tree-sitter | Spring annotations |
C# | tree-sitter | ASP.NET attributes |
Protocol Buffers | tree-sitter | service, message, enum, RPC declarations |
Terraform (HCL) | tree-sitter | resource, data, module, variable declarations |
SQL | tree-sitter | tables, views, functions, procedures, FK edges |
Kubernetes YAML | yaml.v3 | deployments, services, configmaps, label-selector edges |
CloudFormation/SAM | yaml.v3 | resources, !Ref/!GetAtt/!Sub cross-references |
Docker Compose | yaml.v3 | services, ports, networks, depends_on links |
GitHub Actions | yaml.v3 | workflows, jobs, steps, action references |
Serverless Framework | yaml.v3 | functions, events, resource references |
CSS/SCSS | tree-sitter | selectors, custom properties, var() dependencies |
Event/MQ patterns | multi-language | Kafka, NATS, SQS, RabbitMQ publish/subscribe |
OpenAPI/JSON Schema | json/yaml | endpoints, models, $ref resolution |
Dockerfile | parser | FROM base images, COPY --from multi-stage deps, EXPOSE ports |
Makefile | parser | target dependencies, include directives, variable references |
Helm Charts | yaml.v3 | chart dependencies, template references, values injection |
GitLab CI | yaml.v3 | job needs, extends templates, include files, artifacts |
package.json (npm) | json | dependencies, devDependencies, peerDependencies, scripts |
GraphQL | parser | type definitions, field type references, interface implementations |
Ruby | tree-sitter | classes, modules, method definitions, require edges |
.env files | parser | environment variable declarations, cross-file references |
All extractors fire per file via multi-dispatch; results are merged. Tree-sitter produces edges at confidence 0.7 (ast_inferred); go/packages and SCIP at 0.95-1.0 (ast_resolved, scip_resolved).
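The README says results are merged but does not spell out the rule. One plausible approach, shown here purely as an assumption, is to deduplicate on (source, target, type) and keep the highest-confidence copy. The `merge_edges` helper and the edge dict shape are hypothetical; the confidence values and provenance labels are the ones quoted above.

```python
def merge_edges(*extractor_outputs):
    """Merge edge lists from several extractors, keeping the highest-confidence copy."""
    merged = {}
    for edges in extractor_outputs:
        for edge in edges:
            key = (edge["source"], edge["target"], edge["type"])
            best = merged.get(key)
            if best is None or edge["confidence"] > best["confidence"]:
                merged[key] = edge
    return list(merged.values())


tree_sitter = [
    {"source": "Login", "target": "CheckToken", "type": "calls",
     "confidence": 0.7, "provenance": "ast_inferred"},
    {"source": "Login", "target": "AuditLog", "type": "calls",
     "confidence": 0.7, "provenance": "ast_inferred"},
]
go_packages = [
    {"source": "Login", "target": "CheckToken", "type": "calls",
     "confidence": 0.95, "provenance": "ast_resolved"},
]
merged = merge_edges(tree_sitter, go_packages)
by_target = {e["target"]: e["provenance"] for e in merged}
```

Under this rule a resolved edge replaces its inferred duplicate, while edges seen only by tree-sitter survive at their lower confidence.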
MCP Tools
Tool | Purpose |
| Build and inspect the graph |
| Understand impact and paths |
| Compare graph states and review changes |
| Query runtime-observed relationships |
| Ranked context for agents |
| Route work, query code owners/authors, select tests, improve ranking |
| Cryptographic proofs, absence proofs, integrity verification |
| Evict all data for a repository (nodes, edges, files, snapshots, feedback, task memory, graph notes) |
MCP prompts: refactor_safely, review_pr, investigate_dead_code.
MCP Resources
8 read-only resources for agent orientation without a tool call:
Resource | What it returns |
| Graph size, top kinds, hotspot count, snapshot age |
| Node kinds, edge types, provenance tiers, hash format |
| Counts by repo, kind, and edge type |
| All tracked repos with counts and last-indexed time |
| Context calls, symbols served, cache hits/misses, uptime |
| Healthy/stale/corrupted status, integrity check |
| Community list with cohesion and Merkle roots |
| Single community detail (resource template) |
Wire Formats
Format | Purpose | Savings vs JSON |
GCF (Graph Compact Format) | LLM consumption: line-oriented, positional fields | 84% fewer tokens |
GCB (Graph Compact Binary) | Service transport and caching: varint, length-prefixed | 74% fewer bytes |
JSON | Human debugging, generic consumers | Baseline |
GCF uses |-separated fields and local IDs ($1 -> $3) instead of repeated qualified names. Parseable by LLMs while fitting 5x more graph context into the same token budget. Session-stateful deduplication reduces repeated symbols by 47%.
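The saving from local IDs is easy to see in a toy encoder. This sketch is not the GCF specification: the line layout (a symbol table followed by edge lines) is invented for illustration, and the qualified names are made up. It only demonstrates the two ideas named above, |-separated fields and $n local IDs.

```python
def encode_compact(edges):
    """Emit a symbol table with local IDs, then edges that refer to those IDs."""
    ids = {}
    lines = []
    for source, _, target in edges:
        for name in (source, target):
            if name not in ids:
                ids[name] = "$" + str(len(ids) + 1)
                lines.append(ids[name] + "|" + name)
    for source, edge_type, target in edges:
        lines.append(ids[source] + "|" + edge_type + "|" + ids[target])
    return "\n".join(lines)


edges = [
    ("github.com/acme/app/auth.Login", "calls", "github.com/acme/app/auth.CheckToken"),
    ("github.com/acme/app/auth.Login", "calls", "github.com/acme/app/audit.Write"),
    ("github.com/acme/app/auth.CheckToken", "calls", "github.com/acme/app/audit.Write"),
]
compact = encode_compact(edges)
```

Each qualified name is written once; every later reference costs only a short ID, which is where the token savings in this kind of format come from.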
Current Boundaries
Breaking hash change (v0.3.0): Hash domain prefixes added. Databases from before v0.3.0 must be re-indexed. Run knowing fsck after.
Static blast radius follows calls edges; other edge types provide context, not traversal.
Runtime tools require OpenTelemetry trace ingestion; without traces they have no observations.
LSP enrichment: Go, TypeScript, Python, Rust, Java, C#. Auto-detected from project markers. Others fall back to tree-sitter.
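The rule that static blast radius follows calls edges only can be pictured as a reverse breadth-first walk over the call graph. A minimal sketch, assuming edges are (source, type, target) tuples; `blast_radius` and the symbol names are hypothetical, not knowing's API.

```python
from collections import defaultdict, deque


def blast_radius(edges, changed):
    """Return every symbol that transitively calls `changed`, following only calls edges."""
    callers = defaultdict(set)
    for source, edge_type, target in edges:
        if edge_type == "calls":  # other edge types are context, not traversal
            callers[target].add(source)
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for caller in callers[node]:
            if caller not in seen and caller != changed:
                seen.add(caller)
                queue.append(caller)
    return seen


edges = [
    ("HandleLogin", "calls", "CheckToken"),
    ("TestHandleLogin", "calls", "HandleLogin"),
    ("CheckToken", "imports", "crypto"),
    ("Docs", "references", "CheckToken"),
]
affected = blast_radius(edges, "CheckToken")
```

Here `Docs` is not reported even though it references the changed symbol, because a references edge is not traversed.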
Documentation
Doc | Contents |
System design, schemas, content addressing, daemon model | |
Implementation inventory, entry points, limitations | |
Merkle proofs, fsck, snapshot chain, CI gates | |
Commands, flags, examples | |
Tool schemas, parameters, return formats | |
Relationship semantics and provenance | |
RWR, HITS, ranking, token budgeting | |
OTel ingestion and runtime confidence | |
GCF, GCB, JSON formats and benchmarks | |
Completed workstreams and next priorities | |
Reproducible value benchmarks with performance contracts | |
Hierarchical Identity Architecture thesis (DOI: 10.5281/zenodo.20342255) | |
Claude Code hook integration |
License
MIT