knot
The knot server enables AI agents to intelligently explore, search, and navigate large codebases using semantic vector search and graph-based structural analysis.
Semantic & Structural Code Search (search_hybrid_context): Find code by natural language queries (e.g., "user authentication"), combining vector embeddings for semantic similarity with graph analysis for architectural relationships. Returns file paths, line numbers, signatures, docstrings, and cross-repository dependencies.
Reverse Dependency Lookup (find_callers): Identify all code that calls, extends, implements, or references a specific function, method, or class. Useful for impact analysis before refactoring, dead code detection, and understanding call chains. Supports exact names or signature fragments (e.g., handle(Request)).
File Structure Inspection (explore_file): Get a structural outline of any source file: all classes, interfaces, methods, and properties with signatures, docstrings, and line numbers, without reading the entire file.
Multi-language support: Java, Kotlin, and TypeScript/JavaScript/Node.js for core code intelligence; HTML and CSS/SCSS also indexed.
Multi-repository filtering: All tools support an optional repo_name parameter to scope results to a specific indexed repository.
Read-only operations: All tools are purely read-only with no side effects on the codebase or databases.
AI Agent Integration: Exposes capabilities via MCP (Model Context Protocol), allowing AI clients (Claude, Gemini, ChatGPT, etc.) to leverage knot for autonomous code analysis.
Supports indexing and analysis of Angular web components through HTML parsing, enabling cross-language linking between JavaScript, HTML, and CSS for full-stack SPA analysis.
Provides CSS/SCSS stylesheet indexing with class/ID selector extraction and variable tracking, enabling unified HTML/CSS discovery and cross-language search capabilities.
Supports CommonJS TypeScript file analysis as part of complete TypeScript/TSX/CTS language support for modern JavaScript/TypeScript codebases.
Provides Docker-based deployment options for universal compatibility across platforms, including containerized execution of the indexer, CLI tool, and MCP server.
Supports configuration via .env files for setting repository paths and database credentials during codebase indexing and analysis.
Provides comprehensive JavaScript/Node.js analysis including vanilla JS, Node.js, and module systems (.js, .mjs, .cjs, .jsx) with full cross-language linking capabilities.
Provides complete Kotlin codebase analysis with support for classes, interfaces, objects, companion objects, functions, methods, and properties using tree-sitter-kotlin-ng grammar.
Uses Markdown for documentation and skill files, including .knot-agent.md for teaching LLMs how to use the CLI tool for autonomous code analysis.
Integrates with Neo4j graph database for storing architectural relationships via call graphs, enabling structural navigation and reverse dependency analysis.
Supports Node.js module systems and JavaScript analysis as part of the hybrid web ecosystem with cross-language linking capabilities.
Extracts id and className attributes from JSX/TSX React components for unified HTML/CSS discovery and cross-language analysis in web applications.
Provides complete TypeScript/TSX analysis including modern JavaScript/TypeScript codebases with full cross-language linking and architectural relationship extraction.
knot
knot is a high-performance codebase indexer that extracts structural and semantic information from source code, enabling AI agents to understand, analyze, and navigate large code repositories. Currently supports Java, Kotlin (v0.7.4+), TypeScript, JavaScript/Node.js, Rust (v0.8.x), Python (v0.9.3), Groovy (v0.10.3), C/C++ (v1.0.0), HTML, and CSS/SCSS, plus Build Systems (Maven pom.xml, Gradle build.gradle, Jenkins pipeline, Cargo.toml — v1.2.0), Configuration Files (YAML, JSON, .properties — v1.2.0), and Kubernetes + Helm (v1.2.0) with full cross-language linking.
The indexer automatically builds:
Vector Search Database (Qdrant) — semantic understanding via embeddings
Graph Database (Neo4j) — architectural relationships via call graphs
This dual-database approach powers both:
MCP (Model Context Protocol) Server — Exposes three tools to any LLM client (Claude, Gemini, ChatGPT, Cursor, etc.)
CLI Tool (v0.10.1) — Standalone knot command for terminal and scripting environments
✨ Key Features
🔍 Code Intelligence Tools
search_hybrid_context: Semantic + structural search. Find code by meaning, class name, method signature, docstrings, or comments. Returns full context including dependencies.
find_callers: Reverse dependency lookup. Identify dead code, perform impact analysis, or understand the full call chain of any function/method. When multiple entities share the same name (e.g., find_nearest_entity_by_line in different files), results are automatically grouped by target showing which specific entity each caller references.
explore_file: File anatomy inspection. Quickly see all classes, interfaces, methods, and functions in a file with signatures and documentation.
🏗️ Multi-Language Support
Java: Full AST extraction with package awareness
Kotlin (v0.7.4+): Complete support for Kotlin codebases with classes, interfaces, objects, companion objects, functions, methods, and properties. Fully compatible with tree-sitter-kotlin-ng grammar.
TypeScript/TSX/CTS: Complete support for modern JavaScript/TypeScript codebases, including CommonJS TypeScript files
JavaScript/Node.js (v0.7.4+): Vanilla JS, Node.js, and module systems (.js, .mjs, .cjs, .jsx)
Hybrid Web Ecosystem (v0.6.5): Cross-language linking between JavaScript, HTML, and CSS for full-stack SPA analysis
HTML (v0.6.3+): Custom elements (Web Components, Angular), id and class attribute indexing for cross-language CSS search
JSX/TSX Attributes (v0.6.3+): Extracts id and className from React components for unified HTML/CSS discovery
CSS/SCSS (v0.6.4+): Stylesheet indexing with class/ID selector extraction and variable tracking (CSS/SCSS variables, mixins, functions)
Rust (v0.8.11): Struct, enum, union, trait, function, method, module extraction with trait implementation tracking (IMPLEMENTS relationships) and macro invocation references. NEW in v0.8.6: Type alias, constant, static, and macro definition extraction with full docstring and signature support. NEW in v0.8.7: Enhanced type reference detection inside macros (vec![], println!(), assert!(), etc.) with intelligent string literal filtering and comprehensive edge case handling. NEW in v0.8.11: O(N) nested macro traversal optimization for large Rust codebases with deeply nested token_tree nodes.
Python (v0.9.3): Full Python extraction with class, function, method support, constants, module-level imports, ValueReference tracking for keyword arguments, class inheritance (EXTENDS), decorator extraction (@property, @staticmethod, @route(...), @dataclass), generic type hints (List[str], Optional[Dict], *args/**kwargs), Py2/Py3 exception syntax compatibility, and self.method() resolution with inherited method walking. Captures class_definition, function_definition (including async via optional async modifier), lambda assignments, and distinguishes methods from functions via parent context detection.
Groovy (v0.10.3): Full Groovy language support via hybrid tree-sitter + ad-hoc lexical parser. Extracts classes, interfaces, traits, enums, typed/def/quoted methods (incl. Spock specs), constructors, closures, script-level variables, fields/properties with visibility modifiers, nested classes, and decorators. Tracks package FQN and enclosing class relationships. NEW in v0.10.3: Multi-line signatures (closure default params), assignment-vs-declaration disambiguation, innermost assignment for nested closures, UUID collision fix for duplicate method names, find_callers accurately tracks private methods including those in anonymous new AnAction closures.
Build Systems (v0.10.0): Maven pom.xml (dependencies + plugins via roxmltree), Gradle build.gradle (deps + plugins + tasks), and Jenkinsfile pipeline (stages + steps) extraction.
Cargo.toml (v1.2.0): Rust package manager support with package metadata, features, workspace members, and multi-format dependency parsing (simple, table, git, path).
Configuration Files (v1.2.0): YAML (.yml/.yaml), JSON (.json), and Java Properties (.properties) with leaf-key granularity. Special handling for package.json (npm dependencies as BuildDependency, scripts as ConfigProperty).
Kubernetes + Helm (v1.2.0): K8s manifest parsing (Deployment, Service, ConfigMap, Secret, Ingress, Namespace) with label/annotation tracking and cross-resource references. Helm chart indexing (Chart.yaml metadata, values.yaml key-value pairs, template variable extraction via {{ .Values.X }}).
C/C++ (v1.0.0): Complete C/C++ support with namespace-aware FQN resolution (Engine::MyClass::start), class/struct extraction, function/method tracking, macro definition and usage detection (uppercase identifier heuristic), type reference tracking (declarations, new expressions), and full call graph analysis. Supports .c, .h, .cpp, .hpp, .cc, .cxx, .hh, .hxx extensions via tree-sitter-c and tree-sitter-cpp parsers. Includes intelligent auto-detection for .h headers to parse them correctly as C or C++ based on their contents.
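To make the Helm item above concrete, here is a minimal Python sketch of pulling .Values paths out of a chart template. It is an illustration only: knot itself is written in Rust, and the function name extract_values_refs and both regular expressions are assumptions rather than knot's actual extraction logic.

```python
import re

# A template action is anything between {{ and }} (hypothetical simplification).
ACTION = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)
# A dotted path rooted at .Values, e.g. .Values.image.tag
VALUES_PATH = re.compile(r"\.Values\.([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)")


def extract_values_refs(template: str) -> list[str]:
    """Return the distinct .Values paths used inside template actions, in order."""
    refs: list[str] = []
    for action in ACTION.finditer(template):
        for match in VALUES_PATH.finditer(action.group(1)):
            path = match.group(1)
            if path not in refs:
                refs.append(path)
    return refs


template = (
    'image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default "latest" }}"\n'
    "replicas: {{ .Values.replicaCount }}\n"
    "# .Values.ignored is outside any action\n"
)
refs = extract_values_refs(template)
```

Text outside the double braces is ignored, so a comment that merely mentions .Values does not produce a reference.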
📚 Rich Comment Extraction
Captures docstrings (JavaDoc, JSDoc) preceding declarations
Extracts inline comments within method/function bodies
Respects nesting boundaries (class comments don't capture method comments)
Intelligently aggregates comment blocks
📊 Dual-Database Architecture
Qdrant: Vector search for semantic code understanding
Neo4j: Graph relationships for structural navigation
🚀 High Performance
Parallel Streaming Pipeline: Overlaps CPU-bound embedding with I/O-bound ingestion via MPSC channels (v0.5.0+)
Incremental Indexing: Uses SHA-256 hashes to skip unchanged files
Real-time Watch Mode: Automatically re-indexes changed files in seconds via --watch
CPU Parallelism: AST extraction via Rayon
Scalable: Configurable batch processing and constant memory footprint (~2GB) regardless of repository size
Performance Benchmarking (v1.1.0+): Three-level validation framework
Unit benchmarks: Criterion-based benchmarks for parse, embed, and graph write throughput (benches/)
E2E benchmarks: Full pipeline metrics capture with per-stage timing (tests/benchmark_e2e.sh)
CI regression tracking: Automated baseline comparison against tolerance thresholds (scripts/compare_perf_metrics.sh)
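The streaming pipeline described above can be pictured as a producer and a consumer joined by a bounded channel. The following Python sketch shows only that shape; knot's real pipeline uses Rust MPSC channels and Rayon, and the names run_pipeline, embed, and ingest are hypothetical stand-ins.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_pipeline(items, embed, ingest, capacity=4):
    """Overlap an 'embed' stage with an 'ingest' stage through a bounded queue."""
    # put() blocks once `capacity` items are waiting, which gives backpressure.
    channel = queue.Queue(maxsize=capacity)
    results = []

    def producer():
        for item in items:
            channel.put(embed(item))
        channel.put(_DONE)

    def consumer():
        while True:
            value = channel.get()
            if value is _DONE:
                break
            results.append(ingest(value))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


out = run_pipeline(["a", "b", "c"], str.upper, lambda v: v + "!")
```

Because the queue is bounded, a slow ingest stage stalls the producer instead of letting unprocessed items pile up in memory.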
🛠️ Installation
Prerequisites
Component | Version | Notes |
Docker | 20.10+ | For running Qdrant and Neo4j |
qdrant | 1.x | Vector database (docker) |
neo4j | 5.x | Graph database (docker) |
Option A: Pre-compiled Binaries (macOS & Modern Linux)
Go to the Releases page and download the native executable for your platform.
Install knot binaries (CLI, MCP server, and indexer):
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/raultov/knot/releases/latest/download/knot-installer.sh | sh
Download agent-skills guides separately (optional):
curl -sO https://raw.githubusercontent.com/raultov/knot/master/.knot-agent.md && curl -fsSL https://raw.githubusercontent.com/raultov/knot/master/.knot-agent-skills.tar.gz | tar -xz
The first command installs the knot binaries to your PATH. The second (optional) downloads the agent skill index (.knot-agent.md) and extracts comprehensive guides for using the knot CLI with AI agents and code analysis tools.
Linux Requirements:
Full install (knot-indexer + CLI + MCP): glibc 2.38+
Ubuntu 24.04 LTS or later
Debian 13 (Trixie) or later
Fedora 39+ / RHEL 10+
Arch Linux (rolling release)
Lightweight clients-only (knot CLI + MCP server, no indexing): glibc 2.35+ (even older systems like Debian 12 Bookworm work fine)
For older Linux distributions or Windows, see the Lightweight Clients section below or use Docker (Option B).
Option B: Docker (Universal Compatibility)
Docker images provide universal compatibility for any Linux distribution and Windows.
Full Install (All Binaries: knot-indexer, knot CLI, knot-mcp)
Build the image:
docker build -t knot:latest . --network=host
Run the indexer:
# Use --network host to connect to databases running on your host machine
docker run --rm \
-v /path/to/your/repo:/workspace \
-e KNOT_REPO_PATH=/workspace \
-e KNOT_NEO4J_PASSWORD=your-password \
--network host \
knot:latest \
knot-indexer
Run the CLI tool:
docker run --rm \
-v /path/to/your/repo:/workspace \
-e KNOT_REPO_PATH=/workspace \
-e KNOT_NEO4J_PASSWORD=your-password \
--network host \
knot:latest \
knot search "user login flow"
Run the MCP server:
docker run --rm \
-e KNOT_REPO_PATH=/workspace \
-e KNOT_NEO4J_PASSWORD=your-password \
--network host \
knot:latest \
knot-mcp
Note: Uses Debian Trixie (glibc 2.38+) and includes ONNX Runtime for full functionality.
Lightweight Clients (Only knot CLI + knot-mcp, No Indexer)
For older systems (Debian 12 Bookworm, Ubuntu 22.04) or production deployments that only need to query existing indexes without indexing new code:
Build the lightweight image:
docker build -t knot:clients -f Dockerfile.clients . --network=host
Image size: ~100MB (vs ~160MB for full install)
Run the CLI tool (query existing index):
docker run --rm \
--network host \
knot:clients \
knot callers "MyClass"
Run the MCP server:
docker run --rm \
--network host \
knot:clients \
knot-mcp
Available tools in lightweight mode:
✅ knot search (structural only, no semantic search)
✅ knot callers (reverse dependency lookup)
✅ knot explore (file structure inspection)
❌ Semantic search requires the full install
Note: Uses Debian Bookworm (glibc 2.35+) and excludes ONNX Runtime, making it compatible with older Linux distributions.
Option C: Install via Cargo
cargo install --git https://github.com/raultov/knot
Option D: Build from Source
Full Install (All Binaries):
1. Start infrastructure with Docker:
docker compose up -d
2. Clone and build:
git clone https://github.com/raultov/knot
cd knot
cargo build --release
3. Configure:
cp .env.example .env
$EDITOR .env # Set KNOT_REPO_PATH and Neo4j credentials
4. Index a codebase:
./target/release/knot-indexer
5. Query via CLI:
./target/release/knot search "your query"
Option E: Lightweight Clients (No Indexing)
For older Linux distributions (e.g. Debian 12 Bookworm, Ubuntu 22.04) or production deployments where you only need the CLI and MCP server (not the indexer), compile without the embedding dependencies:
Build lightweight clients:
cargo build --release --no-default-features --features only-clients
This produces only knot and knot-mcp binaries (~8-10 MB each), excluding the 30+ MB of ONNX Runtime dependencies.
Available tools in lightweight mode:
✅ find_callers: Reverse dependency lookup (graph search). Automatically groups results by target when multiple entities share the same name.
✅ explore_file: File structure inspection
❌ search_hybrid_context: Semantic search (requires embeddings, not available in this mode)
Use case: Query an existing Qdrant + Neo4j index that was built elsewhere, without needing the indexer on your machine.
Docker alternative (for lightweight mode):
docker build -t knot:clients-only -f - . << 'EOF'
FROM rust:1.90-slim-bookworm AS builder
WORKDIR /build
COPY . .
RUN cargo build --release --no-default-features --features only-clients
FROM debian:bookworm-slim
COPY --from=builder /build/target/release/knot* /usr/local/bin/
CMD ["knot-mcp"]
EOF
6. Start the MCP server:
./target/release/knot-mcp
📖 Usage
📥 Quick Downloads
Download knot binaries (CLI + MCP server):
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/raultov/knot/releases/latest/download/knot-installer.sh | sh
Download agent-skills documentation (index + all guides):
curl -sO https://raw.githubusercontent.com/raultov/knot/master/.knot-agent.md && curl -fsSL https://raw.githubusercontent.com/raultov/knot/master/.knot-agent-skills.tar.gz | tar -xz
📖 Agent-Skills Guides
Comprehensive documentation for using knot tools. The download above extracts:
search.md — Semantic code discovery guide with examples
callers.md — Reverse dependency lookup with critical usage rules
explore.md — File anatomy inspection guide
workflows.md — Common patterns and best practices
For quick reference without downloading, see .knot-agent.md.
Using the CLI (v0.8.0+)
The knot CLI provides the same capabilities as the MCP server via command-line commands, making it ideal for:
Terminal-only environments
Bash scripting and automation
CI/CD pipelines
Direct integration with other tools
Three main commands:
knot search — Semantic Code Search
knot search "user authentication" --max-results 10 --repo my-app
Find code entities by meaning, class names, docstrings, or comments.
knot callers — Reverse Dependency Lookup
knot callers "LoginService" --repo my-app
Find all code that references a specific entity (dead code detection, impact analysis, call chains). When multiple entities share the same name in different files, results are automatically grouped by target with file locations and signatures.
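The grouping behaviour described for knot callers can be sketched in a few lines of Python. The Caller record and its field names are hypothetical and do not reflect knot's actual result schema; the point is only that callers are keyed by the specific target (name plus file) rather than by name alone.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    caller_name: str   # entity that makes the reference
    target_name: str   # name of the referenced entity
    target_file: str   # file where that specific target is defined


def group_by_target(callers):
    """Group caller names under the (target name, target file) they reference."""
    groups = defaultdict(list)
    for caller in callers:
        groups[(caller.target_name, caller.target_file)].append(caller.caller_name)
    return dict(groups)


callers = [
    Caller("AuthController.login", "LoginService", "src/auth/LoginService.java"),
    Caller("AdminPanel.open", "LoginService", "src/admin/LoginService.java"),
    Caller("SessionFilter.check", "LoginService", "src/auth/LoginService.java"),
]
grouped = group_by_target(callers)
```

Two classes that share the name LoginService end up in separate groups, so an impact analysis on one of them does not pick up callers of the other.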
knot explore — File Structure Inspection
knot explore "src/services/auth.ts" --repo my-app
List all classes, methods, functions in a file with signatures and documentation.
For detailed CLI usage guide, see .knot-agent.md — a machine-readable skill that teaches LLMs how to use knot CLI for autonomous code analysis.
Indexing a Codebase
Incremental Indexing (Default, v0.4.3+)
# First run: indexes all files
knot-indexer --repo-path /path/to/your/repo --neo4j-password secret
# Subsequent runs: only re-indexes changed files (fast!)
knot-indexer --repo-path /path/to/your/repo --neo4j-password secret
# NEW: Real-time Watch mode (v0.5.2+)
knot-indexer --watch --repo-path /path/to/your/repo --neo4j-password secret
How it works:
Tracks file content via SHA-256 hashes in .knot/index_state.json
Automatically detects: modified, added, and deleted files
Only re-parses and re-embeds changed files
Preserves graph relationships to unchanged files
Processes entities in memory-efficient 512-entity chunks
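The steps above can be sketched as follows. This is an illustrative Python model, not knot's Rust implementation: the state is assumed to be a plain mapping from file path to SHA-256 hex digest (the real layout of .knot/index_state.json is not documented here), and the helper names are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of a file's content."""
    return hashlib.sha256(data).hexdigest()


def diff_state(previous: dict[str, str], current: dict[str, str]):
    """Classify paths as added, modified, or deleted by comparing digests."""
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return added, modified, deleted


def chunked(entities, size=512):
    """Yield fixed-size slices so only one chunk is processed at a time."""
    for start in range(0, len(entities), size):
        yield entities[start:start + size]


previous = {"A.java": content_hash(b"v1"), "B.java": content_hash(b"same")}
current = {"A.java": content_hash(b"v2"), "C.java": content_hash(b"new")}
added, modified, deleted = diff_state(previous, current)
chunk_sizes = [len(chunk) for chunk in chunked(list(range(1100)))]
```

Only the paths in added and modified would need re-parsing and re-embedding; anything whose digest is unchanged is skipped.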
Performance:
Initial index (3800 files): ~60 minutes on standard hardware
Incremental update (3 files changed): ~5-10 seconds
Memory usage: Constant ~2GB regardless of repository size
Full Re-Index (Clean Mode)
# Force complete re-index (deletes all existing data)
knot-indexer --clean --repo-path /path/to/your/repo --neo4j-password secret
Use --clean when:
You want to rebuild the entire index from scratch
You've changed Tree-sitter queries or embedding models
You're troubleshooting indexing issues
Running E2E Integration Tests
To ensure indexer stability, run the E2E integration test suite:
# Run all language E2E tests (Java, TS, JS, HTML, CSS, Kotlin, Rust)
./tests/run_e2e.sh
# Run only Kotlin E2E tests
./tests/run_kotlin_e2e.sh
# Run only Rust E2E tests
./tests/run_rust_e2e.sh
See tests/KOTLIN_E2E_TESTS.md for detailed coverage and troubleshooting.
Using the MCP Server
The MCP server exposes three tools to any compatible AI client:
Tool 1: search_hybrid_context
Find code by meaning or keywords
Query: "How is user authentication implemented?"
Result: All auth-related code, signatures, docstrings, and dependencies
Capabilities:
Semantic search by functionality
Class/method/function name lookup
Docstring and inline comment search
Architectural pattern discovery
Full dependency context
Tool 2: find_callers
Find who calls a specific function
Query: "Find callers of getCurrentTimeInSeconds"
Result: All code that invokes this function + file locations
Advanced: Search by Signature (NEW in v0.7.4)
# Find by full signature (Java)
echo '{"method":"tools/call","params":{"name":"find_callers","arguments":{"entity_name":"registerUser(String"}}}' | knot-mcp
# Find by parameter type (Kotlin)
echo '{"method":"tools/call","params":{"name":"find_callers","arguments":{"entity_name":"findById(Int"}}}' | knot-mcp
# Find by type annotation (TypeScript)
echo '{"method":"tools/call","params":{"name":"find_callers","arguments":{"entity_name":"(EventData"}}}' | knot-mcp
Use Cases:
Dead Code Detection: Zero callers = unused code
Impact Analysis: "What breaks if I modify this?"
Refactoring Safety: Find all references before removing
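When find_callers requests are generated from a script rather than typed by hand, building the JSON programmatically avoids shell quoting mistakes. The Python sketch below reproduces only the request shape shown in the echo examples, plus the optional repo_name argument; the helper name is hypothetical, and whether knot-mcp expects further protocol fields is not stated here.

```python
import json


def find_callers_request(entity_name, repo_name=None):
    """Serialize a tools/call request for find_callers as compact JSON."""
    arguments = {"entity_name": entity_name}
    if repo_name is not None:
        arguments["repo_name"] = repo_name  # optional repository filter
    payload = {
        "method": "tools/call",
        "params": {"name": "find_callers", "arguments": arguments},
    }
    return json.dumps(payload, separators=(",", ":"))


request = find_callers_request("registerUser(String")
scoped = json.loads(find_callers_request("findById(Int", repo_name="my-app"))
```

The resulting string can be piped to knot-mcp on standard input in the same way as the echo commands.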
Tool 3: explore_file
Understand file structure
Query: "What's in BrowserService.ts?"
Result: All classes, methods, and functions with signatures and docs
Use Cases:
Quick file navigation
Module structure overview
Finding all methods in a class without reading line-by-line
🔗 MCP Client Configuration
Supported Clients
knot works with any MCP-compatible AI client:
✅ Claude Desktop (Anthropic)
✅ Gemini CLI (Google)
✅ ChatGPT CLI / GPT (OpenAI)
✅ Cursor (AI IDE)
✅ Any standard MCP client
Configuration Examples
Claude Desktop
Add to claude_desktop_config.json:
{
"mcpServers": {
"knot": {
"command": "/absolute/path/to/knot/target/release/knot-mcp",
"env": {
"KNOT_REPO_PATH": "/path/to/indexed/repo",
"KNOT_QDRANT_URL": "http://localhost:6334",
"KNOT_NEO4J_URI": "bolt://localhost:7687",
"KNOT_NEO4J_USER": "neo4j",
"KNOT_NEO4J_PASSWORD": "your-password"
}
}
}
}
Gemini CLI
{
"mcpServers": {
"knot": {
"command": "/absolute/path/to/knot/target/release/knot-mcp",
"env": {
"KNOT_REPO_PATH": "/path/to/indexed/repo",
"KNOT_QDRANT_URL": "http://localhost:6334",
"KNOT_NEO4J_URI": "bolt://localhost:7687",
"KNOT_NEO4J_USER": "neo4j",
"KNOT_NEO4J_PASSWORD": "your-password"
}
}
}
}
ChatGPT / GPT CLI
Similar JSON configuration in your client's MCP configuration file.
⚙️ Configuration Reference
All options can be set via environment variables (.env) or CLI flags. Environment variables take precedence.
| Environment Variable | CLI Flag | Default | Description |
| KNOT_REPO_PATH | --repo-path | (required) | Root directory of the repository to index |
|  |  | (auto-detected) | Repository name for multi-repo isolation (auto-detected from last path component) |
| KNOT_QDRANT_URL |  |  | Qdrant server URL |
|  |  |  | Qdrant collection name |
| KNOT_NEO4J_URI |  |  | Neo4j Bolt URI |
| KNOT_NEO4J_USER |  |  | Neo4j username |
| KNOT_NEO4J_PASSWORD | --neo4j-password | (required) | Neo4j password |
|  |  |  | Embedding vector dimension |
|  |  |  | Entities per batch |
|  | --clean |  | Force full re-index (delete all existing data) |
| KNOT_CUSTOM_CA_CERTS | --custom-ca-certs | (none) | Path to CA certificate bundle for corporate SSL proxies |
|  | (env only) |  | Log level: |
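The precedence rule stated above (environment variable first, then CLI flag, then default) and the repository-name auto-detection can be modelled as below. This is a hedged Python sketch of the documented behaviour, not knot's code; resolve_option and default_repo_name are hypothetical names.

```python
from pathlib import PurePosixPath


def resolve_option(env_value, cli_value, default=None):
    """Pick the first value that is set: environment, then CLI flag, then default."""
    for candidate in (env_value, cli_value, default):
        if candidate is not None:
            return candidate
    raise ValueError("required option was not provided")


def default_repo_name(repo_path: str) -> str:
    """Derive a repository name from the last component of the repository path."""
    return PurePosixPath(repo_path).name


uri = resolve_option(None, None, "bolt://localhost:7687")
name = default_repo_name("/home/user/my-java-app/")
```

A required option such as the Neo4j password simply has no default, so resolve_option raises when neither source supplies it.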
🎨 Custom Tree-sitter Queries
The built-in extraction queries (queries/java.scm, queries/typescript.scm) can be overridden without recompiling:
KNOT_CUSTOM_QUERIES_PATH=/path/to/my/queries ./target/release/knot-indexer
Place java.scm and/or typescript.scm in your custom directory. Missing files fall back to built-in defaults.
🔐 Corporate SSL / CA Certificates
In restricted corporate environments with SSL-inspecting proxies, you may need to provide a custom CA certificate bundle so that knot can download the embedding model from HuggingFace.
Via environment variable:
export KNOT_CUSTOM_CA_CERTS=/etc/ssl/certs/corporate-bundle.pem
./target/release/knot-indexer --repo-path /path/to/repo --neo4j-password secret
Via CLI flag:
./target/release/knot-indexer \
--custom-ca-certs /etc/ssl/certs/corporate-bundle.pem \
--repo-path /path/to/repo \
--neo4j-password secret
Via .env file:
echo "KNOT_CUSTOM_CA_CERTS=/etc/ssl/certs/corporate-bundle.pem" >> .env
./target/release/knot-indexer
This works for all three binaries: knot-indexer, knot-mcp, and knot.
🔄 Workflow Example
Step 1: Index a Java project
./target/release/knot-indexer --repo-path /home/user/my-java-app --neo4j-password secret
Step 2: Query via CLI (Instant search)
./target/release/knot search "authentication logic"
./target/release/knot callers "UserService.login"
Step 3: Start MCP server (For AI Agents)
./target/release/knot-mcp
Step 4: Use with Claude Desktop
Claude will list the three tools in its Tools menu
Ask: "Search for all authentication logic"
Ask: "Find who calls the login method"
Ask: "Explore the structure of UserService.java"
🤖 Auto-Configuring AI Agents
knot includes a universal .prompt file in its root directory that automatically configures modern AI coding agents (Cursor, Cline, opencode, Claude, etc.) to use the knot-mcp tools correctly.
The directive explicitly instructs AI agents to prioritize:
search_hybrid_context: for semantic code discovery (instead of grep)
find_callers: for reverse dependency analysis (instead of finding references manually)
explore_file: for file structure inspection (instead of reading line-by-line)
This ensures that when you ask an AI agent to analyze, refactor, or understand your code, it leverages the full power of the vector and graph databases rather than falling back to context-blind regex searches. The .prompt file is universal and tool-agnostic, working with any LLM client that reads codebase directives.
🤝 Contributing
Contributions are welcome! Please ensure:
All code passes cargo clippy
Code is formatted with cargo fmt
Changes are compatible with Rust 2024 edition
All new functionality includes unit tests
Performance regressions are validated with the benchmark framework before submitting PRs
Performance Benchmarking
The project includes a three-level benchmarking framework to validate optimizations and detect regressions:
Level 1 — Unit Benchmarks (Criterion):
cargo bench --bench pipeline_bench # Parse + prepare throughput per language
cargo bench --bench graph_upsert_bench # Neo4j UNWIND batching speedup (needs Neo4j)
cargo bench --bench channel_backpressure_bench # Bounded channel overhead
Level 2 — E2E Integration Benchmarks:
# Full pipeline metrics with memory and per-stage timing
./tests/benchmark_e2e.sh --focus rust_e2e --output-dir /tmp/perf_results
# Compare against baseline (fails CI if tolerance exceeded)
scripts/compare_perf_metrics.sh /tmp/perf_results .perf_metrics/baseline.json
Baseline files: .perf_metrics/baseline.json stores the last known good metrics (committed, updated on main/master merges). Tolerance thresholds in .perf_metrics/threshold_tolerances.json control regression gates (±5% time, ±10% memory by default).
CI Integration: The test-performance job in .github/workflows/ci.yml runs after all E2E correctness tests pass, compares the results against the baseline, and fails the build on regression.
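A regression gate of this kind boils down to comparing each metric against its baseline plus a tolerance. The Python sketch below assumes hypothetical metric keys (total_seconds, peak_memory_mb) and flags only increases; the real schema of baseline.json and threshold_tolerances.json, and the exact rule used by compare_perf_metrics.sh, are not documented here.

```python
def find_regressions(baseline, current, tolerances):
    """Return the metrics whose current value exceeds baseline * (1 + tolerance)."""
    regressions = []
    for metric, base in baseline.items():
        limit = base * (1 + tolerances[metric])
        if current[metric] > limit:
            regressions.append(metric)
    return regressions


baseline = {"total_seconds": 100.0, "peak_memory_mb": 400.0}
tolerances = {"total_seconds": 0.05, "peak_memory_mb": 0.10}  # 5% time, 10% memory

within = find_regressions(
    baseline, {"total_seconds": 104.0, "peak_memory_mb": 420.0}, tolerances
)
slower = find_regressions(
    baseline, {"total_seconds": 106.0, "peak_memory_mb": 445.0}, tolerances
)
```

A CI job would exit non-zero whenever the returned list is non-empty.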
📜 License
This project is licensed under the MIT License. See LICENSE for details.
🚀 Roadmap
Previous Release (v1.1.0 — Performance Optimization) ✅
✅ Neo4j UNWIND Batching (Phase 1-2): Replaced N individual MERGE queries with single UNWIND $entities batch queries, for a 10-50x speedup on entity/relationship writes
✅ Bounded Channels (Phase 3): Parse/embed/res channels bounded with backpressure; peak memory <400MB (was 500MB unbounded)
✅ Concurrent Ingestion (Phase 4): JoinSet + Semaphore for parallel Neo4j/Qdrant writes, for 2-3x ingestion throughput
✅ Rayon Thread Pool Config (Phase 5): Configurable KNOT_RAYON_THREADS env var (default N-1 cores)
✅ Parallel Relationship Resolution (Phase 6): par_iter_mut() for O(N/num_cpus) resolution
✅ Three-Level Benchmarking Framework (Section 9):
Criterion unit benchmarks: pipeline_bench, graph_upsert_bench, channel_backpressure_bench
E2E benchmark script: tests/benchmark_e2e.sh with metrics capture
CI regression tracking: scripts/compare_perf_metrics.sh + test-performance job
✅ Memory targets: ~300-400MB peak (well below 2GB nice-to-have, far from 5GB hard limit)
✅ Criterion benchmarks at benches/ | Baseline metrics at .perf_metrics/baseline.json
✅ cargo fmt clean | cargo clippy clean | 521 unit tests passing
Current Release (v1.2.0 — Cargo.toml, Config Files, Kubernetes + Helm) ✅
✅ Phase 12A — Cargo.toml Parser: Package metadata, dependencies (simple/table/git/path), features, workspace members via toml = "0.8"
✅ Phase 12B — Configuration Files: YAML (.yml/.yaml), JSON (.json), Java Properties (.properties) with recursive walk, depth limit 10, leaf-key granularity, lock file exclusions, 500KB file size limit. package.json special handling: npm deps as BuildDependency, scripts as ConfigProperty, ProjectIdentity emission
✅ Phase 12C — Kubernetes + Helm: 10 new EntityKind variants (K8sDeployment, K8sService, K8sConfigMap, K8sSecret, K8sIngress, K8sNamespace, K8sResource, HelmChart, HelmValue, HelmTemplateVar). K8s manifest parsing with label/annotation/reference extraction, Helm Chart.yaml/values.yaml/templates support with {{ .Values.X }} variable tracking
✅ 74+ new unit tests across 6 parser modules + 29 E2E tests (6 Cargo + 6 Config + 9 K8s/Helm)
✅ 10/10 E2E test suites pass: JS/TS/Java, Kotlin, Rust, Python, Build Systems (extended), Config Files, K8s/Helm, Groovy, Cross-Language Ref, C/C++
✅ cargo fmt clean | cargo clippy clean | 520 unit tests passing
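Leaf-key granularity with a depth limit, as described for Phase 12B, can be illustrated with a short recursive walk. This Python sketch is an assumption about the general idea only: the function name, the dotted key format, the treatment of lists as leaf values, and the choice to drop subtrees beyond the limit are all hypothetical.

```python
def leaf_keys(node, prefix="", depth=0, max_depth=10):
    """Flatten nested mappings into {dotted.path: leaf value} up to max_depth levels."""
    if isinstance(node, dict):
        if depth >= max_depth:
            return {}  # assumption: anything nested deeper than the limit is skipped
        flat = {}
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(leaf_keys(value, path, depth + 1, max_depth))
        return flat
    return {prefix: node}  # scalars (and lists, in this sketch) are leaves


config = {"server": {"port": 8080, "tls": {"enabled": True}}, "name": "app"}
flat = leaf_keys(config)
```

Each resulting key (for example server.tls.enabled) corresponds to one searchable configuration entry.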
Previous Release (v0.10.3 — Groovy Private Methods, Nested Closures & UUID Collision Fix) ✅
✅ UUID Collision Fix: ParsedEntity identity now includes start_line
✅ Multi-line Method Extraction: try_extract_typed_method_multiline handles closure default params
✅ Innermost Assignment: method calls in nested closures go to the innermost method
✅ 10 E2E test cases: typed/def/no-paren callers, multi-line closures, innermost assignment
✅ 441 unit tests | clippy clean | fmt applied
Previous Release (v0.8.7 — Enhanced Rust Type Reference Detection in Macros) ✅
✅ Macro Type Reference Extraction: Type references inside macro invocations (vec![], println!(), assert!(), format!(), etc.) are now correctly captured
✅ Intelligent String Filtering: Filters out false positives from string literals using quote-counting heuristics
✅ Comprehensive Edge Case Handling: Validates identifiers, handles nested macros, supports macro_rules! definitions
✅ Improved Accuracy: EntityKind references increased by +95.7% (46→90 references), now captures test function usage
✅ Enhanced Test Coverage: Added 4 new tests for token_tree extraction covering various macro types and edge cases
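One plausible reading of the quote-counting heuristic is that an identifier preceded by an odd number of unescaped double quotes sits inside a string literal and should be ignored. The Python sketch below implements that reading as an assumption; the function names and the capitalized-identifier pattern are hypothetical, and it does not handle Rust char literals or raw strings.

```python
import re

# Assumption: type-like names start with an uppercase letter.
TYPE_LIKE = re.compile(r"\b[A-Z][A-Za-z0-9_]*\b")


def inside_string_literal(text: str, index: int) -> bool:
    """True when an odd number of unescaped double quotes precede `index`."""
    quotes = 0
    i = 0
    while i < index:
        if text[i] == "\\":
            i += 2  # skip the escaped character
            continue
        if text[i] == '"':
            quotes += 1
        i += 1
    return quotes % 2 == 1


def type_refs_in_macro(tokens: str) -> list[str]:
    """Collect type-like identifiers in macro tokens, skipping string contents."""
    return [
        match.group(0)
        for match in TYPE_LIKE.finditer(tokens)
        if not inside_string_literal(tokens, match.start())
    ]


refs = type_refs_in_macro('println!("User {}", UserId::new(1))')
```

Here the word User inside the format string is filtered out, while UserId in the argument list is kept as a type reference.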
Previous Release (v0.8.6 — Rust Type Aliases, Constants, and Docstrings) ✅
✅ Rust Type Alias Extraction: Extracts type alias declarations with full signature (e.g., type Callback = fn(u32) -> u32)
✅ Rust Constant/Static Extraction: Captures const and static mut declarations with type signatures
✅ Rust Docstring Support: Full doc comment extraction for Rust entities (handles nested doc_comment nodes in tree-sitter-rust)
✅ Rich Vector Embeddings: Type signatures and documentation are now included in embeddings for better semantic search
✅ Improved Search Ranking: Rust entities like Callback now rank in top 5 search results when querying by name
Previous Release (v0.8.5 — Rust Module Refactoring & Clippy Fixes) ✅
✅ Rust Module Refactoring: Extracted Rust parsing logic into dedicated src/pipeline/parser/languages/rust.rs for better maintainability and mirroring existing language module architecture.
✅ Clippy Compliance: Fixed unused import (uuid::Uuid) and unnecessary mut warning in Rust module tests.
✅ Rust Support Complete: Phase 8 implementation fully integrated with 17 unit tests and 22 E2E test cases passing.
Previous Release (v0.8.4 — Agent-Skills Documentation Installer & Lightweight Clients) ✅
✅ Dry-Run Mode: MCP server can run in offline mode for quality checks on deployment platforms.
✅ Platform-Agnostic: Removed all platform-specific references; compatible with any deployment platform.
✅ Enhanced Reliability: Graceful handling of missing database connections for validation scenarios.
Previous Release (v0.10.0 — Build Systems & CI/CD Support) ✅
✅ Build Systems Support (Phase 9): Maven pom.xml (dependencies + plugins via roxmltree), Gradle build.gradle (deps + plugins + tasks), and Jenkinsfile pipeline (stages + steps) extraction
✅ 22 unit tests + 8 E2E tests (Maven search, pom.xml explore, Gradle dep/task search, Jenkins stage/step search)
✅ BuildDependency, BuildPlugin, BuildTask, PipelineStage, PipelineStep entity kinds with explore_file formatting
Previous Release (v0.9.3 — Python Search Stability & CI Fixes) ✅
✅ Fixed CLI explore & search queries that queried the default collection instead of test collection by appending -r "$REPO_NAME"
✅ Python CLI search bug handled; resolved knot search queries failing in specific collection bounds
✅ Replaced unreliable nc -z network checks with Neo4j-specific Docker health checks (docker inspect)
✅ 426 unit tests | 23 Python E2E | 22 Rust E2E | 10 Kotlin E2E
Earlier Release (v0.8.2 — Quality & Doc Refactor) ✅
✅ MCP Quality: Enhanced tool descriptions for better agent discovery and usage safety.
✅ Token-Efficient Docs: Modularized agent skill guide into docs/agent-skills/ for on-demand loading.
✅ Rust Phase 1: Infrastructure prepared for Rust 2024 integration.
✅ Rust Phase 2-5: Complete Rust language support including entity extraction, macro tracking, and comprehensive E2E testing (v0.8.x).
Earlier Release (v0.8.1 — CLI UX & Docker Integration) ✅
✅ Silenced CLI Logs: Default log level set to error for knot CLI (cleaner Markdown output).
✅ 100% E2E Dual-Testing: All 35 integration tests simultaneously verify both MCP and CLI.
✅ Docker CLI Support: Official Docker image now includes the knot binary.
✅ Agent Guidance: Enhanced .knot-agent.md with signature-based search warnings.
Phase 7 (v0.8.10 — CLI UX & Corporate Network Support) ✅
✅ Human-friendly output formatting: Colorized table output as default with per-entity-kind ANSI colors
✅ Interactive result navigation: Pager support via less -R -e with auto-exit at end of content
✅ Configurable output formats: --output flag supports table (default), json, and markdown
✅ Custom CA Certificates: --custom-ca-certs / KNOT_CUSTOM_CA_CERTS for corporate SSL-inspecting proxies
✅ O(N) Macro Traversal Optimization (v0.8.11): Substring skipping for deeply nested token_tree nodes
Phase 8 (v0.8.11 — Rust Support) ✅
✅ Support .rs files with tree-sitter-rust parser
✅ Struct, enum, union, trait, and impl block extraction
✅ Function, method, macro definition and invocation tracking
✅ Type alias, constant, static, and module extraction with signatures
✅ Docstring extraction for all Rust entity types
✅ O(N) nested macro traversal optimization for large Rust codebases
✅ 17 unit tests for Rust entity and reference extraction
✅ 22 end-to-end integration tests covering all Rust language constructs
Phase 11 (v1.0.0 — C/C++ Support) ✅
✅ Support .c, .cpp, .cc, .cxx, .h, .hpp, .hh, .hxx files via tree-sitter-c and tree-sitter-cpp
✅ Intelligent auto-detection of .h files to parse them as C++ if they contain classes, namespaces, or templates
✅ Namespace-aware FQN resolution (Engine::MyClass::start)
✅ Class, struct, function, and method extraction with full signatures
✅ Macro definition and usage tracking (uppercase identifier heuristic)
✅ Type reference tracking (declarations, new expressions, qualified types)
✅ Call graph analysis including method calls, field access (obj->method()), and scope resolution (std::vector::size())
✅ 3 unit tests for C++ entity and reference extraction
✅ 4 end-to-end integration tests covering FQN, call graphs, macro usage, and type references
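The .h auto-detection rule above (treat a header as C++ when it contains classes, namespaces, or templates) can be approximated with a simple content check. This Python sketch is a rough stand-in: the regular expression and the function name are assumptions, and knot's actual detection may inspect the source differently.

```python
import re

# Assumption: a line starting with one of these constructs marks the header as C++.
CPP_MARKERS = re.compile(
    r"^\s*(?:class\s+\w+|namespace\b|template\s*<)", re.MULTILINE
)


def detect_header_language(source: str) -> str:
    """Return 'cpp' if the header shows C++-only constructs, otherwise 'c'."""
    return "cpp" if CPP_MARKERS.search(source) else "c"


c_header = "struct point { int x; };\nint add(int a, int b);\n"
cpp_header = "namespace Engine {\nclass MyClass { void start(); };\n}\n"
c_lang = detect_header_language(c_header)
cpp_lang = detect_header_language(cpp_header)
```

A text-level check like this can be fooled by comments that mention these keywords, which is one reason a production implementation might prefer to inspect the parse tree.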
Upcoming (v1.3.x)
Phase D: Cross-Repo Dependency Linking
Automatic inter-repository call resolution via :Repository graph model with DEPENDS_ON edges
ProjectIdentity marker entity from build files (Maven GAV, Cargo package, npm name)
knot deps CLI subcommand + list_repo_dependencies MCP tool for dependency graph visualization
Retroactive linking for out-of-order indexing
Long-Term Vision
Restructure E2E test suites to gain velocity by sharing binaries with unit tests and avoiding Docker overhead
Markdown documentation indexing
Go support
C# support
IDE plugins (VS Code, IntelliJ, Vim)
Web UI for graph visualization
Language Server Protocol (LSP) integration
Automated Code Review tool (MCP-based)
💬 Questions?
For issues, feature requests, or discussions, please open a GitHub issue.