knowledge-rag
The knowledge-rag server is a 100% local hybrid search and document management system that integrates with Claude Code via MCP, enabling you to search, manage, and evaluate a personal knowledge base with zero external dependencies.
Search & Retrieval
- search_knowledge — Hybrid search combining semantic embeddings (FastEmbed ONNX) and BM25 keyword matching via Reciprocal Rank Fusion, with cross-encoder reranking; tune the balance with hybrid_alpha (0.0 = keyword only, 1.0 = semantic only); filter by category; auto-expands 54 security-term synonyms (e.g., "sqli" → "sql injection")
- search_similar — Find documents semantically similar to a reference document
- get_document — Retrieve full content and metadata for a specific document

Results are diversified via MMR to reduce redundancy; repeat queries benefit from an LRU cache (5-min TTL).
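The LRU-with-TTL query cache mentioned above typically works like the following sketch (the class name, counters, and internals are illustrative, mirroring the stats the server reports, not its actual code):

```python
import time
from collections import OrderedDict

class QueryCache:
    """LRU cache with TTL — illustrative sketch, not the server's class."""

    def __init__(self, max_size=100, ttl_seconds=300):
        self.max_size, self.ttl = max_size, ttl_seconds
        self._data = OrderedDict()   # key -> (expires_at, value)
        self.hits = self.misses = 0

    def get(self, key):
        entry = self._data.get(key)
        if entry and entry[0] > time.monotonic():
            self._data.move_to_end(key)      # refresh LRU position
            self.hits += 1
            return entry[1]
        self._data.pop(key, None)            # expired or absent
        self.misses += 1
        return None

    def put(self, key, value):
        self._data[key] = (time.monotonic() + self.ttl, value)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:  # evict least recently used
            self._data.popitem(last=False)

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return f"{100 * self.hits / total:.1f}%" if total else "0.0%"
```

A repeated query within the TTL window is served straight from the cache; any write past `max_size` evicts the least recently used entry first.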
Document Management
- add_document — Add a new document from raw text content
- update_document — Replace and re-index an existing document
- remove_document — Remove from index (optionally delete from disk)
- add_from_url — Fetch a URL, strip HTML, convert to Markdown, and index it

Supports 9 formats: Markdown, PDF, DOCX, XLSX, PPTX, CSV, TXT, Python, JSON
Smart chunking: Markdown files split by section headers (##/###); SHA256 deduplication prevents duplicate chunks
Index Management
- reindex_documents — Incremental (changed files only), forced smart reindex, or full nuclear rebuild (for model upgrades); auto-reindex via file watcher (5-second debounce)
- get_index_stats — View total documents, chunks, embedding model, cache hit rate, etc.

Auto-detects embedding dimension mismatches and triggers rebuilds when upgrading versions.
Organization & Evaluation
- list_categories — List all document categories with counts (security, development, ctf, logscale, general, redteam, blueteam, aar, etc.)
- list_documents — List all indexed documents, optionally filtered by category
- evaluate_retrieval — Benchmark retrieval quality with custom test cases; returns MRR@5 and Recall@5 metrics
All processing is fully local — no API keys, no Ollama, no data leaves your machine.
Knowledge RAG
Your docs, your machine, zero cloud. Claude Code searches them natively.
Drop your PDFs, markdown, code, notebooks — 1800+ files, 39K chunks, indexed in under 3 minutes. Hybrid search (BM25 + semantic vectors + cross-encoder reranking) through 12 MCP tools. Everything runs locally via ONNX. No Docker, no Ollama, no API keys, no data leaves your machine.
pip install knowledge-rag → restart Claude Code → search_knowledge("your query")

12 MCP Tools | Hybrid Search + Reranking | 12 File Formats | Optional NVIDIA GPU | 100% Local
What's New | Supported Formats | Installation | Configuration | API Reference | Architecture
What's New in v3.5.2
GPU-Accelerated Embeddings (Optional)
ONNX embeddings can run on NVIDIA GPUs for 5-10x faster indexing. Opt-in — CPU remains the default.
# NVIDIA GPU (requires CUDA 12.x drivers)
pip install knowledge-rag[gpu]
# Also install CUDA 12 runtime libraries (if not using CUDA Toolkit 12.x)
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12 nvidia-cuda-runtime-cu12

# config.yaml
models:
  embedding:
    gpu: true  # Automatic CPU fallback if CUDA is unavailable

How it works:

- Sets CUDAExecutionProvider as primary, CPUExecutionProvider as fallback
- Auto-discovers CUDA 12 DLLs from pip-installed NVIDIA packages (no manual PATH config)
- If GPU init fails for any reason, falls back to CPU gracefully with a [WARN] log
- gpu: false (default) forces CPU-only mode — zero CUDA overhead, clean logs
Ideal for large knowledge bases (1000+ documents) where full rebuilds take minutes on CPU. After the initial index, incremental reindexing (force: true) takes seconds regardless.
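Conceptually, the provider selection above boils down to a small function like this sketch (`select_providers` is a hypothetical name; the real code wires the resulting list into the ONNX Runtime session behind FastEmbed):

```python
def select_providers(gpu: bool, available: list[str]) -> list[str]:
    """Pick the ONNX Runtime execution-provider order described above.

    Illustrative helper, not the actual knowledge-rag function:
    CUDA-first with CPU fallback when gpu is enabled and CUDA is
    available; explicit CPU-only otherwise (no CUDA probe noise).
    """
    if gpu and "CUDAExecutionProvider" in available:
        # ONNX Runtime tries providers in order: CUDA primary, CPU fallback.
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]
```

With onnxruntime installed, the returned list would be passed as the `providers` argument of an `InferenceSession`, which attempts each provider in order.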
Recent Highlights

- v3.5.2 — CUDA DLL auto-discovery from pip packages, graceful GPU→CPU fallback, explicit CPU provider (no CUDA noise when gpu: false), BASE_DIR resolution fix for editable installs
- v3.5.1 — Removed the Python <3.13 upper bound — 3.13 and 3.14 now supported
- v3.5.0 — Optional GPU acceleration, supported formats table, full README rewrite
- v3.4.3 — MCP stdout save/restore fix (v3.4.2 broke JSON-RPC responses)
- v3.4.0 — Persistent model cache, exclude patterns, Jupyter Notebook parser, inotify resilience, MetaTrader support
See Changelog for full history.
Supported Formats
Format | Extension | Parser | Default | Notes
--- | --- | --- | --- | ---
Markdown | .md | Section-aware (splits at ##/### headers) | Yes | Headers preserved as chunk boundaries
Plain Text | .txt | Fixed-size chunking | Yes | 1000 chars + 200 overlap
PDF | .pdf | PyMuPDF extraction | Yes | Text-based PDFs only (no OCR)
Python | .py | Code-aware parser | Yes | Functions/classes as chunks
JSON | .json | Structure-aware | Yes | Flattened key-value extraction
CSV | .csv | Row-based parser | Yes | Headers + rows as text
Word | .docx | python-docx | Yes | Headings preserved as markdown
Excel | .xlsx | openpyxl | Yes | Sheet-by-sheet extraction
PowerPoint | .pptx | python-pptx | Yes | Slide-by-slide extraction
Jupyter Notebook | .ipynb | Cell-aware parser | Yes | Markdown + code cells only, no outputs/base64
MQL4 Header | .mqh | Code parser | No | MetaTrader — add to supported_formats
MQL4 Source | .mq4 | Code parser | No | MetaTrader — add to supported_formats
Tip: The parser dispatch is extensible. Any format mapped in _parsers can be enabled via supported_formats in config.yaml.
Features
Feature | Description
--- | ---
Hybrid Search | Semantic + BM25 keyword search with Reciprocal Rank Fusion
Cross-Encoder Reranker | Xenova/ms-marco-MiniLM-L-6-v2 re-scores top candidates for precision
GPU Acceleration | Optional ONNX CUDA support for 5-10x faster indexing
YAML Configuration | Fully customizable via config.yaml
Query Expansion | Configurable synonym mappings (69 security-term defaults)
Markdown-Aware Chunking | Splits .md files at ##/### header boundaries
In-Process Embeddings | FastEmbed ONNX Runtime (BAAI/bge-small-en-v1.5, 384D)
Keyword Routing | Word-boundary aware routing for domain-specific queries
12 Format Parsers | MD, TXT, PDF, PY, JSON, CSV, DOCX, XLSX, PPTX, IPYNB + opt-in MQH/MQ4
Category Organization | Organize docs by folder, auto-tagged by path
Incremental Indexing | Change detection via mtime/size — only re-indexes modified files
Chunk Deduplication | SHA256 content hashing prevents duplicate chunks
Query Cache | LRU cache with 5-min TTL for instant repeat queries
Document CRUD | Add, update, remove documents via MCP tools
URL Ingestion | Fetch URLs, strip HTML, convert to markdown, index
Similarity Search | Find documents similar to a reference document
Retrieval Evaluation | Built-in MRR@5 and Recall@5 metrics
File Watcher | Auto-reindex on document changes via watchdog (5s debounce)
Exclude Patterns | Glob-based file/directory exclusion during indexing
MMR Diversification | Maximal Marginal Relevance reduces redundant results
Persistent Model Cache | Embedding models cached in models_cache/
Auto-Migration | Detects embedding dimension mismatch and rebuilds automatically
12 MCP Tools | Full CRUD + search + evaluation via Claude Code
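As an example of one feature above, MMR diversification greedily trades relevance against redundancy when assembling the final result list. A minimal sketch (the `lambda_` weighting and function signature are an assumed parameterization, not the server's actual code):

```python
def mmr(candidates, relevance, similarity, lambda_=0.7, k=3):
    """Maximal Marginal Relevance: pick results that are relevant to the
    query but dissimilar to results already selected (illustrative sketch).

    candidates: list of doc ids; relevance: doc id -> query relevance;
    similarity: callable(doc_a, doc_b) -> pairwise similarity.
    """
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def score(doc):
            # Penalize docs similar to anything already chosen.
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return lambda_ * relevance[doc] - (1 - lambda_) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected
```

With two near-duplicate top hits, MMR keeps the stronger one and promotes a different, still-relevant document in place of the duplicate.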
Architecture
System Overview
flowchart TB
subgraph MCP["MCP SERVER (FastMCP)"]
direction TB
TOOLS["12 MCP Tools<br/>search | get | add | update | remove<br/>reindex | list | stats | url | similar | evaluate"]
end
subgraph SEARCH["HYBRID SEARCH ENGINE"]
direction LR
ROUTER["Keyword Router<br/>(word boundaries)"]
SEMANTIC["Semantic Search<br/>(ChromaDB)"]
BM25["BM25 Keyword<br/>(rank-bm25 + expansion)"]
RRF["Reciprocal Rank<br/>Fusion (RRF)"]
RERANK["Cross-Encoder<br/>Reranker"]
ROUTER --> SEMANTIC
ROUTER --> BM25
SEMANTIC --> RRF
BM25 --> RRF
RRF --> RERANK
end
subgraph STORAGE["STORAGE LAYER"]
direction LR
CHROMA[("ChromaDB<br/>Vector Database")]
COLLECTIONS["Collections<br/>security | ctf<br/>logscale | development"]
CHROMA --- COLLECTIONS
end
subgraph EMBED["EMBEDDINGS (In-Process)"]
FASTEMBED["FastEmbed ONNX<br/>BAAI/bge-small-en-v1.5<br/>(384D, CPU or GPU)"]
CROSSENC["Cross-Encoder<br/>ms-marco-MiniLM-L-6-v2"]
FASTEMBED --- CROSSENC
end
subgraph INGEST["DOCUMENT INGESTION"]
PARSERS["12 Parsers<br/>MD | PDF | TXT | PY | JSON | CSV<br/>DOCX | XLSX | PPTX | IPYNB | MQH | MQ4"]
CHUNKER["Chunking<br/>MD: section-aware<br/>Other: 1000 chars + 200 overlap"]
PARSERS --> CHUNKER
end
CLAUDE["Claude Code"] --> MCP
MCP --> SEARCH
SEARCH --> STORAGE
STORAGE --> EMBED
INGEST --> EMBED
EMBED --> STORAGE

Query Processing Flow
flowchart TB
QUERY["User Query<br/>'mimikatz credential dump'"] --> EXPAND
subgraph EXPANSION["Query Expansion"]
EXPAND["Synonym Expansion<br/>mimikatz -> mimikatz, sekurlsa, logonpasswords"]
end
EXPAND --> ROUTER
subgraph ROUTING["Keyword Routing"]
ROUTER["Keyword Router"]
MATCH{"Word Boundary<br/>Match?"}
CATEGORY["Filter: redteam"]
NOFILTER["No Filter"]
ROUTER --> MATCH
MATCH -->|Yes| CATEGORY
MATCH -->|No| NOFILTER
end
subgraph HYBRID["Hybrid Search"]
direction LR
SEMANTIC["Semantic Search<br/>(ChromaDB embeddings)<br/>Conceptual similarity"]
BM25["BM25 Search<br/>(expanded query)<br/>Exact term matching"]
end
subgraph FUSION["Result Fusion + Reranking"]
RRF["Reciprocal Rank Fusion<br/>score = alpha * 1/(k+rank_sem)<br/>+ (1-alpha) * 1/(k+rank_bm25)"]
RERANK["Cross-Encoder Reranker<br/>Re-scores top 3x candidates<br/>query+doc pair scoring"]
SORT["Sort by Reranker Score<br/>Normalize to 0-1"]
RRF --> RERANK --> SORT
end
CATEGORY --> HYBRID
NOFILTER --> HYBRID
SEMANTIC --> RRF
BM25 --> RRF
SORT --> RESULTS["Results<br/>search_method: hybrid|semantic|keyword<br/>score + reranker_score + raw_rrf_score"]

Document Ingestion Flow
flowchart LR
subgraph INPUT["Input"]
FILES["documents/<br/>├── security/<br/>├── development/<br/>├── ctf/<br/>└── general/"]
end
subgraph PARSE["Parse (12 formats)"]
MD["Markdown"]
PDF["PDF<br/>(PyMuPDF)"]
OFFICE["DOCX | XLSX<br/>PPTX | CSV"]
CODE["PY | JSON<br/>IPYNB"]
end
subgraph CHUNK["Chunk"]
MDSPLIT["MD: Section-Aware<br/>Split at ## headers"]
TXTSPLIT["Other: Fixed-Size<br/>1000 chars + 200 overlap"]
DEDUP["SHA256 Dedup<br/>Skip duplicate content"]
end
subgraph EMBED["Embed"]
FASTEMBED["FastEmbed ONNX<br/>bge-small-en-v1.5<br/>(384D, CPU or GPU)"]
end
subgraph STORE["Store"]
CHROMADB[("ChromaDB")]
BM25IDX["BM25 Index"]
end
FILES --> MD & PDF & OFFICE & CODE
MD --> MDSPLIT
PDF & OFFICE & CODE --> TXTSPLIT
MDSPLIT --> DEDUP
TXTSPLIT --> DEDUP
DEDUP --> EMBED
EMBED --> STORE

hybrid_alpha Parameter Effect
flowchart LR
subgraph ALPHA["hybrid_alpha values"]
A0["0.0<br/>Pure BM25<br/>Instant"]
A3["0.3 (default)<br/>Keyword-heavy<br/>Fast"]
A5["0.5<br/>Balanced"]
A7["0.7<br/>Semantic-heavy"]
A10["1.0<br/>Pure Semantic"]
end
subgraph USE["Best For"]
U0["CVEs, tool names<br/>exact matches"]
U3["Technical queries<br/>specific terms"]
U5["General queries"]
U7["Conceptual queries<br/>related topics"]
U10["'How to...' questions<br/>conceptual search"]
end
A0 --- U0
A3 --- U3
A5 --- U5
A7 --- U7
A10 --- U10

Installation
Prerequisites
Python 3.11+
Claude Code CLI
~200MB disk for model cache (auto-downloaded on first run)
Optional: NVIDIA GPU + CUDA for accelerated embeddings (pip install knowledge-rag[gpu])
Quick Start (3 steps)
Step 1: Install
# Option A: One-line installer (recommended)
# Linux/macOS:
curl -fsSL https://raw.githubusercontent.com/lyonzin/knowledge-rag/master/install.sh | bash
# Windows (PowerShell):
irm https://raw.githubusercontent.com/lyonzin/knowledge-rag/master/install.ps1 | iex
# Option B: pip install (manual)
mkdir ~/knowledge-rag && cd ~/knowledge-rag
python3 -m venv venv && source venv/bin/activate
pip install knowledge-rag
knowledge-rag init # Exports config template, presets, creates documents/
# Option C: Clone from source
git clone https://github.com/lyonzin/knowledge-rag.git ~/knowledge-rag
cd ~/knowledge-rag
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

Windows users: Use python instead of python3, and venv\Scripts\activate instead of source venv/bin/activate.
Step 2: Configure Claude Code
claude mcp add knowledge-rag -s user -- ~/knowledge-rag/venv/bin/python -m mcp_server.server

Windows:
claude mcp add knowledge-rag -s user -- %USERPROFILE%\knowledge-rag\venv\Scripts\python.exe -m mcp_server.server
The server auto-detects the project directory from the venv location. No cd wrapper or cwd field needed.
Add to ~/.claude.json:
Windows:
{
"mcpServers": {
"knowledge-rag": {
"command": "C:\\Users\\YOUR_USER\\knowledge-rag\\venv\\Scripts\\python.exe",
"args": ["-m", "mcp_server.server"]
}
}
}

Linux / macOS:
{
"mcpServers": {
"knowledge-rag": {
"command": "/home/YOUR_USER/knowledge-rag/venv/bin/python",
"args": ["-m", "mcp_server.server"]
}
}
}

Replace YOUR_USER with your username, or use the full path from echo $HOME.
Step 3: Restart Claude Code
# Verify the server is connected
claude mcp list

On first start, the server will:

- Download the embedding model (~50MB, cached in models_cache/)
- Auto-index any documents in the documents/ directory
- Start watching for file changes (auto-reindex)
Usage
Adding Documents
Place your documents in the documents/ directory, organized by category:
documents/
├── security/ # Pentest, exploit, vulnerability docs
├── development/ # Code, APIs, frameworks
├── ctf/ # CTF writeups and methodology
├── logscale/ # LogScale/LQL documentation
└── general/ # Everything else

Or add documents programmatically via MCP tools:
# Add from content
add_document(
content="# My Document\n\nContent here...",
filepath="security/my-technique.md",
category="security"
)
# Add from URL
add_from_url(
url="https://example.com/article",
category="security",
title="Custom Title"
)

Searching
Claude uses the RAG system automatically when configured. You can also control search behavior:
# Pure keyword search — instant, no embedding needed
search_knowledge("gtfobins suid", hybrid_alpha=0.0)
# Keyword-heavy (default) — fast, slight semantic boost
search_knowledge("mimikatz", hybrid_alpha=0.3)
# Balanced hybrid — both engines equally weighted
search_knowledge("SQL injection techniques", hybrid_alpha=0.5)
# Semantic-heavy — better for conceptual queries
search_knowledge("how to escalate privileges", hybrid_alpha=0.7)
# Pure semantic — embedding similarity only
search_knowledge("lateral movement strategies", hybrid_alpha=1.0)

Indexing
Documents are automatically indexed on first startup. To manage the index:
# Incremental: only re-index changed files (fast)
reindex_documents()
# Smart reindex: detect changes + rebuild BM25
reindex_documents(force=True)
# Nuclear rebuild: delete everything, re-embed all (use after model change)
reindex_documents(full_rebuild=True)

Evaluating Retrieval Quality
evaluate_retrieval(test_cases='[
{"query": "sql injection", "expected_filepath": "security/sqli-guide.md"},
{"query": "privilege escalation", "expected_filepath": "security/privesc.md"}
]')
# Returns: MRR@5, Recall@5, per-query results

API Reference
Search & Query
search_knowledge
Hybrid search combining semantic search + BM25 keyword search with cross-encoder reranking.
Parameter | Type | Default | Description
--- | --- | --- | ---
query | string | required | Search query text (1-3 keywords recommended)
 | int | 5 | Maximum results to return (1-20)
category | string | null | Filter by category
hybrid_alpha | float | 0.3 | Balance: 0.0 = keyword only, 1.0 = semantic only
Returns:
{
"status": "success",
"query": "mimikatz credential dump",
"hybrid_alpha": 0.5,
"result_count": 3,
"cache_hit_rate": "0.0%",
"results": [
{
"content": "Mimikatz can extract credentials from memory...",
"source": "documents/security/credential-attacks.md",
"filename": "credential-attacks.md",
"category": "security",
"score": 0.9823,
"raw_rrf_score": 0.016393,
"reranker_score": 0.987654,
"semantic_rank": 2,
"bm25_rank": 1,
"search_method": "hybrid",
"keywords": ["mimikatz", "credential", "lsass"],
"routed_by": "redteam"
}
]
}

Search Method Values:

- hybrid: Found by both semantic and BM25 search (highest confidence)
- semantic: Found only by semantic search
- keyword: Found only by BM25 keyword search
get_document
Retrieve the full content of a specific document.
Parameter | Type | Description
--- | --- | ---
filepath | string | Path to the document file
Returns: JSON with document content, metadata, keywords, and chunk count.
reindex_documents
Index or reindex all documents in the knowledge base.
Parameter | Type | Default | Description
--- | --- | --- | ---
force | bool | false | Smart reindex: detects changes, rebuilds BM25. Fast.
full_rebuild | bool | false | Nuclear rebuild: deletes everything, re-embeds all documents. Use after model change.
Returns: JSON with indexing statistics (indexed, updated, skipped, deleted, chunks_added, chunks_removed, dedup_skipped, elapsed_seconds).
list_categories
List all document categories with their document counts.
Returns:
{
"status": "success",
"categories": {
"security": 52,
"development": 8,
"ctf": 12,
"general": 3
},
"total_documents": 75
}

list_documents
List all indexed documents, optionally filtered by category.
Parameter | Type | Description
--- | --- | ---
category | string | Optional category filter
Returns: JSON array of documents with id, source, category, format, chunks, and keywords.
get_index_stats
Get statistics about the knowledge base index.
Returns:
{
"status": "success",
"stats": {
"total_documents": 75,
"total_chunks": 9256,
"unique_content_hashes": 9100,
"categories": {"security": 52, "development": 8},
"supported_formats": [".md", ".txt", ".pdf", ".py", ".json", ".docx", ".xlsx", ".pptx", ".csv", ".ipynb"],
"embedding_model": "BAAI/bge-small-en-v1.5",
"embedding_dim": 384,
"reranker_model": "Xenova/ms-marco-MiniLM-L-6-v2",
"chunk_size": 1000,
"chunk_overlap": 200,
"query_cache": {
"size": 12,
"max_size": 100,
"ttl_seconds": 300,
"hits": 45,
"misses": 23,
"hit_rate": "66.2%"
}
}
}

Document Management
add_document
Add a new document to the knowledge base from raw content. Saves the file to the documents directory and indexes it immediately.
Parameter | Type | Default | Description
--- | --- | --- | ---
content | string | required | Full text content of the document
filepath | string | required | Relative path within documents dir (e.g., security/my-technique.md)
category | string | "general" | Document category
update_document
Update an existing document. Removes old chunks from the index and re-indexes with new content.
Parameter | Type | Description
--- | --- | ---
filepath | string | Full path to the document file
content | string | New content for the document
remove_document
Remove a document from the knowledge base index. Optionally deletes the file from disk.
Parameter | Type | Default | Description
--- | --- | --- | ---
filepath | string | required | Path to the document file
 | bool | false | If true, also delete the file from disk
add_from_url
Fetch content from a URL, strip HTML (scripts, styles, nav, footer, header), convert to markdown, and add to the knowledge base.
Parameter | Type | Default | Description
--- | --- | --- | ---
url | string | required | URL to fetch content from
category | string | "general" | Document category
title | string | null | Custom title (auto-detected from the fetched page if not provided)
search_similar
Find documents similar to a given document using embedding similarity.
Parameter | Type | Default | Description
--- | --- | --- | ---
filepath | string | required | Path to the reference document
 | int | 5 | Number of similar documents to return (1-20)
evaluate_retrieval
Evaluate retrieval quality with test queries. Useful for tuning hybrid_alpha, testing query expansion effectiveness, or validating after reindexing.
Parameter | Type | Description
--- | --- | ---
test_cases | string (JSON) | Array of test cases: [{"query": "...", "expected_filepath": "..."}]
Metrics:

- MRR@5 (Mean Reciprocal Rank): Average of 1/rank for expected documents. 1.0 = always first result.
- Recall@5: Fraction of expected documents found in the top 5 results. 1.0 = all found.
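Both metrics are straightforward to reproduce for sanity-checking. The sketch below is an illustrative re-implementation of the definitions above, not the server's evaluation code:

```python
def evaluate(results_by_query, expected):
    """Compute MRR@5 and Recall@5 for a set of test queries.

    results_by_query: query -> ranked list of result filepaths.
    expected: query -> the filepath that should be retrieved.
    """
    reciprocal_ranks, found = [], 0
    for query, want in expected.items():
        top5 = results_by_query.get(query, [])[:5]  # only the top 5 count
        if want in top5:
            reciprocal_ranks.append(1.0 / (top5.index(want) + 1))
            found += 1
        else:
            reciprocal_ranks.append(0.0)  # miss contributes 0 to both metrics
    n = len(expected)
    return {"mrr@5": sum(reciprocal_ranks) / n, "recall@5": found / n}
```

A query whose expected document lands at rank 2 contributes 0.5 to MRR@5 but still counts fully toward Recall@5.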
Configuration
Knowledge RAG is fully configurable via a config.yaml file in the project root. If no config.yaml exists, sensible defaults are used — the system works out of the box with zero configuration.
Quick Start
# Option 1: Use a preset
cp presets/cybersecurity.yaml config.yaml # Offensive/defensive security, CTFs
cp presets/developer.yaml config.yaml # Software engineering, APIs, DevOps
cp presets/research.yaml config.yaml # Academic research, papers, studies
cp presets/general.yaml config.yaml # Blank slate, pure semantic search
# Option 2: Start from the documented template
cp config.example.yaml config.yaml
# Edit config.yaml to your needs

Restart Claude Code after changing config.yaml.
config.yaml Structure
# Paths — where your documents live
paths:
documents_dir: "./documents" # Scanned recursively
data_dir: "./data" # Index storage
models_cache_dir: "./models_cache" # Persistent embedding model cache
# Documents — what gets indexed and how
documents:
supported_formats: # File types to index
- .md
- .txt
- .pdf
- .docx
- .ipynb
# - .py # Uncomment to index code
exclude_patterns: # Glob patterns to skip
- "node_modules"
- ".venv"
- "__pycache__"
chunking:
chunk_size: 1000 # Max chars per chunk
chunk_overlap: 200 # Shared chars between chunks
# Models — AI models for search (all run locally, no API keys)
models:
embedding:
model: "BAAI/bge-small-en-v1.5" # ONNX, ~33MB, auto-downloaded
dimensions: 384
gpu: false # Set true + pip install knowledge-rag[gpu]
reranker:
enabled: true # Set false on low-resource machines
model: "Xenova/ms-marco-MiniLM-L-6-v2"
top_k_multiplier: 3 # Candidates fetched before reranking
# Search — result limits and collection name
search:
default_results: 5
max_results: 20
collection_name: "knowledge_base" # Change for separate knowledge bases
# Categories — auto-tag documents by folder path
# Set to {} to disable categorization entirely
category_mappings:
"security/redteam": "redteam"
"security/blueteam": "blueteam"
"notes": "notes"
# Keyword routing — prioritize categories based on query keywords
# Set to {} for pure semantic search with no routing bias
keyword_routes:
redteam:
- pentest
- exploit
- privilege escalation
# Query expansion — expand abbreviations for better BM25 recall
# Set to {} for no expansion (search terms used as-is)
query_expansions:
sqli:
- sql injection
- sqli
privesc:
- privilege escalation
- privesc

See config.example.yaml for the fully documented template with explanations for every field.
Presets
Pre-built configurations for common use cases:
Preset | File | Categories | Keywords | Expansions | Best For
--- | --- | --- | --- | --- | ---
Cybersecurity | presets/cybersecurity.yaml | 8 | 200+ | 69 | Red/Blue Team, CTFs, threat hunting, exploit dev
Developer | presets/developer.yaml | 9 | 150+ | 50+ | Full-stack dev, APIs, DevOps, cloud, databases
Research | presets/research.yaml | 9 | 100+ | 40+ | Academic papers, thesis, lab notebooks, datasets
General | presets/general.yaml | 0 | 0 | 0 | Blank slate — pure semantic search, no domain logic
Creating your own preset: Copy config.example.yaml, fill in your categories/keywords/expansions, save to presets/your-domain.yaml.
Configuration Reference
Paths
Field | Default | Description
--- | --- | ---
documents_dir | ./documents | Root folder scanned recursively for documents
data_dir | ./data | Internal storage for ChromaDB and index metadata
models_cache_dir | ./models_cache | Persistent cache for embedding models (~250MB). Survives reboots
Relative paths resolve from the project root. Absolute paths work too.
Documents
Field | Default | Description
--- | --- | ---
supported_formats | .md .txt .pdf .py .json .docx .xlsx .pptx .csv .ipynb | File extensions to index
exclude_patterns | node_modules, .venv, __pycache__ | Glob patterns for files/dirs to skip during indexing
chunking.chunk_size | 1000 | Max characters per chunk
chunking.chunk_overlap | 200 | Characters shared between consecutive chunks
Chunking guidelines: Short notes → 500/100. General use → 1000/200. Long technical docs → 1500/300.
For .md files, chunking splits at ## and ### header boundaries first. Sections larger than chunk_size are sub-chunked with overlap. Non-markdown files use fixed-size chunking.
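The two-level chunking strategy just described can be sketched as follows (an illustrative approximation, not the actual ingestion.py code — the real boundary and overlap handling may differ):

```python
import re

def chunk_markdown(text, chunk_size=1000, overlap=200):
    """Split at ## / ### headers first; oversized sections get
    fixed-size sub-chunks with overlap (sketch of the behaviour above)."""
    # Lookahead split keeps each header line attached to its section body.
    sections = [s for s in re.split(r"(?m)^(?=#{2,3} )", text) if s]
    chunks = []
    step = chunk_size - overlap  # window stride; must be positive
    for section in sections:
        if len(section) <= chunk_size:
            chunks.append(section)
            continue
        for start in range(0, len(section), step):
            chunks.append(section[start:start + chunk_size])
            if start + chunk_size >= len(section):
                break  # last window already covers the tail
    return chunks
```

Non-markdown files would skip the header split and go straight to the fixed-size windows.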
Models
Field | Default | Description
--- | --- | ---
embedding.model | BAAI/bge-small-en-v1.5 | Embedding model (ONNX, runs locally)
embedding.dimensions | 384 | Vector dimensions (must match model)
embedding.gpu | false | Enable CUDA GPU acceleration. Requires pip install knowledge-rag[gpu]
reranker.enabled | true | Enable cross-encoder reranking
reranker.model | Xenova/ms-marco-MiniLM-L-6-v2 | Reranker model
reranker.top_k_multiplier | 3 | Fetch N*multiplier candidates for reranking

Embedding model options (fastest → most accurate):

- BAAI/bge-small-en-v1.5 — 384D, ~33MB (default)
- BAAI/bge-base-en-v1.5 — 768D, ~130MB
- BAAI/bge-large-en-v1.5 — 1024D, ~335MB
- intfloat/multilingual-e5-small — 384D, 100+ languages
Warning: Changing the embedding model after indexing requires reindex_documents(full_rebuild=True).
Search
Field | Default | Description
--- | --- | ---
default_results | 5 | Results returned when no limit specified
max_results | 20 | Hard cap even if the client requests more
collection_name | knowledge_base | ChromaDB collection — change for separate KBs
Categories
Map folder paths to category names. Documents in matching folders get auto-tagged, enabling filtered searches.
category_mappings:
"security/redteam": "redteam"
"security": "security"

Set category_mappings: {} to disable — documents are still searchable, just without category filters.
Keyword Routing
Route queries to categories based on keywords. When a query contains listed keywords, results from that category are prioritized (not filtered — other categories still appear, ranked lower).
keyword_routes:
redteam:
- pentest
- exploit
- sqli

Single-word keywords use regex word boundaries (\b) — "api" won't match "RAPID". Multi-word keywords use substring matching.
Set keyword_routes: {} for pure semantic search.
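The matching rule above can be sketched in a few lines (`route_matches` is a hypothetical helper name; the server's actual implementation may differ in details such as case handling):

```python
import re

def route_matches(keyword: str, query: str) -> bool:
    """Word-boundary matching for single-word keywords,
    substring matching for multi-word keywords (per the rules above)."""
    if " " in keyword:
        # Multi-word keyword: plain case-insensitive substring match.
        return keyword.lower() in query.lower()
    # Single word: \b boundaries prevent matching inside longer words.
    return re.search(rf"\b{re.escape(keyword)}\b", query, re.IGNORECASE) is not None
```

So a query containing "RAPID" does not trigger an "api" route, while "privilege escalation" matches anywhere it appears as a phrase.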
Query Expansion
Expand search terms with synonyms before BM25 search. Supports single tokens, bigrams, and full query matches.
query_expansions:
sqli:
- sql injection
- sqli
k8s:
- kubernetes
- k8s

Set query_expansions: {} for no expansion.
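A sketch of the expansion step that runs before BM25 (the full-query → bigram → single-token precedence is an assumption based on the description above, not confirmed behaviour):

```python
def expand_query(query: str, expansions: dict[str, list[str]]) -> str:
    """Expand full-query, bigram, and single-token matches (sketch)."""
    q = query.lower()
    if q in expansions:                        # full-query match wins
        return " ".join(expansions[q])
    tokens, out, i = q.split(), [], 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if len(tokens[i:i + 2]) == 2 and bigram in expansions:
            out.extend(expansions[bigram])     # bigram match
            i += 2
        elif tokens[i] in expansions:
            out.extend(expansions[tokens[i]])  # single-token match
            i += 1
        else:
            out.append(tokens[i])              # pass through unchanged
            i += 1
    return " ".join(out)
```

With the sqli mapping above, "sqli basics" becomes "sql injection sqli basics", giving BM25 both the abbreviation and the spelled-out phrase to match against.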
Hybrid Search Tuning
hybrid_alpha | Behavior | Best For
--- | --- | ---
0.0 | Pure BM25 keyword | Exact terms, CVEs, tool names |
0.3 | Keyword-heavy (default) | Technical queries with specific terms |
0.5 | Balanced | General queries |
0.7 | Semantic-heavy | Conceptual queries, related topics |
1.0 | Pure semantic | "How to..." questions, abstract concepts |
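The fusion behind these alpha values can be sketched directly from the formula in the architecture diagram (illustrative code; 1-based ranks and the common k=60 RRF constant are assumptions, since the server's constant isn't documented here):

```python
def rrf_scores(semantic_ranked, bm25_ranked, alpha=0.3, k=60):
    """Weighted Reciprocal Rank Fusion, as in the diagram:
    score = alpha/(k + rank_sem) + (1 - alpha)/(k + rank_bm25).
    Returns doc ids sorted best-first."""
    sem_rank = {d: r for r, d in enumerate(semantic_ranked, 1)}
    bm_rank = {d: r for r, d in enumerate(bm25_ranked, 1)}
    scores = {}
    for doc in set(semantic_ranked) | set(bm25_ranked):
        s = alpha / (k + sem_rank[doc]) if doc in sem_rank else 0.0
        b = (1 - alpha) / (k + bm_rank[doc]) if doc in bm_rank else 0.0
        scores[doc] = s + b
    return sorted(scores, key=scores.get, reverse=True)
```

A document found by both engines accumulates both terms, which is why "hybrid" hits rank above results found by only one engine, and why alpha=0.0 reduces to pure BM25 ordering.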
Project Structure
knowledge-rag/
├── mcp_server/
│ ├── __init__.py # Stdout protection + version
│ ├── config.py # YAML config loader + defaults
│ ├── ingestion.py # 12 parsers, chunking, metadata extraction
│ └── server.py # MCP server, ChromaDB, BM25, reranker, 12 tools
├── config.example.yaml # Documented config template (copy to config.yaml)
├── config.yaml # Your active configuration (git-ignored)
├── presets/ # Ready-to-use domain configurations
│ ├── cybersecurity.yaml
│ ├── developer.yaml
│ ├── research.yaml
│ └── general.yaml
├── documents/ # Your documents (scanned recursively)
├── data/
│ ├── chroma_db/ # ChromaDB vector database
│ └── index_metadata.json # Incremental indexing state
├── models_cache/ # Persistent embedding model cache
├── tests/ # Test suite (82 tests)
├── install.sh # Linux/macOS installer
├── install.ps1 # Windows installer
├── venv/ # Python virtual environment
├── requirements.txt
├── pyproject.toml
├── LICENSE
└── README.md

Troubleshooting
Python version mismatch
Requires Python 3.11 or newer.
python --version # Must be 3.11+

FastEmbed model download fails
On first run, FastEmbed downloads models to models_cache/. If the download fails:
# Clear cache and retry
# Windows:
rmdir /s /q models_cache
# Linux/macOS:
rm -rf models_cache
# Then restart the MCP server

Index is empty
# Check documents directory has files
ls documents/
# Force reindex via Claude Code:
# reindex_documents(force=True)
# Or nuclear rebuild if model changed:
# reindex_documents(full_rebuild=True)

MCP server not loading
- Check ~/.claude.json exists and has valid JSON in the mcpServers section
- Verify paths use double backslashes (\\) on Windows
- Restart Claude Code completely
- Run claude mcp list to check connection status
"Failed to connect" error
The MCP server uses stdout for JSON-RPC communication. If a library prints to stdout during init, the stream gets corrupted. v3.4.3+ includes stdout protection that prevents this. If you're on an older version, upgrade:
pip install --upgrade knowledge-rag

Slow first query
The cross-encoder reranker model is lazy-loaded on the first query. This adds a one-time ~2-3 second delay for model download and loading. Subsequent queries are fast.
Memory usage
With ~200 documents, expect ~300-500MB RAM. The embedding model (~50MB) and reranker (~25MB) are loaded into memory. For very large knowledge bases (1000+ documents), consider enabling GPU acceleration and using exclude patterns to limit index scope.
Changelog
v3.5.2 (2026-04-16)
- NEW: Auto-discovery of CUDA 12 DLLs from pip-installed NVIDIA packages — no manual PATH configuration needed
- NEW: Graceful GPU→CPU fallback with [WARN] log when CUDA init fails (missing drivers, wrong version, etc.)
- FIX: Explicit CPUExecutionProvider when gpu: false — eliminates noisy CUDA probe errors in logs
- FIX: BASE_DIR resolution now correctly prefers directories with config.yaml over those with only config.example.yaml (fixes editable installs)
v3.5.1 (2026-04-16)
- FIX: Removed the Python upper bound constraint (<3.13 → >=3.11). Python 3.13 and 3.14 now supported — onnxruntime ships wheels for both.
v3.5.0 (2026-04-16)
- NEW: Optional GPU acceleration for ONNX embeddings — pip install knowledge-rag[gpu] plus models.embedding.gpu: true in config. 5-10x faster indexing on NVIDIA GPUs with automatic CPU fallback.
- DOCS: Supported formats table added to README (12 formats)
v3.4.3 (2026-04-16)
- FIX: Correct stdout protection via save/restore pattern — __init__.py saves the original stdout and redirects to stderr during init; server.py main() restores it before mcp.run(). v3.4.2's global redirect broke the MCP JSON-RPC response channel.
v3.4.1 (2026-04-16)
- FIX: pip install knowledge-rag now auto-detects the project directory from the venv location
- NEW: install.sh — Linux/macOS installer with pip and from-source modes
- IMPROVED: BASE_DIR resolution chain: env var → source dir → venv parent → CWD → fallback
v3.4.0 (2026-04-16)
- NEW: models_cache_dir — persistent embedding model cache, prevents re-download after reboots
- NEW: exclude_patterns — glob-based file/directory exclusion during indexing
- NEW: Jupyter Notebook (.ipynb) parser — extracts markdown and code cell sources only
- NEW: MCP stdout protection — redirects stdout to stderr before server start
- NEW: File watcher resilience — graceful fallback when Linux inotify limits are reached
- NEW: MetaTrader (.mq4, .mqh) support — opt-in code parsing
- NEW: 23 new tests (exclude patterns, ipynb parser, stdout protection)
v3.3.x
- v3.3.2: Full type validation on YAML config, bounds checking, version sync
- v3.3.1: YAML null value crash fix, presets bundled in pip wheel, knowledge-rag init CLI
- v3.3.0: YAML configuration system, 4 domain presets, generic use support
v3.2.x
- v3.2.4: Symlink support with circular loop protection
- v3.2.3: BASE_DIR smart detection for pip installs
- v3.2.2: Plug-and-play pip install, KNOWLEDGE_RAG_DIR env var
- v3.2.1: Auto-recovery from corrupted ChromaDB
- v3.2.0: Parallel BM25 + semantic search, adjacent chunk retrieval
v3.1.x
- v3.1.1: Code block protection in markdown chunker, AAR category, 14 CVE aliases
- v3.1.0: DOCX/XLSX/PPTX/CSV support, file watcher, MMR diversification, PyPI publish
v3.0.0 (2026-03-19)
- Replaced Ollama with FastEmbed (ONNX in-process)
- Cross-encoder reranking, markdown-aware chunking, query expansion
- 6 new MCP tools (12 total), auto-migration from v2.x
- v2.2.0: hybrid_alpha=0 skips Ollama, default changed from 0.5 to 0.3
- v2.1.0: Mermaid architecture diagrams
- v2.0.0: Hybrid search, RRF fusion, hybrid_alpha parameter
- v1.1.0: Incremental indexing, query cache, chunk deduplication
- v1.0.1: Auto-cleanup of orphan folders, removed hardcoded paths
- v1.0.0: Initial release
Contributing
1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
ChromaDB — Vector database
FastEmbed — ONNX Runtime embeddings
FastMCP — Model Context Protocol framework
PyMuPDF — PDF parsing
rank-bm25 — BM25 Okapi implementation
Watchdog — File system monitoring
python-docx / openpyxl / python-pptx — Office document parsing
PyYAML — YAML configuration parsing
Beautiful Soup — HTML parsing for URL ingestion
Author
Lyon.
Security Researcher | Developer