local-rag-core
Headless local knowledge library and RAG substrate. It is designed to act as a secure, local, structure-preserving data layer for coding assistants (like Claude Code, Codex, and OpenCode), browsers, translators, and agentic workflows.
1. Core Principles
Headless & UI-less: This is not a chat application. It manages file ingestion, document chunking, indexing, hybrid retrieval, reranking, and source attribution.
Strict logical separation:
private_kb: User's long-term private notes and documentation.
packs: Modular, shareable documentation packages that can be registered, enabled, or disabled.
private_project: Local source code repositories with filters protecting sensitive data.
Always Attributed: All retrieved chunks specify their source_type, pack_id/project_id, chunk ID, file path, section context, and normalized relevance scores.
Read-Only MCP Server: The MCP tool interface is strictly read-only (kb.search, kb.get_chunk, kb.list_packs, kb.health). Data modifications must occur via the CLI or HTTP API.
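The principles above mention hybrid retrieval and normalized relevance scores but do not say how keyword and semantic scores are combined. The sketch below shows one common approach, assumed here for illustration and not taken from local-rag-core: min-max normalize each retriever's scores to [0, 1], then blend them with a weight alpha. The functions min_max and hybrid_fuse are hypothetical names.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; assumes higher raw scores are better."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tie: give them the same full score.
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def hybrid_fuse(
    keyword: Dict[str, float],
    semantic: Dict[str, float],
    alpha: float = 0.5,
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """Hypothetical weighted fusion of keyword (e.g. BM25) and vector scores."""
    kw, sem = min_max(keyword), min_max(semantic)
    # A chunk missing from one retriever contributes 0 from that side.
    fused = {
        cid: alpha * kw.get(cid, 0.0) + (1 - alpha) * sem.get(cid, 0.0)
        for cid in kw.keys() | sem.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:limit]
```

With keyword scores {"a": 10, "b": 5, "c": 0} and semantic scores {"b": 0.9, "c": 0.8, "d": 0.1}, chunk b ranks first because it scores well on both sides. Reciprocal rank fusion is a common alternative when raw score scales are hard to compare.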
2. Technology Stack
Python: >=3.10
SQLite / FTS5: Relational registries, metadata index, and BM25 full-text search.
Zvec: Embedded in-process vector database for semantic indexing.
MCP: Low-level mcp.server.Server stdio server; read-only by design.
FastAPI & Uvicorn: High-performance HTTP microservice.
Typer: Type-safe CLI builder.
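To make the SQLite / FTS5 entry concrete, here is a minimal sketch of BM25 keyword search using only Python's standard sqlite3 module. The table name chunks_fts and its columns are assumptions for illustration, not local-rag-core's real schema, and the sketch requires a SQLite build with FTS5 compiled in (true for most Python distributions).

```python
import sqlite3
from typing import List, Tuple

# Illustrative in-memory table; local-rag-core's real schema is not shown here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(chunk_id UNINDEXED, text)")
conn.executemany(
    "INSERT INTO chunks_fts (chunk_id, text) VALUES (?, ?)",
    [
        ("c1", "FastAPI background tasks run after the response is sent"),
        ("c2", "SQLite FTS5 provides BM25 ranking for full-text search"),
        ("c3", "Background workers can also be run with Celery"),
    ],
)


def bm25_search(query: str, limit: int = 5) -> List[Tuple[str, float]]:
    """Keyword search; FTS5's bm25() is lower (more negative) for better matches."""
    rows = conn.execute(
        "SELECT chunk_id, bm25(chunks_fts) AS score FROM chunks_fts "
        "WHERE chunks_fts MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

Because bm25() sorts ascending, a score normalization step (like the min-max approach in hybrid retrieval) would need to flip its sign before treating higher values as better.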
3. Installation
Minimal Install
Core engine with CLI, SQLite FTS5 search, and mock fallbacks (no ML deps):
pip install -e .
Development Install
Core + testing, linting, API/HTTP test deps:
pip install -e ".[dev]"
This is sufficient for running the full test suite on a clean machine. Tests that require optional packages (sentence-transformers, zvec, mcp) will be skipped automatically.
For day-to-day development, use the fast lane:
pytest -m "not real_model"
Use the full real-backend lane before a maintenance handoff:
pytest
rag health --strict
rag integrity --deep
python scripts/run_eval.py
python scripts/ops_verify.py
scripts/run_eval.py prewarms the retriever by default and reports the warmup latency separately from per-query latency. Use --no-warmup when you explicitly want to measure process/model cold start.
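The script itself is not reproduced here; the hypothetical helper below only illustrates why warmup and per-query latency are reported separately. The first call to a retriever pays one-off costs, so timing it apart keeps steady-state numbers honest, while skipping warmup folds cold start into the first timed query.

```python
import statistics
import time
from typing import Callable, Dict, Optional, Sequence


def measure_latency(
    search: Callable[[str], object],
    queries: Sequence[str],
    warmup: bool = True,
) -> Dict[str, Optional[float]]:
    """Report warmup latency separately from steady-state per-query latency."""
    if not queries:
        raise ValueError("need at least one query")
    report: Dict[str, Optional[float]] = {"warmup_ms": None}
    if warmup:
        # The first call pays one-off costs (model load, index open, caches).
        start = time.perf_counter()
        search(queries[0])
        report["warmup_ms"] = (time.perf_counter() - start) * 1000
    per_query = []
    for q in queries:
        start = time.perf_counter()
        search(q)
        per_query.append((time.perf_counter() - start) * 1000)
    report["median_ms"] = statistics.median(per_query)
    report["max_ms"] = max(per_query)
    return report
```

With warmup=False, the first timed query absorbs the cold start, which mirrors the intent of --no-warmup.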
Full Install
All backends and interfaces (real embeddings, vector store, reranker, MCP, HTTP API):
pip install -e ".[all]"
Selective Install
Install only the extras you need:
# MCP Server (stdio-based, for Claude Code / Codex / Gemini)
pip install -e ".[mcp]"
# HTTP FastAPI Server
pip install -e ".[api]"
# Real neural embeddings (BAAI/bge-m3 via sentence-transformers)
pip install -e ".[embedding]"
# Real cross-encoder reranking (BAAI/bge-reranker-large via CrossEncoder)
pip install -e ".[rerank]"
# Native Zvec vector index
pip install -e ".[vector]"
# ModelScope model download source
pip install -e ".[modelscope]"
4. Interfaces & Usage
4.1 CLI Interface (rag and rag-project)
System Health Check
rag health
Pack Registry Management
# List all registered packs and code projects
rag pack list
# Enable or disable specific packs
rag pack enable fastapi_docs
rag pack disable python_docs
# Build, export, and import packs
rag pack build my_pack /path/to/docs --name "My Documentation Pack" --domain "tech"
rag pack export my_pack --output /path/to/my_pack.tar.gz
rag pack import /path/to/my_pack.tar.gz
General Document Ingestion
rag ingest <file-or-dir> --pack <id> --source-type <private|pack>
Code Project Ingestion (Standalone or Subcommand)
Code ingestion supports common extensions (.py, .js, .ts, .go, .rs, etc.) and respects exclusions (.git, node_modules, .env).
# Using subcommand
rag project ingest /path/to/my-code --project my-app-id
# Using standalone tool
rag-project /path/to/my-code --project my-app-id
Search
rag search "how to handle background tasks" --limit 5 --mode hybrid --rerank bge
Notes on scope:
If no --packs scope is given, the engine searches all enabled packs.
--packs pack1,pack2 restricts results to those packs plus any user private notes (source_type='private'); other private_project packs are not automatically included.
Add --no-private to exclude user private notes while still honoring the listed packs (useful for targeting a single project).
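The scope rules above can be restated as a small filter. This is a hypothetical sketch: PackInfo and resolve_search_scope are illustrative names, and two details are assumptions not stated in the rules (private notes are included only when enabled, and --no-private also applies when no --packs scope is given).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class PackInfo:
    pack_id: str
    source_type: str  # e.g. "pack", "private", "private_project"
    enabled: bool


def resolve_search_scope(
    packs: Iterable[PackInfo],
    requested: Optional[List[str]] = None,
    no_private: bool = False,
) -> List[str]:
    """Return the pack IDs a search would cover under the documented rules."""
    packs = list(packs)
    if not requested:
        # No --packs: every enabled pack (assumption: --no-private still applies).
        return [
            p.pack_id
            for p in packs
            if p.enabled and not (no_private and p.source_type == "private")
        ]
    wanted = set(requested)
    selected = []
    for p in packs:
        if p.pack_id in wanted:
            selected.append(p.pack_id)
        elif p.source_type == "private" and p.enabled and not no_private:
            # Private notes ride along; other private_project packs do not.
            selected.append(p.pack_id)
    return selected
```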
Retrieve Chunk Details
rag chunk get <chunk_id>
4.2 HTTP API Interface
Configure a write token and the filesystem roots that HTTP write operations may access. On Windows, separate multiple roots with a semicolon (;); on Unix, use a colon (:). The commands below use PowerShell syntax.
$env:LOCAL_RAG_API_TOKEN = "replace-with-a-long-random-token"
$env:LOCAL_RAG_ALLOWED_ROOTS = "C:\Users\Zero\Documents;D:\Knowledge"
Run the FastAPI microservice:
uvicorn local_rag_core.interfaces.api:app --host 127.0.0.1 --port 8000
All mutating endpoints require the token in the X-Local-RAG-Token request header. Path-based write endpoints reject paths outside LOCAL_RAG_ALLOWED_ROOTS. Read-only health, pack listing, search, and chunk retrieval endpoints do not require the token.
Endpoints Summary
GET /health: Returns the database health check and chunk count metadata.
GET /packs: Returns the list of registered documentation packages and projects.
POST /packs/{pack_id}/enable: Enables a registered pack.
POST /packs/{pack_id}/disable: Disables a registered pack.
GET /chunks/{chunk_id}: Retrieves the full text content and metadata of a specific chunk.
POST /project/ingest: Ingests a local project repository.
Body: {"path": "/absolute/path", "project": "project_id"}
POST /packs/build: Compiles a documentation source directory into a self-contained pack.
Body: {"pack_id": "my_pack_id", "source_dir": "/path/to/docs", "name": "My Pack", "domain": "tech", "description": "desc"}
POST /packs/export: Bundles an installed pack directory into a tarball archive.
Body: {"pack_id": "my_pack_id", "output_path": "/path/to/my_pack_id.tar.gz"}
POST /packs/import: Decompresses and registers a pack archive into the local library.
Body: {"archive_path": "/path/to/my_pack_id.tar.gz"}
POST /search: Queries the RAG substrate using keyword, semantic, or hybrid configurations.
Body: {"query": "FastAPI background workers", "limit": 8, "mode": "hybrid", "scope_private": true, "scope_packs": ["fastapi_docs"], "rerank": "bge"}
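A minimal standard-library client sketch for these endpoints, assuming the server runs on the host and port from the uvicorn command. It attaches the X-Local-RAG-Token header only when a token is passed, matching the rule that read-only endpoints need no token. The helper names build_request and send are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://127.0.0.1:8000"  # host/port from the uvicorn command above


def build_request(
    method: str,
    path: str,
    body: Optional[Dict[str, Any]] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build a JSON request; pass the write token only for mutating endpoints."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {}
    if data is not None:
        headers["Content-Type"] = "application/json"
    if token:
        # urllib stores header names via str.capitalize(); HTTP treats them case-insensitively.
        headers["X-Local-RAG-Token"] = token
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(req: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, send(build_request("POST", "/packs/fastapi_docs/enable", token=os.environ["LOCAL_RAG_API_TOKEN"])) would enable a pack, while a POST /search request built without a token is sufficient for queries.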
4.3 MCP Tool Interface
The MCP server connects local-rag-core directly to LLM clients (like Claude Desktop or Claude Code) using stdio.
Run the MCP server directly:
C:\Users\Zero\AppData\Local\hermes\hermes-agent\venv\Scripts\python.exe -m local_rag_core.interfaces.mcp_server
Use a Python runtime that can import both local_rag_core and the mcp SDK. On a migrated machine, update the AI-tool MCP configs to point at that machine's verified Python path.
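One way to verify a candidate interpreter is to run a check like the sketch below with it, and then pin that interpreter's path in the client config instead of relying on a bare "python" on PATH. The function names are hypothetical; the server module path is the one used throughout this README.

```python
import importlib.util
import json
import sys
from typing import Any, Dict


def mcp_runtime_ready() -> Dict[str, bool]:
    """Check whether the current interpreter can find both required packages."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in ("local_rag_core", "mcp")
    }


def claude_desktop_entry(python_exe: str = sys.executable) -> Dict[str, Any]:
    """Build an mcpServers entry that pins an explicit interpreter path."""
    return {
        "mcpServers": {
            "local-rag-core": {
                "command": python_exe,
                "args": ["-m", "local_rag_core.interfaces.mcp_server"],
            }
        }
    }


def render_entry(python_exe: str = sys.executable) -> str:
    """Pretty-print the entry for pasting into a client config file."""
    return json.dumps(claude_desktop_entry(python_exe), indent=2)
```

If both values returned by mcp_runtime_ready() are True, render_entry() prints a config block that uses that exact interpreter.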
Registered MCP Tools
kb.health: Checks database health and indexes (no arguments).
kb.list_packs: Lists registered packages/projects (no arguments).
kb.get_chunk: Fetches the full contents of a single chunk (args: chunk_id).
kb.search: Performs a semantic, keyword, or hybrid search (args: query (str), limit (int), mode (str), no_private (bool), scope_packs (List[str]), rerank (str)).
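For scripted checks outside an AI client, the official MCP Python SDK can spawn the stdio server and call kb.search directly. This is a sketch: kb_search_args is a hypothetical helper, its defaults (limit=5, mode="hybrid") are assumptions, and omitting optional arguments assumes the server applies its own defaults for them.

```python
from typing import Any, Dict, List, Optional


def kb_search_args(
    query: str,
    limit: int = 5,
    mode: str = "hybrid",
    no_private: bool = False,
    scope_packs: Optional[List[str]] = None,
    rerank: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a kb.search argument dict; optional fields are omitted when unset."""
    args: Dict[str, Any] = {"query": query, "limit": limit, "mode": mode, "no_private": no_private}
    if scope_packs:
        args["scope_packs"] = list(scope_packs)
    if rerank:
        args["rerank"] = rerank
    return args


async def call_kb_search(python_exe: str, args: Dict[str, Any]):
    """Spawn the stdio server and call kb.search once (requires the mcp extra)."""
    # Imported lazily so this module still loads when the mcp SDK is absent.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(
        command=python_exe, args=["-m", "local_rag_core.interfaces.mcp_server"]
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            return await session.call_tool("kb.search", args)
```

A call would look like asyncio.run(call_kb_search(sys.executable, kb_search_args("FastAPI background workers", scope_packs=["fastapi_docs"], rerank="bge"))).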
Claude Desktop Configuration
Add this to your claude_desktop_config.json:
{
"mcpServers": {
"local-rag-core": {
"command": "python",
"args": [
"-m",
"local_rag_core.interfaces.mcp_server"
]
}
}
}
5. Security & Exclusions
To avoid indexing credentials or build outputs, the scanner ignores paths matching the patterns defined in should_ignore():
Directories: .git, .venv, venv, node_modules, build, dist, __pycache__
Files: .env, .pem, .key, registry.sqlite, .log
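A hypothetical re-implementation of these rules, useful for reasoning about which paths get skipped. It is not the project's should_ignore(); in particular, treating .pem, .key, and .log as file suffixes and .env and registry.sqlite as exact file names is an assumption.

```python
from pathlib import PurePath

# Hypothetical pattern sets mirroring the lists above; the real should_ignore() may differ.
IGNORED_DIRS = {".git", ".venv", "venv", "node_modules", "build", "dist", "__pycache__"}
IGNORED_NAMES = {".env", "registry.sqlite"}
IGNORED_SUFFIXES = {".pem", ".key", ".log"}


def should_ignore_sketch(path: str) -> bool:
    """Return True if any path component or the file itself matches an exclusion."""
    p = PurePath(path)
    if any(part in IGNORED_DIRS for part in p.parts):
        return True
    if p.name in IGNORED_NAMES:
        return True
    # PurePath(".env").suffix is "", so dotfiles are matched by name above.
    return p.suffix in IGNORED_SUFFIXES
```

Checking every path component means a file deep inside node_modules/ is skipped even if the walker did not prune the directory itself.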
Pack IDs are restricted to ASCII letters, digits, dots, underscores, and hyphens. Pack archives reject absolute paths, traversal entries, links, special files, excessive member counts, and excessive extracted sizes.
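The pack ID rule and the archive checks can be sketched with the standard re and tarfile modules. The limits (10,000 members, 2 GiB extracted) are hypothetical placeholders, not the project's values, and the extra rejection of "." and ".." is a choice of this sketch because the character rule alone would allow them.

```python
import re
import tarfile
from pathlib import PurePosixPath
from typing import Iterable

PACK_ID_RE = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_pack_id(pack_id: str) -> bool:
    # Sketch extra: also reject "." and "..", which the character rule alone would allow.
    return PACK_ID_RE.fullmatch(pack_id) is not None and pack_id not in {".", ".."}


def check_members(
    members: Iterable[tarfile.TarInfo],
    max_members: int = 10_000,           # hypothetical limit
    max_total_bytes: int = 2 * 1024**3,  # hypothetical limit (2 GiB)
) -> None:
    """Raise ValueError if an archive member list looks unsafe."""
    members = list(members)
    if len(members) > max_members:
        raise ValueError(f"too many archive members: {len(members)}")
    total = 0
    for m in members:
        path = PurePosixPath(m.name)
        if path.is_absolute() or ".." in path.parts:
            raise ValueError(f"unsafe path in archive: {m.name!r}")
        if m.issym() or m.islnk():
            raise ValueError(f"links are not allowed: {m.name!r}")
        if not (m.isfile() or m.isdir()):
            raise ValueError(f"special file not allowed: {m.name!r}")
        total += m.size
        if total > max_total_bytes:
            raise ValueError("archive expands beyond the size limit")
```

A caller would open the archive with tarfile.open(path), run check_members(tf.getmembers()), and only then extract; on Python versions that support extraction filters, passing filter="data" to extractall adds a further layer of protection.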
6. Embedding Backend Modes
local-rag-core supports two embedding backends, controlled by the
LOCAL_RAG_EMBEDDING_BACKEND environment variable.
mock (default when sentence-transformers is not installed)
A deterministic word-overlap hash-based embedding generator. Each word is hashed to a pseudo-random unit vector; the document vector is the L2-normalized sum of its word vectors.
Purpose: structural testing, CI, and lightweight development.
Does NOT provide true semantic retrieval.
Forced by: LOCAL_RAG_EMBEDDING_BACKEND=mock
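The mock scheme described above can be written in a few lines of standard-library Python. This is an illustrative sketch rather than the project's code: the vector dimension, the SHA-256 seeding, and the \w+ tokenizer are assumptions. It shows why texts sharing words score high while unrelated texts score near zero, and why word order is ignored.

```python
import hashlib
import math
import random
import re
from typing import List

DIM = 256  # hypothetical; the real mock dimension is not documented


def _word_vector(word: str, dim: int = DIM) -> List[float]:
    """Deterministic pseudo-random unit vector for one word."""
    # Seed from a stable hash (Python's built-in hash() is salted per process).
    seed = int.from_bytes(hashlib.sha256(word.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mock_embed(text: str, dim: int = DIM) -> List[float]:
    """L2-normalized sum of the word vectors in text."""
    total = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        for i, x in enumerate(_word_vector(word, dim)):
            total[i] += x
    norm = math.sqrt(sum(x * x for x in total))
    return total if norm == 0 else [x / norm for x in total]


def cosine(a: List[float], b: List[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Synonyms such as "tasks" and "jobs" hash to unrelated vectors, which is exactly why this backend does not provide true semantic retrieval.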
bge-m3 (requires sentence-transformers)
Uses the BAAI/bge-m3 model loaded via sentence-transformers. This is the
real semantic embedding backend.
Install: pip install -e ".[embedding]" (use ".[embedding,modelscope]" to prioritize ModelScope downloads)
Model: BAAI/bge-m3 (overridable via EMBEDDING_MODEL)
Cache: HuggingFace default cache (~/.cache/huggingface/hub/) or LOCAL_RAG_MODEL_CACHE if set. When ModelScope is selected, it uses a modelscope/ subdirectory under the same cache root.
Offline by default: set ALLOW_MODEL_DOWNLOAD=true to allow downloading the model on first use.
Forced by: LOCAL_RAG_EMBEDDING_BACKEND=bge-m3
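A sketch of how the environment variables above could map onto a sentence-transformers load, assuming a recent sentence-transformers release whose SentenceTransformer constructor accepts cache_folder and local_files_only. The helper names are hypothetical, and the sketch ignores the ModelScope cache subdirectory.

```python
import os
from typing import Any, Dict, Mapping


def bge_m3_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Translate the documented env vars into SentenceTransformer keyword arguments."""
    allow_download = env.get("ALLOW_MODEL_DOWNLOAD", "false").strip().lower() == "true"
    kwargs: Dict[str, Any] = {
        "model_name_or_path": env.get("EMBEDDING_MODEL", "BAAI/bge-m3"),
        # Offline by default: only use files already in the local cache.
        "local_files_only": not allow_download,
    }
    cache_dir = env.get("LOCAL_RAG_MODEL_CACHE")
    if cache_dir:
        kwargs["cache_folder"] = cache_dir
    return kwargs


def load_bge_m3(env: Mapping[str, str] = os.environ):
    """Load the model; requires the embedding extra (sentence-transformers)."""
    from sentence_transformers import SentenceTransformer  # lazy, optional dependency

    return SentenceTransformer(**bge_m3_kwargs(env))
```

Once loaded, model.encode(["some text"], normalize_embeddings=True) returns unit-length vectors suitable for cosine search.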
Model Download Sources
By default (LOCAL_RAG_MODEL_SOURCE=auto), local-rag-core tries to download
models from ModelScope (魔搭) first when the modelscope package is
installed, and falls back to HuggingFace Hub otherwise. Model IDs are identical
on both platforms (BAAI/bge-m3, BAAI/bge-reranker-large).
| Source | Behavior |
|---|---|
| auto (default) | ModelScope first if installed, else HuggingFace |
| modelscope | Force ModelScope; error if the modelscope package is not installed |
| huggingface | Force HuggingFace Hub |
Set USE_MODELSCOPE=true as a shorthand for LOCAL_RAG_MODEL_SOURCE=modelscope.
If both are set, LOCAL_RAG_MODEL_SOURCE wins.
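The precedence rules can be sketched as a single resolver. The function is hypothetical; the modelscope_installed parameter exists only so the logic can be exercised without installing the package.

```python
import importlib.util
from typing import Mapping, Optional

VALID_SOURCES = {"auto", "modelscope", "huggingface"}


def resolve_model_source(
    env: Mapping[str, str], modelscope_installed: Optional[bool] = None
) -> str:
    """Return "modelscope" or "huggingface" following the documented precedence."""
    source = env.get("LOCAL_RAG_MODEL_SOURCE", "").strip().lower()
    if not source:
        # USE_MODELSCOPE=true is only a shorthand; LOCAL_RAG_MODEL_SOURCE wins if set.
        use_ms = env.get("USE_MODELSCOPE", "").strip().lower() == "true"
        source = "modelscope" if use_ms else "auto"
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown model source: {source!r}")
    if modelscope_installed is None:
        modelscope_installed = importlib.util.find_spec("modelscope") is not None
    if source == "auto":
        return "modelscope" if modelscope_installed else "huggingface"
    if source == "modelscope" and not modelscope_installed:
        raise RuntimeError("LOCAL_RAG_MODEL_SOURCE=modelscope but modelscope is not installed")
    return source
```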
auto (default)
When LOCAL_RAG_EMBEDDING_BACKEND is unset or set to auto, the system
selects bge-m3 if sentence-transformers is importable, otherwise falls
back to mock.
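The auto-selection rule amounts to an import probe. In this hypothetical sketch, the probe is injectable so the logic can be checked without sentence-transformers installed. importlib.util.find_spec only locates the package and does not import it, so a broken installation would surface later, at model load time.

```python
import importlib.util
from typing import Callable, Mapping


def _importable(name: str) -> bool:
    return importlib.util.find_spec(name) is not None


def select_embedding_backend(
    env: Mapping[str, str], is_importable: Callable[[str], bool] = _importable
) -> str:
    """Resolve LOCAL_RAG_EMBEDDING_BACKEND to "mock" or "bge-m3"."""
    choice = env.get("LOCAL_RAG_EMBEDDING_BACKEND", "").strip().lower() or "auto"
    if choice in {"mock", "bge-m3"}:
        return choice  # explicitly forced
    if choice != "auto":
        raise ValueError(f"unknown embedding backend: {choice!r}")
    return "bge-m3" if is_importable("sentence_transformers") else "mock"
```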
Configuration Reference
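| Env Var | Values | Default | Effect |
|---|---|---|---|
| LOCAL_RAG_EMBEDDING_BACKEND | auto, mock, bge-m3 | auto | Selects embedding backend |
| LOCAL_RAG_MODEL_SOURCE | auto, modelscope, huggingface | auto | Primary model download source |
| USE_MODELSCOPE | true | | Shorthand for LOCAL_RAG_MODEL_SOURCE=modelscope |
| LOCAL_RAG_MODEL_CACHE | any path | (HF default) | Custom model cache directory |
| EMBEDDING_MODEL | HF/MS model name | BAAI/bge-m3 | Which model to load |
| ALLOW_MODEL_DOWNLOAD | true, false | false | Allow first-time model download |
| | | | Torch device |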
Health Output
$ rag health
Embedding backend: mock # sentence-transformers not installed, or forced
Embedding backend: bge-m3 # sentence-transformers available and model loaded
Model source: auto # auto / modelscope / huggingface
ModelScope available: false # true when modelscope package is installed
Registered packs: 22
Indexed chunks: 69341
Disabled packs: 2
Pack status counts: {"disabled": 2, "enabled": 20}
Pack source type counts: {"pack": 12, "private_project": 6, "source_code_index": 4}
Chunk source type counts: {"pack": 58437, "private_project": 10841, "source_code_index": 63}
Latest audit action: ingest_path
7. Mock / Fallback Modes
To support lightweight local development and testing, local-rag-core provides built-in fallback modes when machine learning dependencies or databases are missing:
Mock Embedding: See §6 above for full details.
Simple Flat Vector Store: If the zvec binary extension is not installed, the engine uses SimpleFlatVectorStore, a pure-Python in-process cosine similarity engine that stores vectors in JSON files.
Mock Reranking: If sentence-transformers/CrossEncoder cannot be used, the reranker falls back to a simple query-document word-overlap heuristic.
Verification: Always run rag health --strict to inspect which backends are active (mock vs. real). Production readiness requires installing the corresponding extras (embedding, rerank, vector) and locally cached models.
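The two non-embedding fallbacks can be sketched as follows. FlatVectorStoreSketch and overlap_rerank are hypothetical stand-ins for SimpleFlatVectorStore and the mock reranker; the single-JSON-file layout and the "fraction of query words present" score are assumptions about details the description leaves open.

```python
import json
import math
from pathlib import Path
from typing import Dict, List, Sequence, Tuple


class FlatVectorStoreSketch:
    """Pure-Python cosine search over vectors persisted as one JSON file."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.vectors: Dict[str, List[float]] = (
            json.loads(self.path.read_text(encoding="utf-8")) if self.path.exists() else {}
        )

    def add(self, chunk_id: str, vector: Sequence[float]) -> None:
        self.vectors[chunk_id] = [float(x) for x in vector]

    def save(self) -> None:
        self.path.write_text(json.dumps(self.vectors), encoding="utf-8")

    def search(self, query: Sequence[float], limit: int = 5) -> List[Tuple[str, float]]:
        # Brute force: score every stored vector, O(n * dim) per query.
        q_norm = math.sqrt(sum(x * x for x in query))
        scored = []
        for chunk_id, vec in self.vectors.items():
            v_norm = math.sqrt(sum(x * x for x in vec))
            if q_norm == 0 or v_norm == 0:
                continue
            dot = sum(a * b for a, b in zip(query, vec))
            scored.append((chunk_id, dot / (q_norm * v_norm)))
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:limit]


def overlap_rerank(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Fallback reranker: fraction of query words that appear in each document."""
    q_words = set(query.lower().split())
    scored = []
    for chunk_id, text in docs.items():
        overlap = len(q_words & set(text.lower().split()))
        scored.append((chunk_id, overlap / len(q_words) if q_words else 0.0))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Brute-force scoring is fine for tests and small libraries; an indexed store such as Zvec is what makes large collections practical.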
Embedding model loading is offline by default. Even when
sentence-transformers is installed, only locally cached model files are used.
Set ALLOW_MODEL_DOWNLOAD=true only when a model download has been explicitly
approved.
8. Knowledge Governance
Long-term use of local-rag-core depends on clear boundaries between different
kinds of knowledge. See KNOWLEDGE_GOVERNANCE.md for
the full policy. The summary below describes the source types and basic rules.
Source Types
| Source Type | Purpose | Example Content |
|---|---|---|
| private_project | Documentation of an active project you maintain. | |
| pack | Reusable, shareable documentation package. | Tutorials, framework guides, methodology docs |
| | Translation terminology, profiles, and history. | Glossaries, profiles, translation memory |
| | Web pages explicitly saved by the user. | Curated web articles, reference pages |
| | Temporary context from the current web page. | Current page summary, used once and discarded |
| source_code_index | Indexed source code (disabled by default). | |
| | Temporary experimental material. | Drafts, quick tests, one-off notes |
Pack ID Conventions
Active project docs: <project_name>_project (Example: local_rag_core_project, local_llm_pipeline_project)
Reusable doc package: <topic>_pack (Example: fastapi_docs_pack)
Translation assets: translator_pack
Saved web pages: browser_saved
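A hypothetical helper that applies these naming conventions. The kind labels (project_docs, doc_package, translation, saved_web) are invented for the sketch. Its output uses only lowercase letters, digits, and underscores, so it also satisfies the pack ID character rule from the security section.

```python
import re

# Hypothetical helper; the kinds and suffixes mirror the conventions listed above.
FIXED_IDS = {"translation": "translator_pack", "saved_web": "browser_saved"}
SUFFIXES = {"project_docs": "_project", "doc_package": "_pack"}


def slugify(name: str) -> str:
    """Lowercase and collapse runs of other characters into single underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def conventional_pack_id(kind: str, name: str = "") -> str:
    """Suggest a pack ID that follows the documented naming convention."""
    if kind in FIXED_IDS:
        return FIXED_IDS[kind]
    if kind not in SUFFIXES:
        raise ValueError(f"unknown kind: {kind!r}")
    slug = slugify(name)
    if not slug:
        raise ValueError("name must contain at least one letter or digit")
    suffix = SUFFIXES[kind]
    return slug if slug.endswith(suffix) else slug + suffix
```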
Ingestion Rules
Project docs go into private_project.
Reusable tutorials / frameworks go into pack.
Translation assets go into translator_pack.
Web pages are temporary by default; a manual save is required for browser_saved.
Source code indexing is explicit-scope only. Curated source indexes use source_code_index and should be queried intentionally rather than mixed into broad documentation retrieval by default.
MCP tools are read-only. Ingestion, export, import, delete, and reindex must use the CLI or authenticated HTTP API.
Exclusions
The scanner ignores:
.git/, .venv/, venv/, __pycache__/
node_modules/, dist/, build/
storage/, data/packs/
.env, .pem, .key, .log
Large binaries and generated lock files, unless explicitly required
9. Verification & Testing Commands
To run basic checks, test suites, and inspect RAG health:
# 1. Install development tools and run test suite
pip install -e ".[dev]"
pytest
ruff check .
# 2. Run system health check and verify status
rag health
# 3. Test pack building and list packages
rag pack list
rag pack build my_docs_pack ./docs --name "Documentation Pack"
rag pack list
# 4. Ingest and query
rag ingest README.md --pack readme_pack --source-type pack
rag search "automatic query routing" --limit 3 --mode keyword
rag search "local knowledge library for code assistants" --limit 3 --mode hybrid
# 5. Verify MCP Tool interface (if mcp extra installed)
C:\Users\Zero\AppData\Local\hermes\hermes-agent\venv\Scripts\python.exe -m local_rag_core.interfaces.mcp_server
# 6. Verify AI-tool entrypoint readiness
# Checks Codex, Claude Code, Gemini CLI, OpenCode config, and stdio launch.
PYTHONIOENCODING=utf-8 python scripts/ops_verify.py