local-rag-core

local-rag-core is a headless local knowledge library and RAG substrate. It is designed to act as a secure, local, structure-preserving data layer for coding assistants (such as Claude Code, Codex, and OpenCode), browsers, translators, and agentic workflows.


1. Core Principles

  1. Headless & UI-less: This is not a chat application. It manages file ingestion, document chunking, indexing, hybrid retrieval, reranking, and source attribution.

  2. Strict logical separation:

    • private_kb: User's long-term private notes and documentation.

    • packs: Modular, shareable documentation packages that can be registered, enabled, or disabled.

    • private_project: Local source code repositories with filters protecting sensitive data.

  3. Always Attributed: All retrieved chunks specify their source_type, pack_id/project_id, chunk ID, file path, section context, and normalized relevance scores.

  4. Read-Only MCP Server: The MCP tool interface is strictly read-only (kb.search, kb.get_chunk, kb.list_packs, kb.health). Data modifications must occur via the CLI or HTTP API.
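The attribution fields listed in principle 3 can be pictured as a small record. This is a minimal sketch only: the class name `RetrievedChunk`, the field names, the `citation()` helper, and the assumption that normalized scores fall in [0, 1] are illustrative and not taken from the project's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetrievedChunk:
    """Hypothetical shape of one search hit (field names are assumptions)."""
    chunk_id: str
    source_type: str                  # e.g. "private", "pack", "private_project"
    file_path: str
    section: str                      # section context, e.g. the nearest heading
    score: float                      # normalized relevance, assumed to be in [0, 1]
    pack_id: Optional[str] = None     # set for pack-backed chunks
    project_id: Optional[str] = None  # set for project-backed chunks

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score is expected to be normalized to [0, 1]")

    def citation(self) -> str:
        """Render a one-line source attribution for display."""
        owner = self.pack_id or self.project_id or "private"
        return f"[{self.source_type}:{owner}] {self.file_path} > {self.section} ({self.chunk_id})"


hit = RetrievedChunk(
    chunk_id="c-001",
    source_type="pack",
    file_path="docs/tutorial/background-tasks.md",
    section="Background Tasks",
    score=0.87,
    pack_id="fastapi_docs",
)
print(hit.citation())
```

Keeping the record immutable makes it harder for downstream consumers to detach a chunk from its source metadata by accident.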



2. Technology Stack

  • Python: >=3.10

  • SQLite / FTS5: Relational registries, metadata index, and BM25 full-text search.

  • Zvec: Embedded in-process vector database for semantic indexing.

  • MCP: Low-level mcp.server.Server stdio server; read-only by design.

  • FastAPI & Uvicorn: High-performance HTTP microservice.

  • Typer: Type-safe CLI builder.


3. Installation

Minimal Install

Core engine with CLI, SQLite FTS5 search, and mock fallbacks (no ML deps):

pip install -e .

Development Install

Core + testing, linting, API/HTTP test deps:

pip install -e ".[dev]"

This is sufficient for running the full test suite on a clean machine. Tests that require optional packages (sentence-transformers, zvec, mcp) will be skipped automatically.

For day-to-day development, use the fast lane:

pytest -m "not real_model"

Use the full real-backend lane before a maintenance handoff:

pytest
rag health --strict
rag integrity --deep
python scripts/run_eval.py
python scripts/ops_verify.py

scripts/run_eval.py prewarms the retriever by default and reports the warmup latency separately from per-query latency. Use --no-warmup when you explicitly want to measure process/model cold start.
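The idea of reporting warmup latency separately from per-query latency can be sketched as follows. This is not the code in scripts/run_eval.py; the function `timed_eval` and its `warmup` and `search` callables are hypothetical stand-ins for whatever the script actually calls.

```python
import time
from typing import Callable, Dict, Iterable, List


def timed_eval(warmup: Callable[[], None],
               search: Callable[[str], object],
               queries: Iterable[str],
               do_warmup: bool = True) -> Dict[str, object]:
    """Time the warmup call separately from each query (all values in seconds)."""
    warmup_s = 0.0
    if do_warmup:
        start = time.perf_counter()
        warmup()
        warmup_s = time.perf_counter() - start

    per_query_s: List[float] = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        per_query_s.append(time.perf_counter() - start)

    return {"warmup_s": warmup_s, "per_query_s": per_query_s}
```

With `do_warmup=False` (the analogue of --no-warmup), any model loading cost lands in the first query's latency, which is what a cold-start measurement wants.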

Full Install

All backends and interfaces (real embeddings, vector store, reranker, MCP, HTTP API):

pip install -e ".[all]"

Selective Install

Install only the extras you need:

# MCP Server (stdio-based, for Claude Code / Codex / Gemini)
pip install -e ".[mcp]"

# HTTP FastAPI Server
pip install -e ".[api]"

# Real neural embeddings (BAAI/bge-m3 via sentence-transformers)
pip install -e ".[embedding]"

# Real cross-encoder reranking (BAAI/bge-reranker-large via CrossEncoder)
pip install -e ".[rerank]"

# Native Zvec vector index
pip install -e ".[vector]"

# ModelScope (魔搭) model download source
pip install -e ".[modelscope]"

4. Interfaces & Usage

4.1 CLI Interface (rag and rag-project)

System Health Check

rag health

Pack Registry Management

# List all registered packs and code projects
rag pack list

# Enable or disable specific packs
rag pack enable fastapi_docs
rag pack disable python_docs

# Build, export, and import packs
rag pack build my_pack /path/to/docs --name "My Documentation Pack" --domain "tech"
rag pack export my_pack --output /path/to/my_pack.tar.gz
rag pack import /path/to/my_pack.tar.gz

General Document Ingestion

rag ingest <file-or-dir> --pack <id> --source-type <private|pack>

Code Project Ingestion (Standalone or Subcommand)

Code ingestion supports common extensions (.py, .js, .ts, .go, .rs, etc.) and respects exclusions (.git, node_modules, .env).

# Using subcommand
rag project ingest /path/to/my-code --project my-app-id

# Using standalone tool
rag-project /path/to/my-code --project my-app-id

Search

rag search "how to handle background tasks" --limit 5 --mode hybrid --rerank bge

Notes on scope:

  • If no --packs scope is given, the engine searches all enabled packs.

  • --packs pack1,pack2 restricts results to those packs plus any user private notes (source_type='private'); other private_project packs are not automatically included.

  • Add --no-private to exclude user private notes while still honoring the listed packs (useful for targeting a single project).
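The scope rules above can be sketched as a small resolver. This is an illustration under stated assumptions, not the engine's code: the function name `resolve_scope` and its return shape are hypothetical, and it assumes private notes are also included when no --packs scope is given unless --no-private is passed.

```python
from typing import Dict, Optional, Sequence


def resolve_scope(enabled_packs: Sequence[str],
                  packs: Optional[Sequence[str]] = None,
                  no_private: bool = False) -> Dict[str, object]:
    """Decide which packs to search and whether private notes are included."""
    if packs:
        # Explicit scope: only the listed packs (deduplicated, order preserved).
        pack_ids = list(dict.fromkeys(packs))
    else:
        # No scope given: search every enabled pack.
        pack_ids = list(enabled_packs)
    return {"pack_ids": pack_ids, "include_private": not no_private}


# Target a single project and leave private notes out.
print(resolve_scope(["fastapi_docs", "my-app-id"], packs=["my-app-id"], no_private=True))
```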

Retrieve Chunk Details

rag chunk get <chunk_id>

4.2 HTTP API Interface

Configure a write token and the filesystem roots that HTTP write operations may access. Separate multiple roots with a semicolon (;) on Windows or a colon (:) on Unix. In PowerShell:

$env:LOCAL_RAG_API_TOKEN = "replace-with-a-long-random-token"
$env:LOCAL_RAG_ALLOWED_ROOTS = "C:\Users\Zero\Documents;D:\Knowledge"

Run the FastAPI microservice:

uvicorn local_rag_core.interfaces.api:app --host 127.0.0.1 --port 8000

All mutating endpoints require the token in the X-Local-RAG-Token request header. Path-based write endpoints reject paths outside LOCAL_RAG_ALLOWED_ROOTS. Read-only health, pack listing, search, and chunk retrieval endpoints do not require the token.
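A minimal sketch of the two checks described above, assuming the roots are split on the platform path separator and tokens are compared in constant time. The helper names (`parse_allowed_roots`, `is_path_allowed`, `token_matches`) are hypothetical; the project's actual validation may differ.

```python
import hmac
import os
from pathlib import Path
from typing import List


def parse_allowed_roots(raw: str, sep: str = os.pathsep) -> List[Path]:
    """Split a LOCAL_RAG_ALLOWED_ROOTS-style value (';' on Windows, ':' on Unix)."""
    return [Path(part).resolve() for part in raw.split(sep) if part.strip()]


def is_path_allowed(candidate: str, roots: List[Path]) -> bool:
    """Accept a path only if, once resolved, it is a root or lies beneath one."""
    target = Path(candidate).resolve()  # collapses '..' segments and symlinks
    return any(target == root or root in target.parents for root in roots)


def token_matches(supplied: str, expected: str) -> bool:
    """Constant-time comparison; an unset (empty) expected token never matches."""
    return bool(expected) and hmac.compare_digest(supplied.encode(), expected.encode())
```

Resolving the candidate before comparing is what defeats traversal attempts such as `<root>/../elsewhere`.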

Endpoints Summary

  • GET /health: Returns database health check and chunk count metadata.

  • GET /packs: Returns registered documentation packages and projects list.

  • POST /packs/{pack_id}/enable: Enables a registered pack.

  • POST /packs/{pack_id}/disable: Disables a registered pack.

  • GET /chunks/{chunk_id}: Retrieves full text content and metadata of a specific chunk.

  • POST /project/ingest: Ingests a local project repository.

    • Body: {"path": "/absolute/path", "project": "project_id"}

  • POST /packs/build: Compiles a documentation source directory into a self-contained pack.

    • Body: {"pack_id": "my_pack_id", "source_dir": "/path/to/docs", "name": "My Pack", "domain": "tech", "description": "desc"}

  • POST /packs/export: Bundles an installed pack directory into a tarball archive.

    • Body: {"pack_id": "my_pack_id", "output_path": "/path/to/my_pack_id.tar.gz"}

  • POST /packs/import: Decompresses and registers a pack archive into the local library.

    • Body: {"archive_path": "/path/to/my_pack_id.tar.gz"}

  • POST /search: Queries the RAG substrate using keyword, semantic, or hybrid configurations.

    • Body:

      {
        "query": "FastAPI background workers",
        "limit": 8,
        "mode": "hybrid",
        "scope_private": true,
        "scope_packs": ["fastapi_docs"],
        "rerank": "bge"
      }
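As a rough illustration, the /search body above could be sent from Python with only the standard library. The helper name `build_search_request` is hypothetical, the base URL assumes the uvicorn command shown earlier, and the response shape is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request
from typing import List, Optional


def build_search_request(base_url: str, query: str, *, limit: int = 8,
                         mode: str = "hybrid", scope_private: bool = True,
                         scope_packs: Optional[List[str]] = None,
                         rerank: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /search request; optional fields are omitted when unset."""
    payload = {"query": query, "limit": limit, "mode": mode,
               "scope_private": scope_private}
    if scope_packs is not None:
        payload["scope_packs"] = scope_packs
    if rerank is not None:
        payload["rerank"] = rerank
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("http://127.0.0.1:8000", "FastAPI background workers",
                           scope_packs=["fastapi_docs"], rerank="bge")
# With the server running, send it like this:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     results = json.load(resp)
```

No X-Local-RAG-Token header is set because search is one of the read-only endpoints.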

4.3 MCP Tool Interface

The MCP server connects local-rag-core directly to LLM clients (like Claude Desktop or Claude Code) using stdio.

Run the MCP server directly:

C:\Users\Zero\AppData\Local\hermes\hermes-agent\venv\Scripts\python.exe -m local_rag_core.interfaces.mcp_server

Use a Python runtime that can import both local_rag_core and the mcp SDK. On a migrated machine, update the AI-tool MCP configs to that machine's verified Python path.

Registered MCP Tools

  1. kb.health: Checks database health and indexes (No arguments).

  2. kb.list_packs: Lists registered packages/projects (No arguments).

  3. kb.get_chunk: Fetches the full contents of a single chunk (Args: chunk_id).

  4. kb.search: Performs semantic, keyword, or hybrid query search.

    • Args: query (str), limit (int), mode (str), no_private (bool), scope_packs (List[str]), rerank (str).

Claude Desktop Configuration

Add this to your claude_desktop_config.json:

{
  "mcpServers": {
    "local-rag-core": {
      "command": "python",
      "args": [
        "-m",
        "local_rag_core.interfaces.mcp_server"
      ]
    }
  }
}

5. Security & Exclusions

To keep credentials and build outputs out of the index, the scanner ignores files matching the patterns defined in should_ignore():

  • Directories: .git, .venv, venv, node_modules, build, dist, __pycache__

  • Files: .env, .pem, .key, registry.sqlite, .log

Pack IDs are restricted to ASCII letters, digits, dots, underscores, and hyphens. Pack archives reject absolute paths, traversal entries, links, special files, excessive member counts, and excessive extracted sizes.
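The pack ID and archive rules can be sketched with the standard library. Everything here is illustrative: the function names are hypothetical, and the member-count and size limits are placeholder values, not the project's real thresholds.

```python
import re
import tarfile
from pathlib import PurePosixPath
from typing import Iterable

# ASCII letters, digits, dots, underscores, and hyphens only.
PACK_ID_RE = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_pack_id(pack_id: str) -> bool:
    return PACK_ID_RE.fullmatch(pack_id) is not None


def is_safe_member(member: tarfile.TarInfo) -> bool:
    """Reject absolute paths, traversal entries, links, and special files."""
    path = PurePosixPath(member.name.replace("\\", "/"))
    if path.is_absolute() or ".." in path.parts:
        return False
    # Only regular files and directories pass; symlinks, hard links,
    # devices, and FIFOs are refused.
    return member.isfile() or member.isdir()


def archive_is_acceptable(members: Iterable[tarfile.TarInfo],
                          max_members: int = 10_000,              # placeholder limit
                          max_total_bytes: int = 1 << 30) -> bool:  # placeholder limit
    members = list(members)
    if len(members) > max_members:
        return False
    if sum(m.size for m in members) > max_total_bytes:
        return False
    return all(is_safe_member(m) for m in members)
```

In practice the members would come from `tarfile.open(path).getmembers()`, checked before anything is extracted.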


6. Embedding Backend Modes

local-rag-core supports two embedding backends, controlled by the LOCAL_RAG_EMBEDDING_BACKEND environment variable.

mock (default when sentence-transformers is not installed)

A deterministic word-overlap hash-based embedding generator. Each word is hashed to a pseudo-random unit vector; the document vector is the L2-normalized sum of its word vectors.

  • Purpose: structural testing, CI, and lightweight development.

  • Does NOT provide true semantic retrieval.

  • Forced by: LOCAL_RAG_EMBEDDING_BACKEND=mock
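To make the mock backend's description concrete, here is a pure-Python sketch of a hashed word embedding: each word seeds a pseudo-random unit vector, and the document vector is the L2-normalized sum. The dimension, tokenizer, and hash choice are assumptions made for illustration; the project's implementation may differ in all three.

```python
import hashlib
import math
import random
import re
from typing import List

DIM = 128  # hypothetical dimension, chosen only for this sketch


def _unit(vec: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def word_vector(word: str, dim: int = DIM) -> List[float]:
    """Deterministic pseudo-random unit vector derived from a hash of the word."""
    seed = int.from_bytes(hashlib.sha256(word.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return _unit([rng.gauss(0.0, 1.0) for _ in range(dim)])


def embed(text: str, dim: int = DIM) -> List[float]:
    """L2-normalized sum of the word vectors; all zeros for empty text."""
    total = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        for i, x in enumerate(word_vector(word, dim)):
            total[i] += x
    return _unit(total)


def cosine(a: List[float], b: List[float]) -> float:
    """Dot product, which equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Similarity here tracks shared vocabulary rather than meaning, which is why this mode suits structural tests but not semantic retrieval.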

bge-m3 (requires sentence-transformers)

Uses the BAAI/bge-m3 model loaded via sentence-transformers. This is the real semantic embedding backend.

  • Install: pip install -e ".[embedding]" (add ,modelscope to prioritize ModelScope downloads)

  • Model: BAAI/bge-m3 (overridable via EMBEDDING_MODEL)

  • Cache: HuggingFace default cache (~/.cache/huggingface/hub/) or LOCAL_RAG_MODEL_CACHE if set. ModelScope uses a modelscope/ subdir under the same cache root when selected.

  • Offline by default: set ALLOW_MODEL_DOWNLOAD=true to allow downloading the model on first use.

  • Forced by: LOCAL_RAG_EMBEDDING_BACKEND=bge-m3

Model Download Sources

By default (LOCAL_RAG_MODEL_SOURCE=auto), local-rag-core tries to download models from ModelScope (魔搭) first when the modelscope package is installed, and falls back to HuggingFace Hub otherwise. Model IDs are identical on both platforms (BAAI/bge-m3, BAAI/bge-reranker-large).

| Source | Behavior |
| --- | --- |
| auto | ModelScope first if installed, else HuggingFace |
| modelscope | Force ModelScope; error if modelscope is not installed |
| huggingface | Force HuggingFace Hub |

Set USE_MODELSCOPE=true as a shorthand for LOCAL_RAG_MODEL_SOURCE=modelscope. If both are set, LOCAL_RAG_MODEL_SOURCE wins.
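The precedence between the two variables can be sketched as below. The function name `resolve_model_source` is hypothetical, it returns only the primary source (any fallback on download failure is out of scope), and treating "1" and "true" as the truthy values follows the configuration reference rather than the source code.

```python
from typing import Mapping

TRUTHY = {"1", "true"}


def resolve_model_source(env: Mapping[str, str], modelscope_installed: bool) -> str:
    """Return the primary download source: 'modelscope' or 'huggingface'."""
    source = env.get("LOCAL_RAG_MODEL_SOURCE", "").strip().lower()
    if not source:
        # USE_MODELSCOPE is consulted only when LOCAL_RAG_MODEL_SOURCE is unset.
        use_ms = env.get("USE_MODELSCOPE", "").strip().lower() in TRUTHY
        source = "modelscope" if use_ms else "auto"

    if source == "auto":
        return "modelscope" if modelscope_installed else "huggingface"
    if source == "modelscope":
        if not modelscope_installed:
            raise RuntimeError("modelscope source requested but the package is not installed")
        return "modelscope"
    if source == "huggingface":
        return "huggingface"
    raise ValueError(f"unknown model source: {source!r}")
```

A caller would typically pass `os.environ` and `importlib.util.find_spec("modelscope") is not None`.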

auto (default)

When LOCAL_RAG_EMBEDDING_BACKEND is unset or set to auto, the system selects bge-m3 if sentence-transformers is importable, otherwise falls back to mock.
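A sketch of the auto-selection logic, assuming an importability probe via `importlib.util.find_spec`. The function name is hypothetical, and raising an error when bge-m3 is forced without sentence-transformers is an assumption; this document does not say how the project handles that case.

```python
import importlib.util
from typing import Mapping, Optional


def select_embedding_backend(env: Mapping[str, str],
                             st_available: Optional[bool] = None) -> str:
    """Return 'mock' or 'bge-m3' based on LOCAL_RAG_EMBEDDING_BACKEND."""
    if st_available is None:
        # Probe for the package without importing it.
        st_available = importlib.util.find_spec("sentence_transformers") is not None

    choice = env.get("LOCAL_RAG_EMBEDDING_BACKEND", "").strip().lower() or "auto"
    if choice == "mock":
        return "mock"
    if choice == "bge-m3":
        if not st_available:
            # Assumed behavior: fail loudly rather than silently degrade.
            raise RuntimeError("bge-m3 requires sentence-transformers")
        return "bge-m3"
    if choice == "auto":
        return "bge-m3" if st_available else "mock"
    raise ValueError(f"unknown embedding backend: {choice!r}")
```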

Configuration Reference

| Env Var | Values | Default | Effect |
| --- | --- | --- | --- |
| LOCAL_RAG_EMBEDDING_BACKEND | auto, mock, bge-m3 | auto | Selects embedding backend |
| LOCAL_RAG_MODEL_SOURCE | auto, modelscope, huggingface | auto | Primary model download source |
| USE_MODELSCOPE | true, false, 1, 0 | false | Shorthand for LOCAL_RAG_MODEL_SOURCE=modelscope |
| LOCAL_RAG_MODEL_CACHE | any path | (HF default) | Custom model cache directory |
| EMBEDDING_MODEL | HF/MS model name | BAAI/bge-m3 | Which model to load |
| ALLOW_MODEL_DOWNLOAD | true, false, 1, 0 | false | Allow first-time model download |
| DEVICE | cpu, cuda | cpu | Torch device |

Health Output

$ rag health
Embedding backend: mock       # sentence-transformers not installed, or forced
Embedding backend: bge-m3     # sentence-transformers available and model loaded
Model source: auto            # auto / modelscope / huggingface
ModelScope available: false   # true when modelscope package is installed
Registered packs: 22
Indexed chunks: 69341
Disabled packs: 2
Pack status counts: {"disabled": 2, "enabled": 20}
Pack source type counts: {"pack": 12, "private_project": 6, "source_code_index": 4}
Chunk source type counts: {"pack": 58437, "private_project": 10841, "source_code_index": 63}
Latest audit action: ingest_path

7. Mock / Fallback Modes

To support lightweight local development and testing, local-rag-core falls back to built-in substitutes when machine learning dependencies or the vector database are not installed:

  1. Mock Embedding: See §6 above for full details.

  2. Simple Flat Vector Store: If the zvec binary extension is not installed, the engine uses SimpleFlatVectorStore, a pure-Python in-process cosine similarity engine storing vectors in JSON files.

  3. Mock Reranking: If sentence-transformers / CrossEncoder cannot be used, the reranker falls back to a simple query-document word-overlap heuristic.

  4. Verification: Always run rag health --strict to inspect which backends are active (mock vs. real). Production readiness requires installing the corresponding extras (embedding, rerank, vector) and locally cached models.

Embedding model loading is offline by default. Even when sentence-transformers is installed, only locally cached model files are used. Set ALLOW_MODEL_DOWNLOAD=true only when a model download has been explicitly approved.
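The flat vector store fallback in item 2 amounts to a brute-force cosine scan over vectors persisted as JSON. The class below is a sketch of that idea only; `FlatStoreSketch` and its methods are hypothetical and do not mirror the real SimpleFlatVectorStore API.

```python
import json
import math
from pathlib import Path
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class FlatStoreSketch:
    """Brute-force in-process vector store backed by a JSON file."""

    def __init__(self) -> None:
        self.vectors: Dict[str, List[float]] = {}

    def add(self, chunk_id: str, vector: Sequence[float]) -> None:
        self.vectors[chunk_id] = list(vector)

    def search(self, query: Sequence[float], k: int = 5) -> List[Tuple[str, float]]:
        """Score every stored vector against the query and return the top k."""
        scored = [(cid, _cosine(query, vec)) for cid, vec in self.vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

    def save(self, path: str) -> None:
        Path(path).write_text(json.dumps(self.vectors), encoding="utf-8")

    @classmethod
    def load(cls, path: str) -> "FlatStoreSketch":
        store = cls()
        store.vectors = json.loads(Path(path).read_text(encoding="utf-8"))
        return store
```

A linear scan like this costs time proportional to the number of stored vectors per query, which is acceptable for tests and small libraries and is the reason a real index such as Zvec is recommended for production.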


8. Knowledge Governance

Long-term use of local-rag-core depends on clear boundaries between different kinds of knowledge. See KNOWLEDGE_GOVERNANCE.md for the full policy. The summary below describes the source types and basic rules.

Source Types

| source_type | Purpose | Example Content |
| --- | --- | --- |
| private_project | Documentation of an active project you maintain. | README.md, CLAUDE.md, PROJECT_STATUS.md, CHANGELOG.md, docs/**/*.md |
| pack | Reusable, shareable documentation package. | Tutorials, framework guides, methodology docs |
| translator_pack | Translation terminology, profiles, and history. | Glossaries, profiles, translation memory |
| browser_saved | Web pages explicitly saved by the user. | Curated web articles, reference pages |
| browser_context | Temporary context from the current web page. | Current page summary, used once and discarded |
| source_code_index | Indexed source code (disabled by default). | src/**/*.py only when explicitly enabled |
| scratch | Temporary experimental material. | Drafts, quick tests, one-off notes |

Pack ID Conventions

  • Active project docs: <project_name>_project

    • Example: local_rag_core_project, local_llm_pipeline_project

  • Reusable doc package: <topic>_pack

    • Example: fastapi_docs_pack

  • Translation assets: translator_pack

  • Saved web pages: browser_saved

Ingestion Rules

  1. Project docs go into private_project.

  2. Reusable tutorials / frameworks go into pack.

  3. Translation assets go into translator_pack.

  4. Web pages are temporary by default; manual save is required for browser_saved.

  5. Source code indexing is explicit-scope only. Curated source indexes use source_code_index and should be queried intentionally rather than mixed into broad documentation retrieval by default.

  6. MCP tools are read-only. Ingestion, export, import, delete, and reindex must use the CLI or authenticated HTTP API.

Exclusions

The scanner ignores:

  • .git/, .venv/, venv/, __pycache__/

  • node_modules/, dist/, build/

  • storage/, data/packs/

  • .env, .pem, .key, .log

  • Large binaries and generated lock files unless explicitly required
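The name-based exclusions above can be sketched as a single predicate. This is not the project's should_ignore(): the function is deliberately named `should_ignore_sketch`, it assumes paths are relative to the ingestion root, and it leaves out the size-based rule for large binaries and lock files.

```python
from pathlib import PurePosixPath

IGNORED_DIRS = {".git", ".venv", "venv", "__pycache__",
                "node_modules", "dist", "build", "storage"}
IGNORED_DIR_PREFIXES = ("data/packs",)   # multi-segment directory, root-relative
IGNORED_NAMES = {".env", "registry.sqlite"}
IGNORED_SUFFIXES = {".pem", ".key", ".log"}


def should_ignore_sketch(rel_path: str) -> bool:
    """Return True if a root-relative path matches one of the exclusion rules."""
    path = PurePosixPath(rel_path.replace("\\", "/"))  # tolerate Windows separators
    if any(part in IGNORED_DIRS for part in path.parts):
        return True
    posix = path.as_posix()
    if any(posix == prefix or posix.startswith(prefix + "/")
           for prefix in IGNORED_DIR_PREFIXES):
        return True
    return path.name in IGNORED_NAMES or path.suffix in IGNORED_SUFFIXES


for candidate in ["src/app/main.py", "src/.env", "node_modules/x/index.js"]:
    print(candidate, should_ignore_sketch(candidate))
```

Matching on any path segment means a file that happens to be named `build` is skipped too; a stricter version would test only the parent directories.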


9. Verification & Testing Commands

To run basic checks, test suites, and inspect RAG health:

# 1. Install development tools and run test suite
pip install -e ".[dev]"
pytest
ruff check .

# 2. Run system health check and verify status
rag health

# 3. Test pack building and list packages
rag pack list
rag pack build my_docs_pack ./docs --name "Documentation Pack"
rag pack list

# 4. Ingest and query
rag ingest README.md --pack readme_pack --source-type pack
rag search "automatic query routing" --limit 3 --mode keyword
rag search "local knowledge library for code assistants" --limit 3 --mode hybrid

# 5. Verify MCP Tool interface (if mcp extra installed)
C:\Users\Zero\AppData\Local\hermes\hermes-agent\venv\Scripts\python.exe -m local_rag_core.interfaces.mcp_server

# 6. Verify AI-tool entrypoint readiness
#    Checks Codex, Claude Code, Gemini CLI, OpenCode config, and stdio launch.
PYTHONIOENCODING=utf-8 python scripts/ops_verify.py