wandering-rag-mcp

by mambo-wang

A local RAG (Retrieval-Augmented Generation) knowledge base MCP server that exposes semantic document search as tools. Uses zvec (Alibaba's embedded vector database) for vector storage and Qwen3-Embedding-0.6B for text embedding.

No external LLM required — the MCP server handles retrieval, and the client (QoderWork, Claude Desktop, etc.) provides generation.

Features

Multi-format support: Plain text files (40+ types: md, txt, py, js, ts, go, rs, etc.) and binary documents (PDF, DOCX, PPTX, XLSX)
Embedded vector DB: zvec — zero-config, no Docker, WAL-persistent, HNSW-indexed
Local embedding: Qwen3-Embedding-0.6B (0.6B params, 1024-dim, 32K context, bilingual CN/EN)
Optional reranker: bge-reranker-v2-m3 cross-encoder for higher retrieval accuracy
REST API: HTTP endpoints for document management (upload/search/delete), runs alongside MCP on the same port
Three transport modes: stdio, SSE, Streamable HTTP
Multi-collection: Isolate documents into separate knowledge bases

Quick Start

Prerequisites

Python >= 3.10

Install

git clone <repo-url>
cd wandering-rag-mcp
pip install -e .

Run

# stdio mode (default, for QoderWork / Claude Desktop)
python server.py

# SSE mode
python server.py --mode sse --port 8000

# Streamable HTTP mode
python server.py --mode streamable-http --host 0.0.0.0 --port 8000

# Disable REST API (MCP only)
python server.py --mode sse --no-api

Environment variables are also supported:

Variable	Description	Default
`RAG_MCP_MODE`	Transport mode	`stdio`
`RAG_MCP_HOST`	Bind host	`127.0.0.1`
`RAG_MCP_PORT`	Bind port	`8000`
`RAG_EMBEDDING_MODEL`	Embedding model name	`Qwen/Qwen3-Embedding-0.6B`
`RAG_RERANKER_MODEL`	Reranker model name	`BAAI/bge-reranker-v2-m3`
`RAG_DATA_DIR`	Vector data directory	`./data`
`RAG_CORS_ORIGINS`	Allowed CORS origins (comma-separated)	`*`
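
The flags and variables above could be wired together roughly as follows. This is a minimal sketch, not the project's actual `server.py`: the helper names `build_parser` and `cors_origins` are hypothetical, and the precedence (command-line flag over environment variable over built-in default) is an assumption that the README does not confirm.

```python
import argparse
import os


def build_parser(env=None):
    """Build a CLI parser whose defaults come from RAG_MCP_* variables.

    Assumption: an explicit flag wins over the environment variable,
    which in turn wins over the documented default.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="wandering-rag-mcp (sketch)")
    parser.add_argument(
        "--mode",
        choices=["stdio", "sse", "streamable-http"],
        default=env.get("RAG_MCP_MODE", "stdio"),
    )
    parser.add_argument("--host", default=env.get("RAG_MCP_HOST", "127.0.0.1"))
    parser.add_argument(
        "--port", type=int, default=int(env.get("RAG_MCP_PORT", "8000"))
    )
    parser.add_argument("--no-api", action="store_true")
    return parser


def cors_origins(env=None):
    """Split the comma-separated RAG_CORS_ORIGINS value into a list."""
    env = os.environ if env is None else env
    raw = env.get("RAG_CORS_ORIGINS", "*")
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.mode, args.host, args.port, args.no_api, cors_origins())
```

Passing a plain dict as `env` makes the defaults easy to check without touching the real process environment.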

Client Configuration

stdio Mode (QoderWork / Claude Desktop)

{
  "mcpServers": {
    "wandering-rag-mcp": {
      "command": "python",
      "args": ["D:\\repos\\rag-mcp\\server.py"]
    }
  }
}

SSE Mode

{
  "mcpServers": {
    "wandering-rag-mcp": {
      "type": "sse",
      "url": "http://your-server:8000/sse"
    }
  }
}

Streamable HTTP Mode

{
  "mcpServers": {
    "wandering-rag-mcp": {
      "type": "streamableHttp",
      "url": "http://your-server:8000/mcp"
    }
  }
}
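
Hand-edited JSON is easy to break with a missing comma, so it can help to generate the client entry instead. The sketch below reproduces the three configurations shown above; the function names `client_entry` and `client_config` are hypothetical helpers, and the keys are simply the ones from the examples in this README.

```python
import json


def client_entry(mode, server_path=None, url=None):
    """Return the per-server config dict for one transport mode."""
    if mode == "stdio":
        if server_path is None:
            raise ValueError("stdio mode needs the path to server.py")
        return {"command": "python", "args": [server_path]}
    if mode == "sse":
        kind = "sse"
    elif mode == "streamable-http":
        kind = "streamableHttp"
    else:
        raise ValueError(f"unknown mode: {mode}")
    if url is None:
        raise ValueError(f"{mode} mode needs the server URL")
    return {"type": kind, "url": url}


def client_config(mode, **kwargs):
    """Wrap the entry in the top-level "mcpServers" object."""
    return {"mcpServers": {"wandering-rag-mcp": client_entry(mode, **kwargs)}}


if __name__ == "__main__":
    # json.dumps always emits syntactically valid JSON.
    config = client_config("sse", url="http://your-server:8000/sse")
    print(json.dumps(config, indent=2))
```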

MCP Tools

`search`

Search the knowledge base with natural language queries.

Parameter	Type	Default	Description
`query`	string	(required)	Natural language search query
`top_k`	int	5	Number of results to return
`collection`	string	`"default"`	Collection to search
`rerank`	bool	`true`	Use cross-encoder reranker for higher accuracy
`filter`	string	`""`	Glob pattern to filter by source file (e.g. `*.md`, `*/docs/*`)
`expand_context`	int	0	Number of neighboring chunks to include before/after each result for broader context
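
To make the `filter` and `expand_context` parameters concrete, here is a small sketch of what they plausibly do. It assumes shell-style glob matching against the source path and a symmetric window of neighboring chunk indices; the server's exact semantics are not documented here, and `matches_filter` and `expand_indices` are hypothetical names.

```python
from fnmatch import fnmatchcase


def matches_filter(source, pattern):
    """An empty pattern keeps every source; otherwise apply a glob."""
    return pattern == "" or fnmatchcase(source, pattern)


def expand_indices(chunk_index, expand_context, chunk_count):
    """Indices of the hit plus `expand_context` neighbors on each side,
    clamped to the valid range [0, chunk_count - 1]."""
    start = max(0, chunk_index - expand_context)
    end = min(chunk_count - 1, chunk_index + expand_context)
    return list(range(start, end + 1))


if __name__ == "__main__":
    print(matches_filter("/docs/guide.md", "*.md"))
    print(expand_indices(3, 1, 12))
```

With `expand_context=1`, a hit on chunk 3 of a 12-chunk document would bring in chunks 2 and 4 as well; hits at the start or end of a document simply get fewer neighbors.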

`ingest_file`

Import a single file into the knowledge base.

Parameter	Type	Default	Description
`filepath`	string	(required)	Path to the file
`collection`	string	`"default"`	Target collection
`chunk_size`	int	800	Max characters per chunk
`force`	bool	`false`	Re-import even if file hasn't changed
`chunk_mode`	string	`"structural"`	Chunking strategy: `recursive` (character-based splitting), `semantic` (embedding similarity-based splitting), or `structural` (document structure-aware splitting by headings, code blocks, tables)

Change detection: By default, files that haven't changed since last import are skipped. Use force=true to re-import anyway.
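
The README does not say how a change is detected. One common approach, shown here purely as an assumption, is to compare a hash of the file content against the hash recorded at the last import. The `registry` dict is a hypothetical in-memory stand-in for whatever the server persists.

```python
import hashlib


def content_fingerprint(data):
    """SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def should_ingest(path, data, registry, force=False):
    """Return True if the file needs (re-)importing, and record its hash.

    `registry` maps a file path to the fingerprint seen at the last import.
    """
    fingerprint = content_fingerprint(data)
    if not force and registry.get(path) == fingerprint:
        return False  # unchanged since the last import, so skip it
    registry[path] = fingerprint
    return True


if __name__ == "__main__":
    seen = {}
    print(should_ingest("notes.md", b"hello", seen))              # first import
    print(should_ingest("notes.md", b"hello", seen))              # unchanged
    print(should_ingest("notes.md", b"hello", seen, force=True))  # forced
```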

Supported formats: .md, .txt, .py, .js, .ts, .pdf, .docx, .pptx, .xlsx, and 40+ more.

`ingest_directory`

Batch import all files in a directory.

Parameter	Type	Default	Description
`dirpath`	string	(required)	Directory path
`collection`	string	`"default"`	Target collection
`recursive`	bool	`true`	Scan subdirectories
`extensions`	string	`""`	Comma-separated extensions filter (empty = all supported)
`chunk_size`	int	800	Max characters per chunk
`force`	bool	`false`	Re-import even if files haven't changed
`chunk_mode`	string	`"structural"`	Chunking strategy: `recursive`, `semantic`, or `structural`
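
The `recursive` and `extensions` parameters can be pictured with a short directory walk. This is an illustrative sketch only: `parse_extensions` and `collect_files` are hypothetical names, and for simplicity an empty filter here accepts every file, whereas the server restricts itself to its supported formats.

```python
from pathlib import Path


def parse_extensions(extensions):
    """Turn "md, .py" into {".md", ".py"}; an empty string yields an empty set."""
    return {
        "." + ext.strip().lstrip(".").lower()
        for ext in extensions.split(",")
        if ext.strip()
    }


def collect_files(dirpath, recursive=True, extensions=""):
    """List files under `dirpath`, optionally descending into subdirectories."""
    wanted = parse_extensions(extensions)
    root = Path(dirpath)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        path
        for path in candidates
        if path.is_file() and (not wanted or path.suffix.lower() in wanted)
    )


if __name__ == "__main__":
    for path in collect_files(".", recursive=False, extensions="md,py"):
        print(path)
```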

`ingest_url`

Download a file from a URL and import it into the knowledge base. Useful when the file is hosted on a web server or file sharing service.

Parameter	Type	Default	Description
`url`	string	(required)	HTTP or HTTPS URL of the file
`collection`	string	`"default"`	Target collection
`chunk_size`	int	800	Max characters per chunk
`force`	bool	`false`	Re-import even if file hasn't changed
`chunk_mode`	string	`"structural"`	Chunking strategy: `recursive`, `semantic`, or `structural`

`upload_info`

Returns the HTTP upload endpoint URL and usage instructions. Because the MCP protocol does not support binary file transfer, this tool tells the client how to upload files via the REST API.

No parameters.

`list_collections`

List all knowledge base collections.

`list_documents`

List all documents in a collection.

Parameter	Type	Default	Description
`collection`	string	`"default"`	Collection name

`delete_document`

Remove a document and all its chunks from the knowledge base.

Parameter	Type	Default	Description
`filepath`	string	(required)	Path used during import
`collection`	string	`"default"`	Collection name

`delete_collection`

Delete an entire knowledge base collection and all its documents, vectors, and configuration. This cannot be undone.

Parameter	Type	Default	Description
`collection`	string	`"default"`	Collection name to delete

`configure_collection`

Set default parameters for a knowledge base collection. Future import and search operations will use these defaults when parameters are not explicitly specified.

Parameter	Type	Default	Description
`collection`	string	`"default"`	Collection name
`chunk_mode`	string	`""`	Default chunking strategy. Empty = keep current. `recursive`, `semantic`, or `structural`
`chunk_size`	int	`0`	Default max characters per chunk. 0 = keep current
`chunk_overlap`	int	`-1`	Default overlap characters. -1 = keep current
`rerank`	bool	`None`	Default whether to use reranker for search. None = keep current
`description`	string	`None`	Collection description. None = keep current
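
The sentinel defaults in the table (`""`, `0`, `-1`, `None`) all mean "keep the current value". A minimal sketch of that merge logic, assuming the configuration is a plain dict and using a hypothetical function name `merge_config`:

```python
VALID_CHUNK_MODES = ("recursive", "semantic", "structural")


def merge_config(current, chunk_mode="", chunk_size=0, chunk_overlap=-1,
                 rerank=None, description=None):
    """Return a copy of `current` with only the explicitly set fields changed."""
    updated = dict(current)
    if chunk_mode != "":
        if chunk_mode not in VALID_CHUNK_MODES:
            raise ValueError(f"unknown chunk_mode: {chunk_mode}")
        updated["chunk_mode"] = chunk_mode
    if chunk_size != 0:
        updated["chunk_size"] = chunk_size
    if chunk_overlap != -1:
        updated["chunk_overlap"] = chunk_overlap
    if rerank is not None:
        updated["rerank"] = rerank
    if description is not None:
        updated["description"] = description
    return updated


if __name__ == "__main__":
    base = {"chunk_mode": "structural", "chunk_size": 800, "rerank": True}
    print(merge_config(base, chunk_mode="semantic", rerank=False))
```

Using `-1` rather than `0` for `chunk_overlap` lets a caller set the overlap to zero explicitly, and `None` rather than `False` for `rerank` lets a caller turn reranking off.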

`get_collection_config`

View the current configuration for a collection.

Parameter	Type	Default	Description
`collection`	string	`"default"`	Collection name

REST API

When running in SSE or Streamable HTTP mode, a REST API is automatically available at /api/ alongside the MCP endpoint. This enables web frontends (e.g., CodingHub) to manage documents via HTTP while AI clients use MCP for search.

Disable with --no-api if you only need MCP.

`GET /api/health`

Health check endpoint.

`GET /api/collections`

List all knowledge base collections.

Response:

[{"name": "default", "doc_count": 5}]

`GET /api/collections/{name}/documents`

List all documents in a collection.

Response:

[{"source": "/path/to/file.md", "chunk_count": 12}]

`POST /api/collections/{name}/documents`

Upload a file to the knowledge base. Accepts multipart/form-data with a file field.

curl -F "file=@document.pdf" http://localhost:8000/api/collections/default/documents

Optional query parameters: chunk_size (default: 500), chunk_mode (recursive, semantic, or structural, default: recursive).

Response:

{"status": "ok", "filename": "document.pdf", "chunks": 24}

`DELETE /api/collections/{name}/documents`

Delete a document and all its chunks.

curl -X DELETE http://localhost:8000/api/collections/default/documents \
  -H "Content-Type: application/json" \
  -d '{"filepath": "/path/to/file.md"}'

Response:

{"status": "ok", "filepath": "/path/to/file.md", "deleted": 12}

`DELETE /api/collections/{name}`

Delete an entire collection and all its data.

curl -X DELETE http://localhost:8000/api/collections/my_collection

Response:

{"status": "ok", "collection": "my_collection", "deleted": true}

`POST /api/collections/{name}/search`

Semantic search across the knowledge base.

curl -X POST http://localhost:8000/api/collections/default/search \
  -H "Content-Type: application/json" \
  -d '{"query": "how to install", "top_k": 5, "rerank": false, "filter": "*.md", "expand_context": 1}'

Request body:

Field	Type	Default	Description
`query`	string	(required)	Search query
`top_k`	int	5	Number of results
`rerank`	bool	`false`	Use cross-encoder reranker
`filter`	string	`""`	Glob pattern to filter by source file path
`expand_context`	int	0	Number of neighboring chunks to include before/after each result

Response:

[
  {"id": "...", "score": 0.85, "text": "...", "source": "file.md", "chunk_index": 3}
]
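
The same search call can be made from Python with only the standard library. The sketch below mirrors the curl example above; `build_search_request` and `search` are hypothetical helper names, and `search` assumes a server is running in SSE or Streamable HTTP mode with the REST API enabled.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_search_request(base_url, collection, query, top_k=5, rerank=False,
                         source_filter="", expand_context=0):
    """Build (but do not send) the POST request for the search endpoint."""
    body = {
        "query": query,
        "top_k": top_k,
        "rerank": rerank,
        "filter": source_filter,
        "expand_context": expand_context,
    }
    url = f"{base_url.rstrip('/')}/api/collections/{quote(collection, safe='')}/search"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url, collection, query, **options):
    """Send the request and decode the JSON list of results."""
    request = build_search_request(base_url, collection, query, **options)
    with urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    hits = search("http://localhost:8000", "default", "how to install",
                  source_filter="*.md", expand_context=1)
    for hit in hits:
        print(hit["score"], hit["source"], hit["chunk_index"])
```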

`GET /api/collections/{name}/config`

Get the configuration for a collection.

Response:

{"chunk_mode": "semantic", "chunk_size": 500, "chunk_overlap": 50, "rerank": false, "description": "Technical docs"}

`PUT /api/collections/{name}/config`

Update collection configuration. Only include fields you want to change.

curl -X PUT http://localhost:8000/api/collections/default/config \
  -H "Content-Type: application/json" \
  -d '{"chunk_mode": "semantic", "description": "Technical documentation"}'

Response: Returns the full updated configuration.

CORS

The REST API includes CORS headers by default (allows all origins). Restrict with the RAG_CORS_ORIGINS environment variable:

RAG_CORS_ORIGINS=http://localhost:5173,http://localhost:8080 python server.py --mode sse

Architecture

flowchart TB
    subgraph Client["MCP Client (QoderWork, etc.)"]
        direction LR
        C1["User question"] --> C2["Call MCP tools"] --> C3["LLM answer"]
    end

    Client <-->|"stdio / SSE / Streamable HTTP"| Server

    subgraph Server["RAG MCP Server (FastMCP)"]
        direction LR
        subgraph Tools[" "]
            direction TB
            T1["Ingest Pipeline"] ~~~ T2["Search Pipeline"] ~~~ T3["Collection Manager"]
        end
        Tools --> Embed & Rerank & Vec
        Embed["sentence-transformers<br/>Qwen3-Embedding-0.6B"]
        Rerank["Cross-Encoder<br/>bge-reranker-v2-m3"]
        Vec["zvec<br/>./data/"]
    end

    style Client fill:#e8f4f8,stroke:#2196F3
    style Server fill:#f5f5f5,stroke:#333
    style Tools fill:#fff3e0,stroke:#FF9800
    style Embed fill:#fce4ec,stroke:#E91E63
    style Rerank fill:#e8eaf6,stroke:#3F51B5
    style Vec fill:#f3e5f5,stroke:#9C27B0

Project Structure

wandering-rag-mcp/
├── pyproject.toml          # Dependencies and entry point
├── server.py               # MCP server entry + tool definitions + combined ASGI
├── api/
│   ├── __init__.py
│   └── app.py              # REST API routes (starlette)
├── core/
│   ├── chunker.py          # Text chunking (recursive, semantic, structural)
│   ├── embeddings.py       # sentence-transformers wrapper (lazy load)
│   ├── reranker.py         # Cross-encoder reranker (lazy load)
│   ├── service.py          # Shared business logic (MCP + REST)
│   └── vector_store.py     # zvec wrapper (CRUD + search)
├── data/                   # zvec storage (auto-created at runtime)
│   └── default/
└── .gitignore

How It Works

Ingest: File is read (plain text or converted via markitdown) → split into overlapping chunks → each chunk embedded into a 1024-dim vector → stored in zvec with metadata (text, source path, chunk index)
Search: Query text → embedded into vector → zvec ANN search returns top-k nearest chunks with similarity scores → optionally reranked by cross-encoder for higher accuracy → returned as formatted text with source references
Document ID: The first 16 hex characters of the SHA256 hash of the file path serve as a stable document ID, enabling idempotent re-imports and deletion by file path.
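
Two of the steps above are simple enough to sketch directly: deriving the document ID and splitting text into overlapping chunks. Both functions are illustrative assumptions rather than the project's code. `document_id` takes "first 16 chars" to mean the first 16 characters of the hex digest of the UTF-8 encoded path, and `chunk_text` is a plain fixed-window splitter with a hypothetical default overlap, which is simpler than any of the three `chunk_mode` strategies.

```python
import hashlib


def document_id(filepath):
    """First 16 hex characters of the SHA-256 digest of the path string."""
    return hashlib.sha256(filepath.encode("utf-8")).hexdigest()[:16]


def chunk_text(text, chunk_size=800, overlap=50):
    """Split `text` into windows of `chunk_size` characters.

    Consecutive windows share `overlap` characters, so a sentence cut at
    a boundary still appears whole in at least one chunk if it is short.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print(document_id("/path/to/file.md"))
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

Because the ID depends only on the path, re-importing the same file maps to the same ID, which is what makes re-imports idempotent and deletion by path possible.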

Dependencies

Package	Purpose
`mcp`	MCP protocol SDK (FastMCP)
`zvec`	Embedded vector database by Alibaba
`sentence-transformers`	Load and run embedding models
`markitdown[all]`	Convert PDF/DOCX/PPTX/XLSX to Markdown
`python-multipart`	Multipart form parsing for REST API file uploads

Technical Documentation

For a detailed explanation of the architecture and technical stack, see the Architecture Document.

Deployment

Quick Install (Online)

For a clean Linux server with internet access:

curl -sSL https://raw.githubusercontent.com/mambo-wang/wandering-rag-mcp/main/deploy/setup.sh | bash

This sets up everything: a Python venv, the dependencies, and the embedding model. It also generates start scripts.

Offline Install

For air-gapped servers, use the offline packaging scripts in deploy/:

# On a machine with internet: prepare the bundle (~3GB with models)
cd deploy && bash prepare.sh x86_64

# Transfer wandering-rag-mcp-offline.tar.gz to the target server, then:
tar xzf wandering-rag-mcp-offline.tar.gz
cd bundle && bash install.sh

See deploy/README.md for the full deployment guide.

Roadmap

The following improvements are planned for future releases:

Hybrid search: Combine BM25 keyword retrieval with semantic search using Reciprocal Rank Fusion (RRF) for better precision on exact-match queries (function names, error codes, technical terms)
SQLite metadata layer: Replace _registry.json with SQLite for document metadata, enabling server-side metadata filtering (WHERE clauses) and reliable batch deletion instead of the current ID-probing approach
Evaluation framework: Built-in recall@k and MRR benchmarks with a CLI evaluation script, enabling quantitative measurement of retrieval quality when tuning chunking strategies or swapping models
Token-based chunk sizing: Replace character-based chunk_size with token-based sizing for consistent chunk lengths across different languages (CJK vs. Latin scripts)
Embedding batch control: Configurable batch size for encode() to prevent memory spikes when ingesting large documents with hundreds of chunks
Concurrent access safety: File-level locking for _registry.json and thread-safe VectorStore operations to prevent corruption under concurrent REST API requests
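
Two of the roadmap items rest on small, well-known formulas. The sketch below shows Reciprocal Rank Fusion, where each document scores the sum of 1 / (k + rank) over the input rankings, together with recall@k and MRR. None of this exists in the project yet; the function names are hypothetical, and k = 60 is just the constant commonly used for RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Ties are broken by document ID so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents found in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(results, relevant_sets):
    """Average of 1 / rank of the first relevant hit per query (0 if none)."""
    if not results:
        return 0.0
    total = 0.0
    for retrieved, relevant in zip(results, relevant_sets):
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results)


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]   # e.g. from BM25
    semantic_hits = ["b", "c", "a"]  # e.g. from vector search
    print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
```

RRF needs only ranks, not raw scores, which is why it is a convenient way to merge BM25 and vector similarity results whose scores are on different scales.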

License

MIT
