vectorise-mcp
A local MCP server that turns folders of documents into a hybrid vector + keyword index that Claude Desktop can search. It stays offline after the first model download.
Stack
MCP: mcp (FastMCP), stdio transport
Embeddings: BAAI/bge-small-en-v1.5 (384-dim)
Reranker: BAAI/bge-reranker-base cross-encoder
Vector DB: sqlite-vec
Keyword DB: SQLite FTS5 (BM25)
Fusion: Reciprocal Rank Fusion → cross-encoder rerank
Install
pip install vectorise-mcp # core
pip install "vectorise-mcp[ocr]" # + OCR for scanned PDFs / images
pip install "vectorise-mcp[notify]" # + desktop toast on job completion
pip install "vectorise-mcp[ocr,notify]" # everything
vectorise-mcp setup # pre-download models (~250MB)
Requires Python ≥ 3.10.
Wire into Claude Desktop
claude_desktop_config.json:
{
  "mcpServers": {
    "vectorise": {
      "command": "vectorise-mcp",
      "args": ["serve"]
    }
  }
}
Config file location:
Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
Restart Claude Desktop.
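For scripted setups, the manual edit above could be automated roughly as follows. This is a minimal sketch, not part of vectorise-mcp: the function names (`claude_config_path`, `add_vectorise_server`, `install`) are hypothetical, and only the paths and the JSON entry come from this README. It merges the `vectorise` entry into any existing `mcpServers` map instead of overwriting the file.

```python
import json
import os
import sys
from pathlib import Path

CONFIG_NAME = "claude_desktop_config.json"


def claude_config_path(platform: str = sys.platform) -> Path:
    """Return the Claude Desktop config path listed above for a platform."""
    if platform.startswith("win"):
        return Path(os.environ["APPDATA"]) / "Claude" / CONFIG_NAME
    if platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / CONFIG_NAME
    return Path.home() / ".config" / "Claude" / CONFIG_NAME


def add_vectorise_server(config: dict) -> dict:
    """Return a copy of config with the vectorise entry added.

    Other entries under "mcpServers" are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["vectorise"] = {"command": "vectorise-mcp", "args": ["serve"]}
    merged["mcpServers"] = servers
    return merged


def install(path: Path) -> None:
    """Read the config at path (if any), merge the entry, and write it back."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_vectorise_server(existing), indent=2), encoding="utf-8")
```

Calling `install(claude_config_path())` and then restarting Claude Desktop would have the same effect as the manual edit.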
File support
Format | Notes |
| text + OCR fallback for scanned pages |
| full content + tables |
| UTF-8 |
| OCR (requires |
| detected, skipped, reported |
Tools exposed to Claude
Tool | What it does |
| list all indexed projects |
| start indexing job, returns |
| SHA1-incremental rescan of all sources |
| instant job snapshot incl. progress + ETA |
| optional blocking wait |
| jobs from current server session |
| hybrid + reranked search |
| delete project's |
The mode parameter of vectorise_index_project accepts auto (the default: incremental if the path is already indexed, error on conflict), replace, append, or fail.
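The SHA1-incremental rescan listed in the tools table can be pictured as a comparison between content digests stored at index time and digests of the files currently on disk. The sketch below is illustrative only: `file_sha1`, `plan_rescan`, and the `known` mapping are hypothetical names, and the server's actual bookkeeping may differ.

```python
import hashlib
from pathlib import Path


def file_sha1(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in blocks so large documents are not loaded into memory at once."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def plan_rescan(folder: Path, known: dict[str, str]) -> dict[str, list[str]]:
    """Classify files by comparing on-disk digests with stored ones.

    known maps a relative path to the SHA1 recorded when it was last indexed
    (a hypothetical store standing in for the project DB).
    """
    current = {
        p.relative_to(folder).as_posix(): file_sha1(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }
    return {
        "new": [k for k in current if k not in known],
        "changed": [k for k in current if k in known and known[k] != current[k]],
        "removed": [k for k in known if k not in current],
        "unchanged": [k for k in current if known.get(k) == current[k]],
    }
```

Only the files under "new" and "changed" would need to be parsed and embedded again, which is what makes a rescan cheap when little has changed.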
Architecture
Each indexing job runs in a daemon thread with its own asyncio loop, so the MCP server's main loop stays free to serve index_status and search calls no matter how heavy the embedding or OCR work is. Status calls return immediately, and search works against the partial index while a job is still running.
folder
↓ parsers.parse (.pdf .docx .pptx .xlsx ...)
chunks (sentence-aware, 384 tok / 96 overlap, single-sentence hard-split)
↓ embedder.embed_passages (BGE-small)
sqlite-vec + FTS5 (BM25) ← per-file SHA1 dedup, basename collision auto-rename
↓ search (vector top-N + BM25 top-N)
RRF fusion → cross-encoder rerank → top-K
Project DBs live in ~/.vectorise-mcp/<name>.db. Each DB is self-contained, so the source folder can be deleted after indexing.
Config (env vars)
Var | Default | Purpose |
|
| must be 384-dim |
|
| |
|
| |
|
| |
|
| drop OCR lines below |
|
| parallel page OCR threads |
|
| PDF rasterisation DPI |
|
| downscale huge images before OCR |
|
| desktop toast on/off |
Performance
Metric | CPU | GPU |
Indexing throughput | ~80 chunks/sec | 5–10× faster |
Search latency (k=5, ≤500K chunks) | ~150ms | similar |
Disk per chunk | ~2 KB | |
Cold start | ~5s (lazy model load) | |
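The figures above allow a rough back-of-the-envelope sizing for a given corpus. The sketch below is only arithmetic on the approximate table values. `estimate` is a hypothetical helper, "~2 KB" is taken as 2000 bytes, and the raw-vector line assumes 384-dimensional embeddings stored as 4-byte floats, which is not confirmed here.

```python
CHUNKS_PER_SEC_CPU = 80       # approximate CPU indexing throughput from the table
DISK_BYTES_PER_CHUNK = 2_000  # "~2 KB" per chunk, taken as 2000 bytes
EMBEDDING_DIM = 384           # bge-small-en-v1.5 output size
BYTES_PER_FLOAT = 4           # assumption: float32 storage


def estimate(chunks: int) -> dict[str, float]:
    """Estimate CPU indexing time and disk use for a number of chunks."""
    return {
        "index_minutes_cpu": chunks / CHUNKS_PER_SEC_CPU / 60,
        "disk_mb": chunks * DISK_BYTES_PER_CHUNK / 1_000_000,
        "raw_vector_mb": chunks * EMBEDDING_DIM * BYTES_PER_FLOAT / 1_000_000,
    }


sizing = estimate(500_000)
```

For 500,000 chunks this gives about 104 minutes of CPU indexing and about 1 GB on disk, of which roughly 768 MB would be raw vectors under the float32 assumption.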
Local dev
git clone https://github.com/jameslovespancakes/Vectorised-Embedding-MCP
cd Vectorised-Embedding-MCP
pip install -e ".[ocr,notify]"
# tests bypass MCP transport, drive indexer + tools directly
python tests/smoke_test.py
python tests/smoke_test_projects.py
python tests/smoke_test_jobs.py
python tests/smoke_test_filters.py
python tests/smoke_test_office.py
python tests/smoke_test_chunking.py
python tests/smoke_test_legacy_skip.py
License
MIT.