vectorise-mcp

by jameslovespancakes

Python

Local

Local MCP server that turns folders of documents into a hybrid vector + keyword index that Claude Desktop can search. Stays offline after first model download.

Stack

MCP: mcp (FastMCP), stdio transport
Embeddings: BAAI/bge-small-en-v1.5 (384-dim)
Reranker: BAAI/bge-reranker-base cross-encoder
Vector DB: sqlite-vec
Keyword DB: SQLite FTS5 (BM25)
Fusion: Reciprocal Rank Fusion → cross-encoder rerank

Install

pip install vectorise-mcp                 # core
pip install "vectorise-mcp[ocr]"          # + OCR for scanned PDFs / images
pip install "vectorise-mcp[notify]"       # + desktop toast on job completion
pip install "vectorise-mcp[ocr,notify]"   # everything

vectorise-mcp setup                       # pre-download models (~250MB)

Requires Python ≥ 3.10.

Wire into Claude Desktop

claude_desktop_config.json:

{
  "mcpServers": {
    "vectorise": {
      "command": "vectorise-mcp",
      "args": ["serve"]
    }
  }
}

Config file location:

Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

Restart Claude Desktop.
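
The config edit above can also be scripted. The following is a minimal sketch, not part of vectorise-mcp: the helper names `claude_config_path`, `add_vectorise_entry`, and `update_config_file` are hypothetical, and the paths are simply the three locations listed above. It merges the `vectorise` entry into an existing config without dropping other servers.

```python
import json
import os
import sys
from pathlib import Path

CONFIG_NAME = "claude_desktop_config.json"


def claude_config_path(platform: str = sys.platform) -> Path:
    """Return the config path for a platform, using the locations listed above."""
    if platform.startswith("win"):
        return Path(os.environ.get("APPDATA", "")) / "Claude" / CONFIG_NAME
    if platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / CONFIG_NAME
    return Path.home() / ".config" / "Claude" / CONFIG_NAME


def add_vectorise_entry(config: dict) -> dict:
    """Return a copy of config with the vectorise server entry merged in."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["vectorise"] = {"command": "vectorise-mcp", "args": ["serve"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Merge the entry into the JSON file at path, creating the file if missing."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_vectorise_entry(existing), indent=2), encoding="utf-8")
```

Calling `update_config_file(claude_config_path())` applies the change to the current machine. Back up the file first, since an existing file that is not valid JSON will raise an error rather than be overwritten.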

File support

| Format | Notes |
| --- | --- |
| `.pdf` | text + OCR fallback for scanned pages |
| `.docx`, `.pptx`, `.xlsx`, `.xlsm`, `.xls` | full content + tables |
| `.txt`, `.md`, `.markdown` | UTF-8 |
| `.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`, `.webp` | OCR (requires `[ocr]`) |
| `.doc`, `.ppt` | detected, skipped, reported |

Tools exposed to Claude

| Tool | What it does |
| --- | --- |
| `vectorise_list_projects` | list all indexed projects |
| `vectorise_index_project(folder, project, mode)` | start indexing job, returns `job_id` instantly |
| `vectorise_reindex_project(project)` | SHA1-incremental rescan of all sources |
| `vectorise_index_status(job_id)` | instant job snapshot incl. progress + ETA |
| `vectorise_await_index(job_id, timeout_sec)` | optional blocking wait |
| `vectorise_list_jobs(active_only)` | jobs from current server session |
| `vectorise_search(project, query, k, candidate_pool, file_glob, subdirectory, page_min, page_max, min_similarity)` | hybrid + reranked search |
| `vectorise_delete_project(project)` | delete project's `.db` |

The mode argument of vectorise_index_project accepts auto (the default: incremental if the path is already indexed, error on conflict), replace, append, or fail.
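
To illustrate what a SHA1-incremental rescan involves, here is a minimal sketch under stated assumptions. It is not the project's code: `file_sha1`, `changed_files`, and the `known` mapping (path to previously stored digest) are hypothetical names, and how vectorise-mcp actually stores its hashes is not described here.

```python
import hashlib
from pathlib import Path


def file_sha1(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large files are never loaded whole."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_files(folder: Path, known: dict[str, str]) -> list[Path]:
    """Return files under folder whose SHA1 is new or differs from known.

    `known` maps str(path) to the digest recorded at the last indexing run
    (an assumed storage shape, for illustration only).
    """
    changed = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        if known.get(str(path)) != file_sha1(path):
            changed.append(path)
    return changed
```

Only the files returned by `changed_files` would need re-parsing and re-embedding, which is what makes a rescan cheap when most of a folder is unchanged.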

Architecture

Each indexing job runs in a daemon thread with its own asyncio loop, so the MCP server's main loop stays free to serve index_status and search calls however heavy the embedding or OCR work is. Status calls return immediately, and search works on the partial index while a job is running.
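
The pattern described above can be sketched as follows. This is an illustration only, assuming an in-memory registry: `JOBS`, `start_job`, `job_status`, and the `state`/`done`/`total` fields are hypothetical and do not reflect the server's real data structures.

```python
import asyncio
import threading
import uuid

# Hypothetical in-memory job registry (assumption, for illustration only).
JOBS: dict[str, dict] = {}


async def _index(job_id: str, items: list[str]) -> None:
    """Stand-in for the heavy parse/embed/OCR work on each item."""
    for done, _item in enumerate(items, start=1):
        await asyncio.sleep(0.01)  # placeholder for real work
        JOBS[job_id]["done"] = done
    JOBS[job_id]["state"] = "finished"


def start_job(items: list[str]) -> str:
    """Start the work in a daemon thread and return a job id immediately."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"state": "running", "done": 0, "total": len(items)}
    thread = threading.Thread(
        # asyncio.run creates a fresh event loop owned by this thread.
        target=lambda: asyncio.run(_index(job_id, items)),
        daemon=True,
    )
    JOBS[job_id]["thread"] = thread
    thread.start()
    return job_id


def job_status(job_id: str) -> dict:
    """Return a snapshot of progress without waiting for the job."""
    job = JOBS[job_id]
    return {"state": job["state"], "done": job["done"], "total": job["total"]}
```

Because the worker owns its event loop, `job_status` only reads a dictionary and never waits on the indexing work, which is why a status call can stay fast during a long job.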

folder
  ↓  parsers.parse                        (.pdf .docx .pptx .xlsx ...)
chunks (sentence-aware, 384 tok / 96 overlap, single-sentence hard-split)
  ↓  embedder.embed_passages              (BGE-small)
sqlite-vec   +   FTS5 (BM25)              ← per-file SHA1 dedup, basename collision auto-rename
  ↓  search                               (vector top-N + BM25 top-N)
RRF fusion → cross-encoder rerank → top-K
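
The fusion step in the pipeline above combines the vector and BM25 rankings. A minimal sketch of Reciprocal Rank Fusion follows; the function name `rrf_fuse` is hypothetical, and the constant `k=60` is the value commonly used in the literature, not a confirmed setting of this project.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of ids: each list adds 1 / (k + rank) to an id's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["a", "b", "c"]  # ids ranked by embedding similarity
bm25_hits = ["b", "d", "a"]    # ids ranked by keyword score
fused = rrf_fuse([vector_hits, bm25_hits])
```

Ids that appear in both lists accumulate two contributions, so they rise above ids that only one retriever found. In the pipeline above, the fused candidates would then go to the cross-encoder for reranking.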

Project DBs live in ~/.vectorise-mcp/<name>.db. Each DB is self-contained, so the source folder can be deleted after indexing.

Config (env vars)

| Var | Default | Purpose |
| --- | --- | --- |
| `VECTORISE_MCP_EMBED_MODEL` | `BAAI/bge-small-en-v1.5` | must be 384-dim |
| `VECTORISE_MCP_RERANKER_MODEL` | `BAAI/bge-reranker-base` | |
| `VECTORISE_MCP_EMBED_BATCH` | `32` | |
| `VECTORISE_MCP_RERANKER_BATCH` | `16` | |
| `VECTORISE_MCP_OCR_MIN_CONFIDENCE` | `0.5` | drop OCR lines below |
| `VECTORISE_MCP_OCR_WORKERS` | `4` | parallel page OCR threads |
| `VECTORISE_MCP_OCR_DPI` | `200` | PDF rasterisation DPI |
| `VECTORISE_MCP_OCR_MAX_DIM` | `4000` | downscale huge images before OCR |
| `VECTORISE_MCP_NOTIFY` | `1` | desktop toast on/off |

Performance

| Metric | CPU | GPU |
| --- | --- | --- |
| Indexing throughput | ~80 chunks/sec | 5–10× faster |
| Search latency (k=5, ≤500K chunks) | ~150 ms | similar |
| Disk per chunk | ~2 KB | |
| Cold start | ~5 s (lazy model load) | |
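
These figures allow a back-of-the-envelope estimate for a given corpus. The sketch below is plain arithmetic on the approximate CPU numbers above; the function name `estimate` is hypothetical, it uses 1 MB = 1000 KB, and real throughput will vary with hardware and with how much OCR is involved.

```python
def estimate(
    chunks: int,
    chunks_per_sec: float = 80.0,  # approximate CPU throughput from the table
    kb_per_chunk: float = 2.0,     # approximate disk use per chunk from the table
) -> tuple[float, float]:
    """Return (indexing minutes, index size in MB) for a chunk count."""
    minutes = chunks / chunks_per_sec / 60
    megabytes = chunks * kb_per_chunk / 1000
    return minutes, megabytes


minutes, megabytes = estimate(100_000)
print(f"{minutes:.1f} min, {megabytes:.0f} MB")  # 20.8 min, 200 MB
```

At the quoted 5–10× GPU speed-up, the same 100,000 chunks would take roughly 2 to 4 minutes.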

Local dev

git clone https://github.com/jameslovespancakes/Vectorised-Embedding-MCP
cd Vectorised-Embedding-MCP
pip install -e ".[ocr,notify]"

# tests bypass MCP transport, drive indexer + tools directly
python tests/smoke_test.py
python tests/smoke_test_projects.py
python tests/smoke_test_jobs.py
python tests/smoke_test_filters.py
python tests/smoke_test_office.py
python tests/smoke_test_chunking.py
python tests/smoke_test_legacy_skip.py

License

MIT.
