recall-mcp
Recall: a local, private knowledge-base MCP server
Recall turns a folder of your own notes and documents into a searchable knowledge base that any AI assistant can use. It is a Model Context Protocol (MCP) server: connect it to Claude Desktop, Claude Code, or any MCP client, and the assistant can search, read, and add to your notes through well-defined tools.
It uses semantic search powered by local embeddings, so it finds passages by meaning, not just matching keywords, and it runs entirely on your machine. No API key, no cloud, and your documents never leave your device.
Why this project is interesting
Retrieval-Augmented Generation (RAG) done locally: chunking, embeddings, and cosine-similarity retrieval, the core of modern AI knowledge systems.
Model Context Protocol: exposes capabilities as tools an LLM can call, the emerging standard for connecting AI assistants to real systems.
Privacy-first: semantic search runs on-device with a small embedding model; nothing is sent to a third party.
Graceful degradation: if the embedding model can't load, it automatically falls back to keyword search instead of breaking.
See it in action
Ask Claude (with Recall connected) "search my notes for how to undo a git commit". It calls the search_documents tool and answers grounded in git-cheatsheet.md, entirely on your machine.
See the difference: keyword vs. semantic
Ask "how do I undo a commit?" against a small dev knowledge base:
Search mode | Top result | Why |
Keyword | the doc that literally contains the words "undo a commit" | matches exact words |
Semantic | | matches meaning |
Semantic search finds the genuinely useful answer even though the words don't overlap. That is the whole point of embeddings.
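The contrast above can be sketched in a few lines. This is an illustration only: the three-dimensional vectors are hand-picked stand-ins for real embeddings (not output from the model Recall uses), and `keyword_score` and `cosine` are hypothetical helper names, not functions from this project.

```python
import math

def keyword_score(query: str, passage: str) -> int:
    """Count the distinct query words that also appear in the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = "how do I undo a commit"
passage = "git reset --soft HEAD~1 moves the branch back one revision"

# The passage answers the question but shares no words with it.
overlap = keyword_score(query, passage)

# Toy vectors (assumed values): the query and the helpful passage point
# in a similar direction, an unrelated passage does not.
query_vec = [0.9, 0.1, 0.3]
passage_vec = [0.8, 0.2, 0.4]
unrelated_vec = [0.1, 0.9, 0.0]

print("keyword overlap:", overlap)
print("cosine (helpful passage):", round(cosine(query_vec, passage_vec), 3))
print("cosine (unrelated passage):", round(cosine(query_vec, unrelated_vec), 3))
```

Keyword scoring gives the helpful passage no credit, while the vector comparison ranks it well above the unrelated one.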
Retrieval quality (measured)
A small labelled eval (10 paraphrased queries over the 4 sample docs) shows semantic search recovering the right document more often as the window widens:
Mode | recall@1 | recall@3 |
Keyword | 80% | 80% |
Semantic | 80% | 100% |
Reproduce it with python eval/run_eval.py. The corpus is small, so treat the numbers as illustrative. The harness is the point: retrieval quality here is measured, not assumed.
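For readers unfamiliar with the metric, here is a minimal sketch of how recall@k can be computed when each query has exactly one correct document. It is not the code in eval/run_eval.py; the function name, the query labels, and the file names below are made up for illustration.

```python
def recall_at_k(ranked: dict[str, list[str]], expected: dict[str, str], k: int) -> float:
    """Fraction of queries whose expected document appears in the top k results."""
    hits = sum(1 for query, doc in expected.items() if doc in ranked[query][:k])
    return hits / len(expected)

# Hypothetical labels: the one document that should be retrieved per query.
expected = {"q1": "a.md", "q2": "b.md"}

# Hypothetical ranked search output, best match first.
ranked = {
    "q1": ["a.md", "c.md", "b.md"],  # correct document ranked first
    "q2": ["c.md", "a.md", "b.md"],  # correct document ranked third
}

print("recall@1:", recall_at_k(ranked, expected, 1))
print("recall@3:", recall_at_k(ranked, expected, 3))
```

Widening k can only keep the score the same or raise it, which is why recall@3 is never below recall@1 in the table above.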
What the AI can do (the MCP tools)
Tool | What it does |
| Find the most relevant passages. |
| Return the full text of one document so the assistant can read or summarise it. |
| List the documents currently loaded and the active search mode. |
| Save a new note into the knowledge base; it becomes searchable immediately. |
How it works
Your documents (.md / .txt)
        │
        ▼
┌───────────────────┐
│   DocumentStore   │  1. split each file into paragraph "chunks"
│ (recall/store.py) │  2. embed every chunk into a vector (local model)
└───────────────────┘
        │ query
        ▼
┌───────────────────┐
│  Semantic search  │  embed the query, rank chunks by cosine similarity
│   (or keyword)    │  (falls back to keyword search if no model)
└───────────────────┘
        │ tools
        ▼
┌───────────────────┐   MCP (stdio / JSON-RPC)
│  FastMCP server   │ ─────────────────────────────▶ Claude Desktop,
│ (recall/server.py)│                                Claude Code, ...
└───────────────────┘
Chunk: documents are split on blank lines into passages, with each Markdown heading kept attached to the text it introduces, so results land on a precise, self-contained passage rather than a whole file.
Embed: each chunk is turned into a vector with a local fastembed model (bge-small-en-v1.5, 384-dimensional vectors).
Retrieve: a query is embedded and compared to every chunk by cosine similarity; the closest chunks win.
Serve: the FastMCP server exposes search/read/write as MCP tools over stdio, so any MCP client can use them.
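The chunking step can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in recall/store.py: `chunk_markdown` is a hypothetical name, and it treats any single-line block starting with one to six `#` characters as a heading.

```python
import re

HEADING = re.compile(r"#{1,6}\s")

def chunk_markdown(text: str) -> list[str]:
    """Split text on blank lines, attaching each heading to the passage after it."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    chunks: list[str] = []
    pending: list[str] = []  # headings waiting for the passage they introduce
    for block in blocks:
        if "\n" not in block and HEADING.match(block):
            pending.append(block)
            continue
        chunks.append("\n\n".join(pending + [block]))
        pending = []
    if pending:  # a trailing heading with no body still becomes a chunk
        chunks.append("\n\n".join(pending))
    return chunks

notes = (
    "## Undo a commit\n\n"
    "Run git reset --soft HEAD~1.\n\n"
    "## Stash work\n\n"
    "Run git stash.\n"
)

for chunk in chunk_markdown(notes):
    print(repr(chunk))
```

Because the heading travels with its passage, a search hit carries enough context to be read on its own.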
Quickstart
Requires Python 3.10+.
# 1. Clone and enter the project
git clone https://github.com/jaswanthsurya007-source/recall-mcp.git
cd recall-mcp
# 2. Create and activate a virtual environment
python -m venv .venv
# Windows (PowerShell):
.venv\Scripts\Activate.ps1
# macOS / Linux:
source .venv/bin/activate
# 3. Install
pip install -e .
# 4. Try a search from Python
python -c "from recall.store import DocumentStore; s=DocumentStore('data/documents'); print([r.chunk.source for r in s.search('how do I undo a commit', 1)])"
The first run downloads the embedding model (~66 MB) once, then caches it.
Behind a corporate proxy?
Recall uses truststore to trust
your operating system's certificates automatically, so it works on networks that
inspect TLS traffic (common at large companies) without extra configuration.
Connect it to Claude Desktop
Add Recall to your claude_desktop_config.json
(Settings → Developer → Edit Config):
{
  "mcpServers": {
    "recall": {
      "command": "/absolute/path/to/recall-mcp/.venv/bin/python",
      "args": ["-m", "recall.server"],
      "env": {
        "RECALL_DOCS_DIR": "/absolute/path/to/recall-mcp/data/documents"
      }
    }
  }
}
On Windows, use the full path to python.exe and escape backslashes, e.g.
"C:\\path\\to\\recall-mcp\\.venv\\Scripts\\python.exe".
Restart Claude Desktop, and you'll see Recall's tools available. Ask it things like "Search my notes for how to undo a git commit" or "Save a note titled 'Meeting' with these action items…".
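If hand-escaping Windows paths is error-prone, one option is to let Python's json module produce the entry. The sketch below is a convenience, not part of Recall: `recall_config` is a hypothetical helper and the paths are placeholders to replace with your own.

```python
import json

def recall_config(python_path: str, docs_dir: str) -> str:
    """Build the mcpServers entry for Recall as a JSON string."""
    config = {
        "mcpServers": {
            "recall": {
                "command": python_path,
                "args": ["-m", "recall.server"],
                "env": {"RECALL_DOCS_DIR": docs_dir},
            }
        }
    }
    return json.dumps(config, indent=2)

# Raw strings hold the paths with single backslashes;
# json.dumps doubles them in the output, as JSON requires.
windows_python = r"C:\path\to\recall-mcp\.venv\Scripts\python.exe"
windows_docs = r"C:\path\to\my\notes"

text = recall_config(windows_python, windows_docs)
print(text)
```

Paste the printed object into claude_desktop_config.json, merging it with any servers already listed there.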
Use your own documents
Point Recall at any folder of .md / .txt files:
# macOS / Linux: set RECALL_DOCS_DIR to your own notes folder
RECALL_DOCS_DIR="/path/to/my/notes" python -m recall.server
# Windows (PowerShell)
$env:RECALL_DOCS_DIR = "C:\path\to\my\notes"; python -m recall.server
The data/documents/ folder ships with a few sample notes so you can try it immediately.
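To check which files a given folder would contribute before starting the server, a short script like this can help. It is a sketch, not Recall's own loader: `find_documents` is a hypothetical name, and both the recursive scan and the fallback to data/documents when the variable is unset are assumptions.

```python
import os
from pathlib import Path

SUPPORTED = {".md", ".txt"}

def find_documents(default_dir: str = "data/documents") -> list[Path]:
    """List .md/.txt files under RECALL_DOCS_DIR (or the default folder)."""
    docs_dir = Path(os.environ.get("RECALL_DOCS_DIR", default_dir))
    return sorted(
        p for p in docs_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )

for path in find_documents():
    print(path)
```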
Running the tests
pip install -e ".[dev]"
pytest -q
The test suite runs fully offline (keyword mode), so it needs no model download.
Project structure
recall-mcp/
├── recall/
│   ├── server.py        # FastMCP server: defines the MCP tools
│   ├── store.py         # load → chunk → search documents (semantic + keyword)
│   └── embeddings.py    # local embedding model wrapper (fastembed)
├── data/documents/      # sample knowledge base (.md notes)
├── tests/               # offline pytest suite
├── eval/                # retrieval-quality eval (recall@k)
├── pyproject.toml       # packaging + tooling config
├── requirements.txt
└── LICENSE
Design notes
Why local embeddings? Privacy and zero cost. fastembed uses ONNX Runtime rather than PyTorch, so installs are small and inference is fast on CPU.
Why chunk by paragraph? It is simple and transparent, and it makes results land on a focused passage. A future version could use overlapping token windows.
Why a fallback to keyword search? A tool should never hard-fail. If the model can't be downloaded, search still works, just less cleverly.
Re-indexing on write is a full reload for clarity; at larger scale you would embed only the newly added chunks.
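The fallback idea from these notes can be sketched like this. It is not the project's implementation: `choose_mode`, `keyword_search`, and `broken_loader` are hypothetical names, and the loader is passed in as a plain callable so the example runs without any model installed.

```python
from typing import Callable, Optional

def choose_mode(load_model: Callable[[], object]) -> tuple[str, Optional[object]]:
    """Return ("semantic", model) if loading succeeds, else ("keyword", None)."""
    try:
        return "semantic", load_model()
    except Exception as exc:  # blocked download, missing package, bad cache, ...
        print(f"embedding model unavailable ({exc}); using keyword search")
        return "keyword", None

def keyword_search(query: str, passages: list[str], top_k: int = 3) -> list[str]:
    """Rank passages by how many distinct query words they contain."""
    words = set(query.lower().split())
    scored = [(len(words & set(p.lower().split())), p) for p in passages]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in scored[:top_k]]

def broken_loader() -> object:
    """Stand-in for a model loader that fails, e.g. with no network access."""
    raise RuntimeError("model download failed")

mode, model = choose_mode(broken_loader)
passages = ["undo a commit with git reset", "stash changes with git stash"]
results = keyword_search("undo commit", passages)
print(mode, results)
```

The caller only needs to branch on the returned mode, so a failed model load degrades search quality without raising an error to the assistant.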
Roadmap
Retrieval-quality eval harness (recall@k)
Persist embeddings to disk so startup is instant on large corpora
Support PDF and HTML documents
Hybrid search (combine semantic + keyword scores)
Optional LLM-generated summaries via the Claude API
Expose documents as MCP resources, not just tools
License