recall-memory-mcp
A relevance-gated, self-improving procedural memory for AI agents, as an MCP server.
Most agent-memory tools remember facts (conversations, preferences). This one stores the lessons an agent learns, surfaces only the ones relevant to the task at hand, and gets better over time by learning from failures, deduping, and pruning what it never uses. So the agent stops dumping its whole history into context, and it stops repeating its own mistakes.
Why
A long-running agent accretes memory and usually loads all of it every session. That is expensive, slow, and it drowns the current truth in stale history, so the agent drifts back to old, superseded decisions.
Measured on a real 64-day production agent: ~91,000 tokens were loaded every session, and ~90% of it was never used. Relevance-gating cut that to a few hundred tokens per task (about a 99% reduction), and the drift stopped, because stale history only surfaces when a task is actually about it.
The full lifecycle (eight tools)
- recall(task, k) -- only the lessons and state relevant to what you are about to do, each with an actionable check. Self-tracks which lessons get used.
- preflight(task, k) -- a pre-action checklist: the specific things to verify before editing a file, sending a message, deploying, or querying a database. Built for a PreToolUse hook so the right guardrails fire automatically, with no prose to re-read.
- learn(title, body, check) -- turn a failure or insight into a retrievable lesson. Closes the loop: next time the same situation comes up, recall surfaces it. Dedupes -- a recurring failure bumps a seen_count instead of cloning the lesson.
- memory_audit() -- how much loaded memory is never used (archive candidates) and how much is stale.
- prune() -- retire learned lessons safely: only those never retrieved and not recurring and older than a grace period, so fresh and recurring lessons are never lost.
- consolidate() -- flag near-duplicate lessons to merge.
- maintain() -- one self-maintenance pass: audit + safe prune + consolidate report. Safe to run on a schedule or at session wrap.
- reindex() -- rebuild after the memory files change.
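The dedup and safe-prune rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: the `Lesson` fields other than `seen_count`, the title-based dedup key, and the 14-day `GRACE_SECONDS` value are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

GRACE_SECONDS = 14 * 24 * 3600  # hypothetical grace period (assumed: 14 days)


@dataclass
class Lesson:
    title: str
    body: str
    check: str
    created_at: float = field(default_factory=time.time)
    seen_count: int = 1       # bumped when the same failure recurs
    retrieved_count: int = 0  # bumped when recall surfaces the lesson


def _key(title: str) -> str:
    # Assumed dedup key: case- and whitespace-normalised title.
    return " ".join(title.lower().split())


def learn(store: dict, title: str, body: str, check: str) -> Lesson:
    """Add a lesson, or bump seen_count if an equivalent one already exists."""
    key = _key(title)
    if key in store:
        store[key].seen_count += 1
        return store[key]
    store[key] = Lesson(title, body, check)
    return store[key]


def prune(store: dict, now: Optional[float] = None) -> list:
    """Retire lessons never retrieved, never recurring, and past the grace period."""
    now = time.time() if now is None else now
    retired = [
        key for key, lesson in store.items()
        if lesson.retrieved_count == 0
        and lesson.seen_count == 1
        and now - lesson.created_at > GRACE_SECONDS
    ]
    for key in retired:
        del store[key]
    return retired
```

All three prune conditions must hold at once, so a lesson that recurred even once, or was retrieved even once, survives regardless of age.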
How it works
- Chunks the agent's markdown memory (rules whole; state, session log, and index at paragraph level), its .claude/skills, and its runtime-learned lessons.
- Ranks with BM25 (length-normalised, so big stale blocks do not dominate), with source-weighting (real lessons beat index pointers), recency (current decisions beat superseded ones), and a generic-term down-weight (words like "task" or "file" stop inflating noise).
- preflight adds a concept-overlap gate: a guardrail only fires if the task shares at least two distinctive (non-generic) terms with it, so a single coincidental word never trips a false checklist. This is corpus-size independent.
- Optional semantic/hybrid retrieval (semantic.py, model2vec static embeddings, CPU, no torch) blends cosine similarity with BM25 so a differently-worded task still finds the right lesson. It degrades gracefully to pure BM25 if the dependency is absent.
- An on-disk index cache keyed on source-file mtimes keeps preflight fast on the hot path (it runs before every risky tool call); a stale or truncated cache simply fails validation and rebuilds, so it can never serve wrong results.
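To make the ranking and gating steps concrete, here is a minimal sketch of BM25 scoring with a per-source weight and a generic-term down-weight, plus the concept-overlap gate. It is not the project's code: the `GENERIC` word list, `GENERIC_WEIGHT`, the `K1`/`B` constants, and both function names are assumptions, and the recency factor is left out for brevity.

```python
import math
import re
from collections import Counter

GENERIC = {"task", "file", "the", "a", "to", "and", "of"}  # assumed generic-term list
GENERIC_WEIGHT = 0.2  # assumed down-weight applied to generic terms
K1, B = 1.5, 0.75     # common BM25 defaults, not taken from the project


def tokens(text):
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_rank(query, chunks):
    """chunks: list of (text, source_weight). Returns (score, index) pairs, best first."""
    docs = [tokens(text) for text, _ in chunks]
    avg_len = sum(len(d) for d in docs) / len(docs)
    df = Counter(term for d in docs for term in set(d))
    scored = []
    for i, (doc, (_, weight)) in enumerate(zip(docs, chunks)):
        tf = Counter(doc)
        score = 0.0
        for term in set(tokens(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (len(docs) - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation: long chunks need more hits to score the same.
            norm = tf[term] + K1 * (1 - B + B * len(doc) / avg_len)
            term_score = idf * tf[term] * (K1 + 1) / norm
            score += term_score * (GENERIC_WEIGHT if term in GENERIC else 1.0)
        scored.append((score * weight, i))
    return sorted(scored, reverse=True)


def gate_fires(task, guardrail, min_shared=2):
    """Concept-overlap gate: require min_shared distinctive terms in common."""
    shared = (set(tokens(task)) & set(tokens(guardrail))) - GENERIC
    return len(shared) >= min_shared
```

Because the gate counts shared terms rather than comparing scores, its behaviour does not change as the corpus grows, which matches the corpus-size independence described above.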
Performance
On the production agent, with preflight wired into a PreToolUse hook (fires before every file edit / deploy / risky shell command):
- Warm hook latency ~80 ms (down from ~330 ms) -- index cache + skipping the embedding import in fast mode.
- In-process preflight lookup ~0.5 ms.
- Per-task context ~99% smaller than loading all memory.
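The index cache behind the warm-latency figure can be approximated as follows. This is a sketch of the general mtime-keyed technique, assuming a JSON cache file; `load_index`, the cache layout, and the `build` callback are hypothetical names, not the project's API.

```python
import json
import os


def _fingerprint(paths):
    # Map each source file to its modification time in nanoseconds.
    return {p: os.stat(p).st_mtime_ns for p in paths}


def load_index(paths, cache_path, build):
    """Return the cached index if it matches the current mtimes, else rebuild."""
    current = _fingerprint(paths)
    try:
        with open(cache_path, encoding="utf-8") as fh:
            cached = json.load(fh)
        if cached["mtimes"] == current:
            return cached["index"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing, truncated, or malformed cache: fall through and rebuild
    index = build(paths)
    tmp = cache_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump({"mtimes": current, "index": index}, fh)
    os.replace(tmp, cache_path)  # atomic swap so readers never see a partial file
    return index
```

Any cache that fails to parse or whose fingerprint differs is treated as absent, so the worst case is a rebuild rather than a wrong answer.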
Tested
A committed test suite covers retrieval precision (the right guardrail fires; benign and irrelevant actions stay silent), latency, hook robustness against malformed and hostile input, and the full auto-learn loop end-to-end (a failure is detected, distilled into a lesson, and becomes retrievable). A second, self-contained smoke test builds a tiny fake agent repo in a tempdir and exercises the whole lifecycle with no external data:
python3 tests/test_smoke.py
Install and use
pip install mcp
RECALL_MEMORY_ROOT=/path/to/your/agent/repo python mcp_server.py # as a stdio MCP server
By default it indexes .claude/rules/anti-paperclip.md, memory/state.md, memory/session-log.md, and memory/INDEX.md under RECALL_MEMORY_ROOT, plus the .claude/skills it finds and a learned.json it maintains.
To use your own layout, drop a recall.sources.json in RECALL_MEMORY_ROOT (any key you omit falls back to the default):
{
"rules": [{"path": "docs/rules.md", "split": "\\n###\\s+Rule\\s+"}],
"paragraphs": [
{"path": "docs/state.md", "label": "state", "split": "\\n##\\s+"}
],
"skills_dir": ".claude/skills",
"weights": {"rule": 1.2, "state": 1.0}
}
rules files are chunked whole per section (each carries an extracted check); paragraphs files are chunked per paragraph with a source label and weight. Set skills_dir to null to skip skill indexing.
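One way to read the fallback and split behaviour described here is sketched below. The `DEFAULTS` dictionary is a partial illustration built from paths named in this README, and `load_sources` and `split_chunks` are hypothetical helpers; the project's actual loader may merge keys differently.

```python
import json
import re
from pathlib import Path

# Illustrative, partial defaults; the real default structure may differ.
DEFAULTS = {
    "rules": [{"path": ".claude/rules/anti-paperclip.md"}],
    "paragraphs": [{"path": "memory/state.md", "label": "state"}],
    "skills_dir": ".claude/skills",
    "weights": {},
}


def load_sources(root):
    """Start from the defaults and overlay any keys found in recall.sources.json."""
    config = dict(DEFAULTS)
    override = Path(root) / "recall.sources.json"
    if override.is_file():
        config.update(json.loads(override.read_text(encoding="utf-8")))
    return config


def split_chunks(text, pattern=None):
    """Split text on a regex; with no pattern, keep the text as a single chunk."""
    if not pattern:
        return [text.strip()] if text.strip() else []
    return [part.strip() for part in re.split(pattern, text) if part.strip()]
```

With the example config above, the split pattern for rules is the regex `\n###\s+Rule\s+`, so each "### Rule ..." section becomes its own chunk and any text before the first rule becomes a leading chunk.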
CLI without the MCP runtime:
RECALL_MEMORY_ROOT=/path/to/repo python recall.py "about to publish a repo"
RECALL_MEMORY_ROOT=/path/to/repo python recall.py --preflight "edit the server entrypoint"
RECALL_MEMORY_ROOT=/path/to/repo python recall.py --learn "Title" "What happened" "What to check next time"
RECALL_MEMORY_ROOT=/path/to/repo python recall.py --maintain
RECALL_MEMORY_ROOT=/path/to/repo python recall.py --audit
Tuning
| Env var | Default | Meaning |
|---|---|---|
| RECALL_MEMORY_ROOT | | Root of the agent repo to index. |
| | | Where runtime lessons are stored. |
| | unset | Skip the embedding import (pure BM25); used on the hook hot path. |
| | | Minimum score for a |
| | | Distinctive terms a task must share with a guardrail before it fires. |
Status and roadmap
v0.3: retrieve / preflight / learn (with dedup) / audit / safe-prune / consolidate / maintain lifecycle, plus optional semantic retrieval, an index cache, env-tunable precision, and tests. It does the thing the fact-memory tools (Mem0, Zep, Letta, Cognee) do not: procedural, relevance-gated, self-pruning memory of how to do the work, that learns from its own failures.
Ahead: auto-firing learn from failure signals; generating evals from failures; behavioural model-diffing on new model releases; and federation.
License
MIT.