
recall-memory-mcp

A relevance-gated, self-improving procedural memory for AI agents, as an MCP server.

Most agent-memory tools remember facts (conversations, preferences). This one stores the lessons an agent learns, surfaces only the ones relevant to the task at hand, and improves over time by learning from failures and pruning lessons it never uses. As a result, the agent stops dumping its whole history into context and stops repeating its own mistakes.

Why

A long-running agent accretes memory and usually loads all of it every session. That is expensive and slow, and it drowns the current truth in stale history, so the agent drifts back to old, superseded decisions.

Measured on a real 64-day production agent: ~91,000 tokens were loaded every session, and ~90% of them were never used. Relevance-gating cut that to a few hundred tokens per task (about a 99% reduction), and the drift stopped, because stale history now surfaces only when a task is actually about it.


The full lifecycle (six tools)

  • recall(task, k) — only the lessons and state relevant to what you are about to do, each with an actionable check. Self-tracks which lessons get used.

  • learn(title, body, check) — turn a failure or insight into a retrievable lesson. Closes the loop: next time the same situation comes up, recall surfaces it.

  • memory_audit() — how much loaded memory is never used (archive candidates) and how much is stale.

  • prune() — retire learned lessons that are never retrieved (usage-based self-pruning).

  • consolidate() — flag near-duplicate lessons to merge.

  • reindex() — rebuild after the memory files change.
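The learn, recall, and prune loop described above can be sketched in a few lines. This is an illustrative in-memory model, not the server's implementation: the `Lesson` and `LessonStore` classes, the `hits` counter, and the word-overlap relevance test are all assumptions made for the example (the real ranking is described under "How it works").

```python
from dataclasses import dataclass, field


@dataclass
class Lesson:
    title: str
    body: str
    check: str
    hits: int = 0  # hypothetical usage counter: times recall surfaced this lesson


@dataclass
class LessonStore:
    """Hypothetical in-memory stand-in for the learned-lesson store."""
    lessons: list = field(default_factory=list)

    def learn(self, title, body, check):
        # Turn a failure or insight into a retrievable lesson.
        lesson = Lesson(title, body, check)
        self.lessons.append(lesson)
        return lesson

    def recall(self, task, k=3):
        # Naive relevance for illustration: number of task words that
        # also appear in the lesson text. Zero overlap is gated out.
        words = set(task.lower().split())
        scored = []
        for lesson in self.lessons:
            text = f"{lesson.title} {lesson.body} {lesson.check}".lower().split()
            overlap = len(words & set(text))
            if overlap > 0:
                scored.append((overlap, lesson))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = [lesson for _, lesson in scored[:k]]
        for lesson in top:
            lesson.hits += 1  # record usage so prune can act on it later
        return top

    def prune(self):
        # Retire lessons that were never retrieved; return what was retired.
        retired = [l for l in self.lessons if l.hits == 0]
        self.lessons = [l for l in self.lessons if l.hits > 0]
        return retired


store = LessonStore()
store.learn("Check secrets before publishing",
            "A repo went public with an API key in it",
            "grep for keys before you publish")
store.learn("Pin dependency versions",
            "An unpinned upgrade broke the build",
            "confirm the lockfile is committed")
relevant = store.recall("about to publish a repo")
retired = store.prune()
```

Here only the first lesson shares words with the task, so it is the only one recalled, and the second is retired by `prune()` because it was never used. A real deployment would presumably wait for more usage history before retiring anything.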

How it works

  • Chunks the agent's markdown memory (rules whole; state and index at paragraph level) plus its runtime-learned lessons.

  • Ranks with BM25 (length-normalised, so big stale blocks do not dominate), with source-weighting (real lessons beat index pointers) and recency (current decisions beat superseded ones).

  • learn writes structured lessons that are immediately retrievable; recall records usage; prune retires the unused.
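The ranking step can be sketched as standard Okapi BM25 multiplied by a source weight and a recency factor. This is a minimal sketch under stated assumptions, not the code in recall_core.py: the weight values in `SOURCE_WEIGHT`, the exponential `half_life_days` decay, the chunk fields (`text`, `source`, `age_days`), and whitespace tokenization are all hypothetical choices for illustration.

```python
import math
from collections import Counter

# Assumed weights: real lessons outrank state, which outranks index pointers.
SOURCE_WEIGHT = {"lesson": 1.5, "state": 1.0, "index": 0.5}


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document in docs against query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = (sum(len(t) for t in tokenized) / n) or 1.0
    # Document frequency: number of chunks that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation: long chunks need more matches to score well.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(query, chunks, k=3, half_life_days=30.0):
    """Return the top-k chunks; each chunk has 'text', 'source', 'age_days'."""
    base = bm25_scores(query, [c["text"] for c in chunks])
    ranked = []
    for chunk, score in zip(chunks, base):
        if score <= 0:
            continue  # relevance gate: unrelated chunks are never surfaced
        recency = 0.5 ** (chunk["age_days"] / half_life_days)
        ranked.append((score * SOURCE_WEIGHT[chunk["source"]] * recency, chunk))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:k]]
```

The gate on `score <= 0` is what keeps unrelated history out of context entirely, rather than merely ranking it low.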

Retrieval is keyword-based (BM25) today, which works well when lessons have distinctive vocabulary. Embeddings are on the roadmap.

Install and use

pip install mcp
RECALL_MEMORY_ROOT=/path/to/your/agent/repo python mcp_server.py   # as a stdio MCP server

By default it indexes .claude/rules/anti-paperclip.md, memory/state.md, and memory/INDEX.md under RECALL_MEMORY_ROOT, plus a learned.json it maintains. Edit the source list in recall_core.py for your layout (configurable sources are on the roadmap).
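Before pointing the server at a repo, it can help to check which of the default files actually exist there. The helper below is a hypothetical sketch and not part of recall_core.py: `DEFAULT_SOURCES` and `resolve_sources` are names invented for this example, and only the three paths and the `RECALL_MEMORY_ROOT` variable come from the text above.

```python
import os
from pathlib import Path

# The three default paths named above, relative to RECALL_MEMORY_ROOT.
DEFAULT_SOURCES = (
    ".claude/rules/anti-paperclip.md",
    "memory/state.md",
    "memory/INDEX.md",
)


def resolve_sources(root=None, sources=DEFAULT_SOURCES):
    """Split the source list into files that exist and files that are missing."""
    root = Path(root or os.environ.get("RECALL_MEMORY_ROOT", "."))
    found, missing = [], []
    for rel in sources:
        path = root / rel
        (found if path.is_file() else missing).append(path)
    return found, missing


if __name__ == "__main__":
    found, missing = resolve_sources()
    for path in found:
        print(f"ok       {path}")
    for path in missing:
        print(f"missing  {path}")
```

If your layout differs, the missing entries show which paths to change in the source list.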

CLI without the MCP runtime:

RECALL_MEMORY_ROOT=/path/to/repo python recall.py "about to publish a repo"
RECALL_MEMORY_ROOT=/path/to/repo python recall.py --learn "Title" "What happened" "What to check next time"
RECALL_MEMORY_ROOT=/path/to/repo python recall.py --audit

Status and roadmap

v0.2: the full retrieve / learn / audit / prune / consolidate lifecycle is working and verified. It does what the fact-memory tools (Mem0, Zep, Letta, Cognee) do not: it provides procedural, relevance-gated, self-pruning memory of how to do the work, and it learns from its own failures.

Ahead: embeddings for semantic retrieval; auto-firing learn from failure signals; generating evals from failures; behavioural model-diffing on new model releases; configurable sources and federation.

License

MIT.
