recall-memory-mcp

A relevance-gated, self-improving procedural memory for AI agents, as an MCP server.

Most agent-memory tools remember facts (conversations, preferences). This one stores the lessons an agent learns, surfaces only the ones relevant to the task at hand, and gets better over time by learning from failures, deduping, and pruning what it never uses. So the agent stops dumping its whole history into context, and it stops repeating its own mistakes.

Why

A long-running agent accretes memory and usually loads all of it every session. That is expensive, slow, and it drowns the current truth in stale history, so the agent drifts back to old, superseded decisions.

Measured on a real 64-day production agent: ~91,000 tokens were loaded every session, and ~90% of it was never used. Relevance-gating cut that to a few hundred tokens per task (about a 99% reduction), and the drift stopped, because stale history only surfaces when a task is actually about it.

The full lifecycle (eight tools)

  • recall(task, k) -- only the lessons and state relevant to what you are about to do, each with an actionable check. Self-tracks which lessons get used.

  • preflight(task, k) -- a pre-action checklist: the specific things to verify before editing a file, sending a message, deploying, or querying a database. Built for a PreToolUse hook so the right guardrails fire automatically, with no prose to re-read.

  • learn(title, body, check) -- turn a failure or insight into a retrievable lesson. Closes the loop: next time the same situation comes up, recall surfaces it. Dedupes -- a recurring failure bumps a seen_count instead of cloning the lesson.

  • memory_audit() -- how much loaded memory is never used (archive candidates) and how much is stale.

  • prune() -- retire learned lessons safely: only those that were never retrieved, are not recurring, and are older than a grace period, so fresh and recurring lessons are never lost.

  • consolidate() -- flag near-duplicate lessons to merge.

  • maintain() -- one self-maintenance pass: audit + safe prune + consolidate report. Safe to run on a schedule or at session wrap.

  • reindex() -- rebuild after the memory files change.
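
The dedup behaviour of learn and the safety rule of prune can be sketched as below. This is a minimal illustration, not the server's code: only seen_count comes from the description above, while the Lesson fields retrieved_count and created_at, the 14-day GRACE_SECONDS value, and the normalised-title dedup key are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field

GRACE_SECONDS = 14 * 24 * 3600  # hypothetical grace period (14 days)


@dataclass
class Lesson:
    title: str
    body: str
    check: str
    created_at: float = field(default_factory=time.time)
    seen_count: int = 1       # bumped when the same failure recurs
    retrieved_count: int = 0  # bumped when recall surfaces the lesson


def learn(store: dict, title: str, body: str, check: str) -> Lesson:
    """Add a lesson, or bump seen_count if an equivalent one already exists."""
    key = " ".join(title.lower().split())  # naive dedup key: normalised title
    existing = store.get(key)
    if existing is not None:
        existing.seen_count += 1
        return existing
    lesson = Lesson(title, body, check)
    store[key] = lesson
    return lesson


def is_prunable(lesson: Lesson, now: float) -> bool:
    """True only if never retrieved AND not recurring AND past the grace period."""
    return (
        lesson.retrieved_count == 0
        and lesson.seen_count == 1
        and now - lesson.created_at > GRACE_SECONDS
    )
```

All three conditions must hold before a lesson is retired, so a lesson that is fresh, has recurred, or has ever been retrieved survives a prune pass.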

How it works

  • Chunks the agent's markdown memory (rules whole; state, session log, and index at paragraph level), its .claude/skills, and its runtime-learned lessons.

  • Ranks with BM25 (length-normalised, so big stale blocks do not dominate), with source-weighting (real lessons beat index pointers), recency (current decisions beat superseded ones), and a generic-term down-weight (words like "task" or "file" stop inflating noise).

  • preflight adds a concept-overlap gate: a guardrail only fires if the task shares at least two distinctive (non-generic) terms with it, so a single coincidental word never trips a false checklist. This is corpus-size independent.

  • Optional semantic/hybrid retrieval (semantic.py, model2vec static embeddings, CPU, no torch) blends cosine similarity with BM25 so a differently-worded task still finds the right lesson. It degrades gracefully to pure BM25 if the dependency is absent.

  • An on-disk index cache keyed on source-file mtimes keeps preflight fast on the hot path (it runs before every risky tool call); a stale or truncated cache simply fails validation and rebuilds, so it can never serve wrong results.
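
As a rough illustration of the ranking and gating ideas above, here is a self-contained sketch of Okapi BM25 with a generic-term down-weight, plus a concept-overlap gate. It is not taken from recall.py: the GENERIC word list, the generic_weight value, the tokenizer, and the function names are assumptions, and source-weighting and recency are left out to keep it short.

```python
import math
import re
from collections import Counter

# Hypothetical generic-term list; the real list is not given in this README.
GENERIC = {"task", "file", "the", "a", "an", "to", "of", "and", "before", "is", "in"}


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75,
                generic_weight: float = 0.2) -> list:
    """Okapi BM25 over docs, with generic query terms down-weighted."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation: long chunks are penalised via len(toks) / avgdl.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            weight = generic_weight if term in GENERIC else 1.0
            score += weight * idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def passes_overlap_gate(task: str, guardrail: str, overlap_min: int = 2) -> bool:
    """Fire only if task and guardrail share >= overlap_min distinctive terms."""
    shared = (set(tokenize(task)) & set(tokenize(guardrail))) - GENERIC
    return len(shared) >= overlap_min
```

The gate counts shared terms rather than comparing scores, which is why it does not depend on corpus size: adding more documents changes BM25's idf values but not the set intersection.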

Performance

On the production agent, with preflight wired into a PreToolUse hook (fires before every file edit / deploy / risky shell command):

  • Warm hook latency ~80 ms (down from ~330 ms), thanks to the index cache and to skipping the embedding import in fast mode.

  • In-process preflight lookup ~0.5 ms.

  • Per-task context ~99% smaller than loading all memory.
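
The index cache credited for the latency gain can be approximated as follows. This is a sketch under assumptions, not the project's cache format: the JSON layout, the function names, and the build callback are hypothetical. It shows the general pattern of keying on source-file mtimes and treating any unreadable cache as a miss.

```python
import json
import os
from pathlib import Path


def source_fingerprint(paths: list) -> dict:
    """Map each existing source file to its modification time in nanoseconds."""
    return {str(p): p.stat().st_mtime_ns for p in paths if p.exists()}


def load_or_rebuild(cache_path: Path, sources: list, build) -> tuple:
    """Return (index, was_cached). Any invalid or stale cache triggers a rebuild."""
    fingerprint = source_fingerprint(sources)
    try:
        cached = json.loads(cache_path.read_text(encoding="utf-8"))
        if cached["fingerprint"] == fingerprint and isinstance(cached["index"], list):
            return cached["index"], True
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing, truncated, or malformed cache: fall through and rebuild

    index = build(sources)
    tmp = cache_path.with_name(cache_path.name + ".tmp")
    tmp.write_text(json.dumps({"fingerprint": fingerprint, "index": index}),
                   encoding="utf-8")
    os.replace(tmp, cache_path)  # atomic swap, so readers never see a partial file
    return index, False
```

Writing to a temporary file and swapping it in with os.replace keeps a concurrent hook invocation from reading a half-written cache.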

Tested

A committed test suite covers retrieval precision (the right guardrail fires; benign and irrelevant actions stay silent), latency, hook robustness against malformed and hostile input, and the full auto-learn loop end-to-end (a failure is detected, distilled into a lesson, and becomes retrievable). A second, self-contained smoke test builds a tiny fake agent repo in a tempdir and exercises the whole lifecycle with no external data:

python3 tests/test_smoke.py

Install and use

pip install mcp
RECALL_MEMORY_ROOT=/path/to/your/agent/repo python mcp_server.py   # as a stdio MCP server

By default it indexes .claude/rules/anti-paperclip.md, memory/state.md, memory/session-log.md, and memory/INDEX.md under RECALL_MEMORY_ROOT, plus the .claude/skills it finds and a learned.json it maintains.

To use your own layout, drop a recall.sources.json in RECALL_MEMORY_ROOT (any key you omit falls back to the default):

{
  "rules": [{"path": "docs/rules.md", "split": "\\n###\\s+Rule\\s+"}],
  "paragraphs": [
    {"path": "docs/state.md", "label": "state", "split": "\\n##\\s+"}
  ],
  "skills_dir": ".claude/skills",
  "weights": {"rule": 1.2, "state": 1.0}
}

rules files are chunked whole per section (each carries an extracted check); paragraphs files are chunked per paragraph with a source label and weight. Set skills_dir to null to skip skill indexing.
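
To make the fallback and split behaviour concrete, here is a small sketch of how such a file could be loaded and applied. The DEFAULTS dict and both function names are hypothetical and only loosely mirror the default layout described above. Note that the doubled backslashes in the JSON decode to ordinary regex escapes, so "\\n###\\s+Rule\\s+" becomes the pattern \n###\s+Rule\s+.

```python
import json
import re
from pathlib import Path

# Hypothetical defaults, loosely mirroring the default layout described above.
DEFAULTS = {
    "rules": [{"path": ".claude/rules/anti-paperclip.md"}],
    "paragraphs": [{"path": "memory/state.md", "label": "state"}],
    "skills_dir": ".claude/skills",
    "weights": {},
}


def load_sources(root: Path) -> dict:
    """Merge recall.sources.json over DEFAULTS; omitted keys keep their default."""
    config = dict(DEFAULTS)
    path = root / "recall.sources.json"
    if path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config


def split_chunks(text: str, pattern: str) -> list:
    """Split text on a regex like the "split" values above, dropping empty chunks."""
    return [chunk.strip() for chunk in re.split(pattern, text) if chunk.strip()]
```

Because the merge is per top-level key, a config that sets only skills_dir to null still indexes the default rules and paragraphs files.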

CLI without the MCP runtime:

RECALL_MEMORY_ROOT=/path/to/repo python recall.py "about to publish a repo"
RECALL_MEMORY_ROOT=/path/to/repo python recall.py --preflight "edit the server entrypoint"
RECALL_MEMORY_ROOT=/path/to/repo python recall.py --learn "Title" "What happened" "What to check next time"
RECALL_MEMORY_ROOT=/path/to/repo python recall.py --maintain
RECALL_MEMORY_ROOT=/path/to/repo python recall.py --audit
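
For wiring --preflight into a PreToolUse hook, a wrapper along these lines may be a reasonable starting point. It is a sketch, not the hook shipped with the project. It assumes the hook receives a JSON payload on stdin with tool_name and tool_input fields, that recall.py is in the working directory, and that any non-empty RECALL_FAST value enables fast mode. The helper names are hypothetical.

```python
import json
import os
import subprocess
import sys
from typing import Optional


def task_from_payload(raw: str) -> Optional[str]:
    """Turn a hook payload into a short task description, or None if unusable."""
    try:
        payload = json.loads(raw)
        tool = payload["tool_name"]
        args = payload.get("tool_input") or {}
    except (ValueError, KeyError, TypeError, AttributeError):
        return None
    if not isinstance(tool, str) or not isinstance(args, dict):
        return None
    target = args.get("file_path") or args.get("command") or ""
    return f"{tool} {target}".strip()[:500]  # cap length against oversized input


def main() -> int:
    task = task_from_payload(sys.stdin.read())
    if task is None:
        return 0  # never block the tool call on malformed input
    env = dict(os.environ, RECALL_FAST="1")  # assumed way to enable fast mode
    try:
        result = subprocess.run(
            [sys.executable, "recall.py", "--preflight", task],
            env=env, capture_output=True, text=True, timeout=5,
        )
    except (subprocess.TimeoutExpired, OSError):
        return 0  # a slow or missing recall.py should not break the agent
    sys.stdout.write(result.stdout)
    return 0
```

A hook script would end with sys.exit(main()). The task is passed as a single argv element rather than through a shell, so hostile tool input cannot inject extra commands.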

Tuning

| Env var | Default | Meaning |
| --- | --- | --- |
| RECALL_MEMORY_ROOT | . | Root of the agent repo to index. |
| RECALL_LEARNED_PATH | <root>/harness-memory/learned.json | Where runtime lessons are stored. |
| RECALL_FAST | unset | Skip the embedding import (pure BM25); used on the hook hot path. |
| RECALL_PREFLIGHT_FLOOR | 0 | Minimum score for a preflight check (the overlap gate does the real filtering). |
| RECALL_OVERLAP_MIN | 2 | Distinctive terms a task must share with a guardrail before it fires. |
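
The tuning variables and their defaults can be read in one place, roughly as sketched here. The Settings class and load_settings function are hypothetical names, and treating any non-empty RECALL_FAST value as "on" is an assumption, since the README only says the default is unset.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    root: Path
    learned_path: Path
    fast: bool
    preflight_floor: float
    overlap_min: int


def load_settings(env=None) -> Settings:
    """Read the RECALL_* variables, applying the documented defaults."""
    env = os.environ if env is None else env
    root = Path(env.get("RECALL_MEMORY_ROOT", "."))
    learned = env.get("RECALL_LEARNED_PATH")
    return Settings(
        root=root,
        learned_path=Path(learned) if learned else root / "harness-memory" / "learned.json",
        fast=bool(env.get("RECALL_FAST")),  # assumed: any non-empty value enables it
        preflight_floor=float(env.get("RECALL_PREFLIGHT_FLOOR", "0")),
        overlap_min=int(env.get("RECALL_OVERLAP_MIN", "2")),
    )
```

Passing a plain dict instead of os.environ makes the defaults easy to check in a test without touching the real environment.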

Status and roadmap

v0.3: retrieve / preflight / learn (with dedup) / audit / safe-prune / consolidate / maintain lifecycle, plus optional semantic retrieval, an index cache, env-tunable precision, and tests. Unlike the fact-memory tools (Mem0, Zep, Letta, Cognee), it provides procedural, relevance-gated, self-pruning memory of how to do the work, and it learns from its own failures.

Ahead: auto-firing learn from failure signals; generating evals from failures; behavioural model-diffing on new model releases; and federation.

License

MIT.
