# engram
Persistent memory for AI agents — organized by time and space. Important memories get promoted, noise decays naturally, and related knowledge clusters into a browsable topic tree. Fully automatic.
## Quick Start
## Core Features
Most agent memory tools provide a vector store with search. engram adds a lifecycle — memories are not just stored, they are continuously managed.
### LLM Quality Gate
New memories enter the Buffer layer. Promotion to Working or Core requires passing an LLM quality gate — the LLM evaluates each memory in context and determines whether it warrants long-term retention.
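A memory climbs this ladder only on a passing verdict from the gate. A minimal Python sketch of the promotion rule, with illustrative names rather than engram's actual API (engram itself is written in Rust):

```python
from dataclasses import dataclass

# Illustrative model of the layer ladder; not engram's internal code.
LAYERS = ["buffer", "working", "core"]

@dataclass
class Memory:
    text: str
    layer: str = "buffer"

def promote(memory: Memory, gate_passed: bool) -> Memory:
    """Move a memory up one layer only when the LLM quality gate passes."""
    if gate_passed:
        idx = LAYERS.index(memory.layer)
        memory.layer = LAYERS[min(idx + 1, len(LAYERS) - 1)]
    return memory
```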
### Semantic Dedup & Merge
When two memories express the same concept in different words, engram detects and merges them. Merging is LLM-powered, based on semantic understanding rather than string similarity.
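As a sketch of the detection half: pairs whose embedding cosine similarity clears engram's consolidation threshold (0.78, per the maintenance section below) become merge candidates for the LLM. Illustrative Python, not engram's internal code:

```python
import math

def cosine(a, b):
    """Plain cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

DEDUP_THRESHOLD = 0.78  # the threshold stated in the consolidation cycle

def is_duplicate_candidate(emb_a, emb_b):
    # Above the threshold the pair is handed to the LLM, which decides
    # whether the memories really say the same thing and writes the merge.
    return cosine(emb_a, emb_b) > DEDUP_THRESHOLD
```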
### Automatic Decay
Decay is epoch-based — it only occurs during active consolidation cycles, not by wall-clock time. If the agent is idle for a week, memories remain intact.
| Kind | Decay rate | Use case |
|------|------------|----------|
| | Fast | Events, experiences, time-bound context |
| | Slow | Knowledge, preferences, lessons (default) |
| | Slowest | Workflows, instructions, how-to |
Working and Core memories are never deleted. In the Working layer, importance decreases gradually but memories remain searchable. Buffer serves as a temporary staging area where all kinds may be evicted.
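A minimal sketch of how epoch-based decay behaves, keyed by the decay-rate tiers in the table above; the multipliers are invented for illustration, not engram's actual constants:

```python
# Made-up per-epoch multipliers for the three decay tiers.
DECAY_RATES = {"fast": 0.90, "slow": 0.98, "slowest": 0.995}

def decay(importance: float, rate: str, epochs: int) -> float:
    """One multiplicative step per consolidation cycle.

    Decay is tied to consolidation epochs, not wall-clock time, so an
    idle week with no cycles leaves importance untouched (epochs == 0).
    """
    return importance * DECAY_RATES[rate] ** epochs
```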
### Self-Organizing Topic Tree
Vector clustering automatically groups related memories. The tree is hierarchical, with LLM-generated names for each node.

The tree rebuilds automatically when memories change. At session start, the agent receives a topic index as a table of contents. Use `POST /topic {"ids": ["kb3"]}` to retrieve all memories within a specific cluster.
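For example, drilling into a cluster over the HTTP API might look like this in Python; the base URL assumes the API shares the dashboard's default port (3917):

```python
import json
import urllib.request

# Fetch every memory under topic "kb3", as described above.
req = urllib.request.Request(
    "http://localhost:3917/topic",
    data=json.dumps({"ids": ["kb3"]}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Against a running instance:
# with urllib.request.urlopen(req) as resp:
#     memories = json.load(resp)
```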
### Triggers
Tag a memory with `trigger:deploy`, and it surfaces automatically when the agent queries `/triggers/deploy` before executing a deployment.
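A sketch of the pre-action check in Python; the base URL again assumes the default port:

```python
import urllib.request

# Query lessons tagged trigger:deploy before running a deployment.
req = urllib.request.Request("http://localhost:3917/triggers/deploy")
# Against a running instance:
# with urllib.request.urlopen(req) as resp:
#     lessons = resp.read()
```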
## Architecture
Memory is organized along two dimensions — time and space:
**Time** — a three-layer lifecycle inspired by the Atkinson–Shiffrin memory model:

| Layer | Role | Behavior |
|-------|------|----------|
| Buffer | Short-term staging | All new memories enter here; evicted when they fall below threshold |
| Working | Active knowledge | Promoted by consolidation; never deleted, but importance decays at rates set by kind |
| Core | Long-term identity | Promoted through the LLM quality gate; never deleted |
**Space** — a self-organizing topic tree built from embedding vectors. Related memories cluster by semantic similarity, with LLM-generated names for each cluster:

| Mechanism | Description |
|-----------|-------------|
| Vector clustering | Groups semantically similar memories into topics via cosine similarity |
| Hierarchy | Related topics nest under shared parent nodes, forming a multi-level tree |
| LLM naming | Generates human-readable names for each cluster automatically |
| Auto-rebuild | Tree updates when memories change; no manual maintenance required |
Topic trees address a fundamental limitation of vector search: it requires the right query to find the right memory. Topic trees allow the agent to browse by subject — scan the directory, then drill into the relevant branch.
## Session Recovery
A single call restores full context, intended for session start or post-compaction recovery.
Four sections, each serving a distinct purpose:
| Section | Content | Budget |
|---------|---------|--------|
| Core | Full text of permanent rules and identity; never truncated | ~2k tokens |
| Recent | Memories changed since the last consolidation window, for short-term continuity | ~1k tokens |
| Topics | Named topic index; a structured directory of all memories | Leaf list |
| Triggers | Pre-action safety tags for automatic lesson recall | Tag list |
The agent reads the topic index, identifies relevant topics, and drills in via `POST /topic` on demand. This avoids loading the entire memory store into context.
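The exact recovery endpoint and JSON layout are not shown here, but a hypothetical payload following the four-section table might be assembled into a prompt preamble like this:

```python
# Hypothetical payload shape; section names follow the table above, but
# the field layout is an assumption, not engram's documented format.
context = {
    "core": ["Always run tests before pushing."],
    "recent": ["User renamed the staging cluster yesterday."],
    "topics": [{"id": "kb3", "name": "Deployment", "count": 12}],
    "triggers": ["deploy", "release"],
}

def render_context(ctx: dict) -> str:
    """Flatten the four sections into a prompt-ready preamble."""
    lines = ["# Core", *ctx["core"], "# Recent", *ctx["recent"]]
    lines += ["# Topics"]
    lines += [f'{t["id"]}: {t["name"]} ({t["count"]})' for t in ctx["topics"]]
    lines += ["# Triggers", ", ".join(ctx["triggers"])]
    return "\n".join(lines)
```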
## Search & Retrieval
Hybrid retrieval combines semantic embeddings with BM25 keyword search (using jieba for CJK tokenization). Results are ranked by relevance, memory importance, and recency.
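One plausible way to combine those signals; the weights below are invented for illustration, and engram's actual ranking formula is internal:

```python
# Blend semantic and BM25 relevance, then scale by importance and recency.
def rank(hits):
    """Order hits by blended relevance, scaled by importance and recency."""
    def score(h):
        relevance = 0.6 * h["semantic"] + 0.4 * h["bm25"]
        return relevance * h["importance"] * h["recency"]
    return sorted(hits, key=score, reverse=True)
```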
## Background Maintenance
Maintenance is fully autonomous and activity-driven — cycles are skipped when there has been no write activity.
### Consolidation (every 30 minutes)
Each cycle executes the following steps in order:
1. **Decay** — reduce importance of unaccessed memories
2. **Dedup** — detect and merge near-identical memories (cosine > 0.78)
3. **Triage** — LLM categorizes new Buffer memories for promotion
4. **Gate** — LLM evaluates promotion candidates (batched, single call)
5. **Reconcile** — LLM resolves ambiguous similar pairs; results are cached to avoid redundant calls
6. **Topic tree rebuild** — re-cluster and name new or changed topics
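The cycle above can be sketched as an ordered, activity-gated pipeline; step names mirror the list, and the dispatch bodies are omitted:

```python
# Fixed step order of one consolidation cycle (illustrative sketch).
PIPELINE = ("decay", "dedup", "triage", "gate", "reconcile", "rebuild_tree")

def run_cycle(had_writes: bool):
    """Run one cycle in order; skip entirely when there were no writes."""
    if not had_writes:
        return []  # activity-driven: idle cycles cost nothing
    return [step for step in PIPELINE]  # each step would dispatch to its handler
```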
### Topic Distillation
When a topic cluster grows too large (10+ memories), engram condenses overlapping memories into fewer, richer entries — preserving all specific details while reducing redundancy. Up to 2 topics are distilled per consolidation cycle.
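The selection rule can be sketched as follows; only the 10-memory threshold and the 2-per-cycle cap come from the text above, while picking the largest clusters first is an assumption:

```python
# Pick which topic clusters to distill this cycle.
def pick_for_distillation(topic_sizes: dict, min_size=10, per_cycle=2):
    """Return up to `per_cycle` topic ids with at least `min_size` memories."""
    crowded = [t for t, n in topic_sizes.items() if n >= min_size]
    crowded.sort(key=lambda t: topic_sizes[t], reverse=True)  # largest first (assumed)
    return crowded[:per_cycle]
```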
## Multi-Agent & Namespace Isolation
A single engram instance can serve multiple agents concurrently. SQLite WAL mode, a connection pool, and an RwLock-protected vector index make concurrent reads and writes safe out of the box.
Use the `X-Namespace` header to give each agent (or project) its own isolated memory space.
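A sketch of a namespaced write; the `/memories` path and payload shape are assumptions for illustration, while the `X-Namespace` header itself is documented above:

```python
import json
import urllib.request

# Store a memory in agent-a's isolated namespace (hypothetical endpoint).
req = urllib.request.Request(
    "http://localhost:3917/memories",
    data=json.dumps({"text": "User prefers dark mode."}).encode("utf-8"),
    headers={"Content-Type": "application/json", "X-Namespace": "agent-a"},
    method="POST",
)
```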
## Install
### Docker
## LLM Configuration
engram functions without an LLM — providing keyword search and a rule-based lifecycle only. Adding an LLM enables the intelligent features: triage, gate, merge, topic naming, and audit.
The model configuration is two-tier: a strong model for judgment tasks and a lightweight model for text processing.
## For AI Agents
Add this to your agent's system prompt or session:
## Integration
Compatible with Claude Code, Cursor, Windsurf, OpenClaw, and any MCP-compatible tool.
17 MCP tools — see MCP docs. Full HTTP API — see Setup guide.
## Web Dashboard
Built-in web UI at `http://localhost:3917` for browsing memories, viewing the topic tree, monitoring LLM usage, and inspecting consolidation history.
## Specs
| | |
|---|---|
| Binary size | ~10 MB |
| Memory usage | ~100 MB RSS in production |
| Storage | SQLite, no external database |
| Language | Rust |
| Platforms | Linux, macOS, Windows (x86_64 + aarch64) |
| License | MIT |
## License
MIT