Skip to main content
Glama

a cognitive memory system that actually remembers things. built because flat markdown files don't scale and every "memory" tool i tried was either too simple (just embeddings) or too complex (needs redis + neo4j + a PhD).

engram sits in the middle. it starts as one sqlite file for local use, can graduate to postgres when you want a real concurrent service, keeps hybrid retrieval that fuses five signals, memory layers that model how brains actually work, and a neural visualization that shows the whole thing firing in real time. 98.1% R@5 on LongMemEval (ICLR 2025) — highest published score, beating MemPalace (96.6%), Emergence AI (86%), and every other memory system benchmarked.

what it does

hybrid retrieval — most memory tools just do cosine similarity and call it a day. engram runs four retrieval channels in parallel (BM25 keywords, dense embeddings via HNSW approximate nearest neighbors, entity graph BFS with 1-hop traversal, Hopfield associative pattern completion), fuses them with intent-weighted reciprocal rank fusion (k=60, weights vary by query type — why/when/who/how/what), then applies temporal + importance boosting, cross-encoder reranking, deep MLP reranking, gaussian noise for beneficial variation, and a minimum score threshold gate. the full pipeline: dense (HNSW) + BM25 + graph + Hopfield → intent-weighted RRF → boost → cross-encoder → MLP reranker → noise + threshold.

memory layers — five layers modeled after atkinson-shiffrin: working (ephemeral, auto-promotes to episodic after 30 min), episodic (events, experiences), semantic (permanent knowledge), procedural (decisions, error patterns, how-to), and codebase (compressed code knowledge — file trees, function signatures, dependency graphs). memories promote upward when they prove useful and decay if nobody accesses them. 30-day half-life on episodic, infinite on semantic.

entity graph — extracts people, tools, projects, dates from every memory. builds a relationship graph with co-occurrence strength. multi-hop traversal via recursive SQL CTEs, no neo4j needed. backlinks let you trace which memories are connected to which.

dream cycle — consolidation pass that clusters similar memories (cosine > 0.8), summarizes the clusters, generates entity "peer cards" (biographical summaries), and archives the low-value old stuff. like sleep for your memory system.

semantic dedup — finds near-duplicate memories by embedding distance (default threshold 0.92), auto-merges them keeping the higher-importance version. transfers entity links, merges tags and access counts. run manually or as part of consolidation.

codebase scanning — point scan_codebase at a project directory and it extracts file trees, function/class signatures, import graphs, and config files into compressed codebase-layer memories. stores ~10x fewer tokens than raw code while keeping what you actually need to work with the project.

conversation ingest — auto-extracts memories from Claude Code JSONL session logs. parses exchanges into Q+A pairs, classifies them (decisions, corrections, errors, task completions), and stores them in the right layer with appropriate importance scores.

session handoffs — maintains a structured resumable handoff snapshot for the active MCP session. as diary entries and memory writes happen, engram refreshes the current handoff automatically. new agent sessions can call resume_context to pick up the latest decisions, open loops, recent work, and search history without reconstructing everything from raw logs.

neural visualization — force-directed graph of entities organized in concentric rings by memory layer. neurons fire with traveling impulse particles when memories get accessed. polls the database so it works across processes. fire a query from the CLI or MCP server and watch the web UI light up.

drift detection — memories reference file paths, function names, commands, and dependencies. those references go stale when the codebase changes. drift_check extracts verifiable claims from memory content and validates them against the actual filesystem — dead paths, missing functions, broken npm scripts. returns a drift score (0-100) with per-issue breakdown. zero AI cost, pure filesystem checks. drift_fix auto-invalidates dead references and flags stale memories. inspired by mex's claim verification approach.

pattern extraction — after a session, extract_patterns analyzes recent activity (diary entries, new memories, events) and distills reusable procedural knowledge. classifies work into categories (workflow, gotcha, decision, integration, debug), checks novelty against existing procedural memories via embedding distance, and only stores patterns that are genuinely new. the GROW step from mex, automated.

negative knowledgeremember_negative stores explicit "what does NOT exist" claims: no caching layer, no Redux, the /admin endpoint was removed. these prevent future hallucinated recommendations. stored in the semantic layer with a NEGATIVE KNOWLEDGE prefix so they surface when you search for the thing that doesn't exist.

enriched embeddings — at write time, an LLM generates keywords, categorical tags, and a contextual summary for each memory. the embedding is computed over the concatenation of content + keywords + tags + summary, giving the vector richer semantic signal than raw content alone. inspired by A-Mem's zettelkasten approach, where enriched embeddings nearly doubled multi-hop retrieval F1.

memory evolution — memories aren't write-once. when a new memory arrives and near-neighbors are detected (via the surprise gate), the system asks an LLM whether existing memories should be updated with the new context. old memories get smarter over time instead of going stale. from A-Mem — removing evolution dropped multi-hop F1 from 45.85% to 31.24% in ablation.

intent-aware retrieval — queries are classified by intent (why/when/who/how/what) and retrieval signals are dynamically weighted. "why" queries boost graph traversal for causal reasoning. "when" queries boost BM25 for date matching. "who" queries boost entity graph lookup. from MAGMA's adaptive policy (+9% over static weighting).

trust-weighted decay — different sources decay at different rates. human-authored memories get full 30-day half-life. auto-extracted observations decay 3x faster. formula: λ_eff = λ · (1 + κ·(1 - trust)), κ=2.0. from SuperLocalMemory V3.3. also: confirmation count — memories corroborated by multiple independent sources get importance boost.

write-path CRUD — instead of always appending then deduplicating later, new memories are classified at write time as ADD/UPDATE/NOOP by comparing against existing neighbors. updates merge content in-place. noops skip storage entirely. from Mem0's production pipeline.

adversarial belief probing — during the dream cycle, randomly sample old semantic/procedural memories and challenge them: "is this still true?" beliefs that fail the probe get importance reduced. prevents fossilized false beliefs. from the March 2026 survey on autonomous agent memory.

memory types — memories are now typed: fact (structured knowledge, statuses, states), procedure (how-to, playbooks, rules), narrative (session logs, raw context). types are indexed and filterable. the recall tool accepts a mode parameter: facts_only (just structured knowledge), facts_plus_rules (+ procedures), full_context (everything). no more narrative dumps when you just want a status. existing memories auto-backfill from metadata on migration.

status tracking — memories have lifecycle states: active, challenged, invalidated, merged, superseded. every transition is recorded in a status_history audit table with timestamp and reason. non-active memories are filtered from retrieval results. use update_status to transition, status_history to audit. designed for the "which status is current?" problem — one canonical state per memory, no conflicting duplicates.

83 MCP tools — plugs into claude code (or any MCP client) as a tool server. docker-ready. recall, remember, entity lookup, codebase scanning, conversation extraction, semantic dedup, drift detection, pattern extraction, negative knowledge, quality metrics, embedding compression, community detection, timeline queries, similarity search, backlinks, briefing, query comparison, hotspot surfacing, consolidation, batch operations, export, health checks, the works.

Related MCP server: agentbay-mcp

the retrieval pipeline

eight stages — four parallel channels, intent-weighted fusion, boosted, reranked, gated:

query
  │
  ├── intent classification (why/when/who/how/what)
  │         → dynamic signal weights per intent type
  │
  ├── dense HNSW search (bge-small-en-v1.5, 384-dim, hnswlib)  → top 3k candidates
  ├── BM25 via sqlite FTS5 (content + hypothetical queries)     → top 3k candidates
  ├── entity graph BFS (1-hop traversal, strength-weighted)     → top k candidates
  └── Hopfield associative (pattern completion, β=8.0)          → top k candidates
           │
           ▼
     intent-weighted reciprocal rank fusion (k=60)
     score = Σ w_intent · 1/(60 + rank) across 4 channels
           │
           ▼
     temporal + importance boosting
     retention regularization, access frequency, date matching
           │
           ▼
     cross-encoder reranking (ms-marco-MiniLM-L-6-v2)
     joint (query, document) scoring — optional, adds ~200ms
           │
           ▼
     deep MLP reranker (optional, if trained)
     learned relevance from historical access patterns, <1ms
           │
           ▼
     gaussian noise (ACT-R, σ=0.02) + threshold gate
     beneficial variation + minimum score cutoff
           │
           ▼
     final top-k results

deep retrieval — optional 7th stage: a learned 2-layer MLP reranker trained on actual access patterns. which memories get accessed after being returned in search results? that signal teaches the reranker what's useful vs what's just semantically similar. takes 10 features (cosine similarity, importance, access count, age, layer one-hot, retention score) and outputs a relevance prediction. train with train_reranker, model persists to disk next to the database. runs automatically on every recall once trained. lightweight — adds <1ms per query. after the MLP, a small gaussian noise term (σ=0.02, ACT-R inspired) provides beneficial retrieval variation, and a configurable minimum score threshold gates out garbage results.

task-aware skill selectionget_skills decides whether to inject procedural knowledge and which 2-3 items to surface. three-stage gate: (1) need assessment via query surprise + domain coverage, (2) selection of top procedural memories by adaptive relevance threshold, (3) calibration with confidence scoring that filters borderline matches. based on SkillsBench finding that focused skills (+16.2pp) beat comprehensive docs (-2.9pp), and the AGENTS.md evaluation showing static context files reduce performance. the system knows when to inject (unfamiliar domain + relevant procedures = high confidence) and when to shut up (model already knows + no specific procedures = skip).

the hypothetical query part is from docTTTTTquery — at ingestion time, generate questions each memory might answer, index them alongside the content. fixes the vocabulary mismatch problem where your search terms don't match the stored text.

memory lifecycle

memories aren't static. they move between layers based on how useful they turn out to be.

surprise-based importance — at write time, every new memory is compared against existing embeddings using k-NN cosine distance (k=5). novel memories (far from anything stored) get their importance boosted up to +0.3. redundant memories (close to existing) get importance reduced and are flagged as potential duplicates. the surprise score is stored in metadata so you can audit it later. this is inspired by the Titans paper (Behrouz et al., Google) where memory updates are proportional to surprise — the gradient of the loss function. the remember tool now returns a surprise field (0-1) and warns when near-duplicates are detected.

importance scoring uses 9 factors:

  • base importance (set at creation, 0.0-1.0, adjusted by surprise at write time)

  • access frequency (log scale, how often it's been recalled)

  • recency (exponential decay, trust-weighted half-life)

  • emotional valence (strong emotions = more memorable)

  • stability (accessed consistently over time vs burst)

  • layer boost (semantic memories weighted higher)

  • source trust (human=1.0, AI=0.7, interaction=0.6, ingest=0.5, dream=0.4)

  • confirmation count (independently corroborated facts get boosted)

  • combined into a weighted composite score

promotion — episodic memories that hit importance >= 0.7 and access count >= 5 get promoted to semantic (permanent). working memories auto-promote to episodic after 30 minutes or 2 accesses. the sweep runs on every recall call so it's basically free.

pinning — pin any memory with the pin tool or the pin button in the web UI. pinned memories are immune to the dream cycle's forgetting pass. useful for memories that are important but accessed infrequently — the kind ebbinghaus would normally archive.

retention regularization — forgetting is reframed as retention regularization, inspired by Miras (Behrouz et al., Google). three modes, configurable via retention_mode in config:

  • l2 (classic ebbinghaus): smooth exponential decay, 50% at half-life. everything fades gradually.

  • huber (default): matches L2 near-term, transitions to linear for old memories. robust to burst-then-quiet access patterns — old-but-once-hot memories get a gentler transition instead of an infinite long tail. huber_delta controls the transition point.

  • elastic (L1+L2): sparse retention. strongly-held memories stay near full strength, weakly-held ones decay faster. produces cleaner separation between keepers and forgettables. elastic_l1_ratio controls the L1/L2 blend.

all modes include access reinforcement — each recall strengthens retention (spaced repetition effect, log-scaled, capped at +0.3). trust-weighted: low-trust sources (auto-extracted, dream-generated) decay up to 3x faster than human-authored memories (λ_eff = λ · (1 + κ·(1 - trust)), κ=2.0, from SuperLocalMemory V3.3). after 90 days, if retention < 0.15, importance < 0.3, and access count < 3, the memory gets soft-deleted. semantic, procedural, and pinned memories don't decay.

embedding compression — as memories age and retention drops, their embeddings can be quantized to save storage: active (R>0.8) = 32-bit float, warm = 8-bit (3.9x compression, 0.9999 cosine fidelity), cold = 4-bit (7.6x, 0.97 fidelity), archive = 2-bit (14.6x, 0.59 fidelity). uses Fisher-Rao Quantization-Aware Distance (FRQAD) for mixed-precision comparison — inflates variance proportional to quantization loss to prevent false similarity. run with compress_embeddings.

consolidation (dream cycle) — 7-step pipeline: (1) apply forgetting curve with trust-weighted retention, (2) cluster similar memories by embedding distance and merge clusters of 5+, (3) generate peer cards for entities with enough data, (4) cross-domain synthesis — find entity pairs in different contexts with moderate embedding similarity (0.75-0.90), LLM-confirm genuine connections, create SYNTHESIZED_WITH bridges, (5) adversarial belief probing — randomly sample old semantic/procedural memories and challenge them ("is this still true?"), reduce importance on invalidated beliefs, (6) drift detection — validate memory claims against filesystem, auto-invalidate dead references, (7) prune old access logs and events. run manually with engram consolidate or the MCP consolidate tool.

entity graph

every memory gets scanned for entities — people, tools, projects, dates, URLs, file paths. these go into an entity registry with canonical names, aliases, and types.

relationships form automatically through co-occurrence (entities mentioned in the same sentence get a CO_OCCURS link) and through pattern matching ("X uses Y" → USES, "X built Y" → CREATED, etc.). relationship strength increases with evidence count.

traversal uses recursive SQL CTEs for multi-hop queries — "show me everything connected to Ari within 2 hops" runs in a single SQL query, no graph database needed. the recall_related tool does this.

you can also manually link entities (link_memories), merge duplicates (merge_entities), add aliases (update_entity), find backlinks (backlinks), and fuzzy-search for entities by partial name (search_entities).

editing and annotating

memories aren't write-once. you can:

  • edit contentedit_memory changes the text and automatically re-embeds and rebuilds the FTS index. the memory keeps its ID, access history, and entity links.

  • annotateannotate attaches timestamped notes to a memory without touching its content. useful for adding context later ("this turned out to be wrong" or "confirmed by Ari on april 8").

  • invalidateinvalidate marks a fact as no longer true with a reason. the memory stays in the database (useful for audit) but gets flagged and shown with a strikethrough in the web UI.

  • tagtag adds or removes freeform tags. batch_tag applies tags to all memories matching a search query.

examples

the examples/ directory has ready-to-use setup guides:

setup guides:

file

what it covers

claude-code-setup.md

full walkthrough: install, wire into claude code, add CLAUDE.md instructions, seed memories

hooks-setup.md

auto-extract memories from conversations via claude code hooks

agent-patterns.md

common patterns: session orientation, learning from corrections, check-before-store, cognitive scaffolding, multi-agent setup

skills:

file

what it covers

session-continuity/SKILL.md

agent skill for loading resume_context, writing important state during work, and leaving a structured session_handoff behind

python examples:

file

what it does

python-client.py

standalone usage without MCP — store, search, surprise scoring, reranker training

custom-agent.py

conversational agent with engram memory using the Anthropic SDK

openai-compatible.py

same pattern for any OpenAI-compatible API (OpenAI, Ollama, vLLM, llama.cpp)

multi-agent.py

3 specialized agents sharing one database — cross-domain recall, surprise, dream cycle

api-embeddings.py

switch between local and cloud embedding backends (Voyage, OpenAI, Gemini)

entity-graph.py

build and traverse the entity relationship graph — multi-hop traversal

negative-knowledge.py

store "what does NOT exist" — prevents hallucinated recommendations

drift-detection.py

detect and fix stale memories referencing dead paths or missing functions

export-import.py

export memories to portable JSON, import into fresh database

codebase-scan.py

scan a project and extract compressed code knowledge

install

git clone https://github.com/raya-ac/engram.git
cd engram
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

or from PyPI:

pip install engram-memory-system

needs python 3.11+. first run will download two small models (~100MB total):

  • BAAI/bge-small-en-v1.5 (33MB) — embeddings

  • cross-encoder/ms-marco-MiniLM-L-6-v2 (22MB) — reranking

storage backends

engram now supports both:

  • sqlite for local single-user installs and quick starts

  • postgres for always-on web + MCP deployments where multiple clients are hitting the same memory store

default config still uses sqlite:

storage_backend: sqlite
db_path: ~/.local/share/engram/memory.db
postgres_dsn: ""

to migrate an existing local install to postgres:

engram migrate-postgres \
  --dsn postgresql://user:pass@localhost:5432/engram \
  --switch-config

that command:

  • copies your current sqlite database into postgres

  • verifies the migrated counts before switching

  • backs up config.yaml

  • flips storage_backend + postgres_dsn for the next restart

the original sqlite database is left untouched as a rollback path.

optional: API backends

LLM backends — for fact extraction, memory evolution, enrichment, and consolidation:

pip install -e ".[anthropic]"  # anthropic API (claude models)
pip install -e ".[openai]"     # openai API (gpt models)
pip install -e ".[api]"        # all API backends (embedding + LLM)

configure in config.yaml:

llm:
  backend: anthropic             # claude_cli | anthropic | openai | mlx
  model: claude-haiku-4-5-20251001
  api_key: ""                    # or set ANTHROPIC_API_KEY / OPENAI_API_KEY env var

backend

auth

notes

claude_cli

Claude Code login

default, uses claude CLI subprocess

anthropic

ANTHROPIC_API_KEY or llm.api_key

direct API, any Claude model

openai

OPENAI_API_KEY or llm.api_key

any OpenAI/compatible model

mlx

local

runs Qwen/Llama/etc on Apple Silicon GPU

embedding backends — use cloud embedding APIs instead of (or alongside) local models:

pip install -e ".[voyage]"   # voyage-3.5, voyage-3.5-lite, voyage-code-3
pip install -e ".[openai]"   # text-embedding-3-small, text-embedding-3-large
pip install -e ".[gemini]"   # gemini-embedding-001
pip install -e ".[api]"      # all API backends

set the API key and model in config.yaml or env vars:

export VOYAGE_API_KEY="your-key"    # get at https://dash.voyageai.com/
export OPENAI_API_KEY="your-key"
export GEMINI_API_KEY="your-key"
# config.yaml
embedding_model: voyage-3.5         # auto-detects backend from model name
embedding_dim: 1024                 # auto-detected if model is known

engram auto-detects the backend from the model name — voyage-* uses the Voyage API, text-embedding-* uses OpenAI, gemini-* uses Gemini. or set embedding_backend explicitly.

supported models:

model

provider

dim

price/1M tokens

notes

BAAI/bge-small-en-v1.5

local

384

free

default, runs on CPU or Apple GPU

voyage-3.5

Voyage AI

1024

$0.18

best retrieval quality, Anthropic recommended

voyage-3.5-lite

Voyage AI

1024

$0.02

94% of 3.5 quality, budget option

voyage-code-3

Voyage AI

1024

$0.18

optimized for code

text-embedding-3-small

OpenAI

1536

$0.02

cheapest API option

text-embedding-3-large

OpenAI

3072

$0.13

highest dim

gemini-embedding-001

Google

768

free tier

top MTEB retrieval score

switching models requires re-embedding all memories — use engram reembed after changing the model.

docker

docker compose up -d
# → http://localhost:8420

mount your config and set API keys via environment variables. data persists in a docker volume.

quick start

ingest some files

engram ingest ~/notes/
engram ingest ~/projects/docs/ ~/journal/

supports markdown, plaintext, JSON (claude code JSONL, claude.ai JSON, chatgpt JSON tree, slack exports), PDF. extracts atomic facts via LLM, embeds them, indexes in FTS5, extracts entities and relationships.

engram search "what happened on march 28"
engram search "melee garden architecture" --debug  # shows retrieval stage breakdown
engram search "apple sandbox bypass" --rerank      # enables cross-encoder (slower, better)

remember something directly

engram remember "Ari prefers casual tone, swearing when it fits"
engram remember "deploy command: npm run build && rsync" --layer procedural

manage ANN index

engram index rebuild     # full rebuild from all embeddings
engram index status      # check index size, vector count, last built

check status

engram status

entity lookup

engram entity Ari --graph

check memory drift

engram drift                                    # full drift report
engram drift --search-roots ~/project/src       # also verify function names
engram drift --fix --dry-run                    # preview what would be fixed
engram drift --fix                              # auto-invalidate dead refs, flag stale

extract patterns from session

engram patterns                                 # extract from last 4 hours
engram patterns --hours 24 --dry-run            # preview from last 24 hours
engram patterns --threshold 0.5                 # only store highly novel patterns

re-embed after switching models

engram reembed                              # re-embed all memories with current model
engram reembed --dry-run                    # preview count without re-embedding
engram reembed --batch-size 128             # larger batches for API models

watch a directory for auto-ingest

engram watch ~/notes/                       # poll every 30s for new/changed files
engram watch ~/chats/ --interval 60         # poll every 60s

export and import

engram export backup.json                   # export memories + entities + relationships
engram export backup.json --include-embeddings  # include embedding vectors
engram export backup.jsonl --layer procedural   # filter by layer
engram import backup.json                   # restore from export
engram import backup.json --skip-duplicates # skip memories with matching content hash

run the dream cycle

engram consolidate

run tests

pytest tests/ -v                            # 72 tests, ~3s

start the web dashboard

engram serve --web
# → http://127.0.0.1:8420

start the MCP server

engram serve --mcp

MCP server

wire it into claude code by adding to ~/.claude/settings.json:

{
  "mcpServers": {
    "engram": {
      "command": "/path/to/engram/.venv/bin/python",
      "args": ["-m", "engram", "serve", "--mcp"]
    }
  }
}

restart claude code. you get 66 tools:

recall & search

tool

what it does

recall

hybrid search across all layers. accepts mode: facts_only, facts_plus_rules, full_context

recall_entity

everything about a person/project/tool — memories, relationships, timeline

recall_timeline

memories in a date range

recall_related

multi-hop graph traversal from an entity

recall_recent

last N memories by creation time

recall_layer

search within a specific layer

recall_hints

search memories but return only hints (truncated snippets + entity names) to trigger recognition without replacing cognition

get_skills

task-aware skill selection — get focused procedural guidance only when injection would help, skip when it wouldn't

recall_code

search the codebase layer for functions, classes, files

recall_context

search and return a formatted context block with token budget

recall_by_type

get memories filtered by semantic type — fact, procedure, narrative

find_similar

find memories most similar to a given one by embedding distance

compress

summarize search results down to a token budget

store & organize

tool

what it does

remember

store a memory with layer and importance

remember_decision

decision + rationale → procedural layer

remember_error

error pattern + prevention → procedural layer

remember_interaction

Q+A pair → episodic layer

remember_project

structured project info → semantic layer

remember_negative

store explicit negative knowledge — what does NOT exist, what should NOT be done

edit_memory

edit content of an existing memory (re-embeds automatically)

annotate

add a note to a memory without changing its content

pin / unpin

pin a memory so it never gets forgotten by the dream cycle

forget

soft-delete a memory

invalidate

mark a fact as no longer true

update_status

transition a memory's lifecycle status with audit trail

status_history

get the full status transition history for a memory

tag

add or remove tags on a memory

bulk_forget

mass cleanup by source file, layer, or date

entities & graph

tool

what it does

update_entity

add aliases, change type

merge_entities

combine two entities that are the same thing

search_entities

fuzzy search for entities by partial name

entity_graph

relationship subgraph as JSON

entity_timeline

entity's memories ordered chronologically

link_memories

manually relate two memories via their entities

backlinks

find all memories linked to a specific memory via shared entities

codebase

tool

what it does

scan_codebase

extract compressed code knowledge from a project directory

recall_code

search the codebase layer specifically

list_projects

show all scanned projects with memory counts

drift detection

tool

what it does

drift_check

verify memories against filesystem reality — dead paths, missing functions, stale memories. returns drift score 0-100

drift_fix

auto-fix drift issues — invalidate dead refs, flag stale memories. use dry_run=true first

extract_patterns

extract reusable procedural patterns from recent session activity — only stores what's genuinely novel

dedup & maintenance

tool

what it does

dedup

find and merge near-duplicate memories by embedding similarity

find_duplicates

preview duplicate pairs without merging

recompute_importance

recalculate all importance scores with the 9-factor formula

batch_tag

add tags to all memories matching a search query

train_reranker

train the deep MLP reranker on access patterns

reranker_status

check if the deep reranker is trained

compress_embeddings

lifecycle-aware quantization (32/8/4/2-bit) with FRQAD distance metric

detect_communities

label propagation over entity graph, optional LLM summaries

quality_metrics

storage quality ratio, curation ratio, enrichment coverage

conversations & sessions

tool

what it does

ingest_sessions

auto-extract memories from recent Claude Code conversation logs

session_summary

generate summary from diary entries + recent events

session_handoff

build and optionally persist a structured handoff packet for the active session

resume_context

load the latest saved handoff packet so a new session can resume quickly

lifecycle & system

tool

what it does

consolidate

run dream cycle (cluster, summarize, peer cards, forget)

promote / demote

move memories between layers

layers

graduated L0-L3 context for prompt injection

status

memory counts, entity counts, db size

health

embedding cache, FTS index, orphaned entities, stale working memories

access_patterns

most-recalled memories, hit rates

count_by

group counts by layer, source type, entity, or month

export

dump memories as markdown or JSON

ingest

import files or directories

explain_importance

break down a memory's importance score into 7 component factors

memory_map

high-level map of the whole system — layer counts, top entities per layer, date range, recent activity

diary_write / diary_read

session notes

benchmarks

43/43 tests across 20 subsystems. run on 446 embedded memories, Apple Silicon.

full test suite (446 vectors, 384-dim, Apple Silicon)

subsystem

tests

result

embedding

3/3

dim=384, norm=1.0, avg 5.1ms, batch OK

ANN index (HNSW)

7/7

0.09ms search, 100% recall@10, 5,304 inserts/sec

brute-force dense

2/2

0.016ms avg (100 runs)

intent classification

1/1

6/6 correct (why/when/who/how/what)

full pipeline (no rerank)

3/3

15.5ms avg, debug mode OK

full pipeline (+ cross-encoder)

1/1

252ms avg

cross-encoder

2/2

correct ranking, 2.9ms/doc

surprise gate

4/4

0.10ms avg, novel=0.85, duplicate=0.44

Hopfield channel

1/1

<1ms

BM25 / FTS5

2/2

3.5ms avg

entity graph

4/4

find, relationships, 2-hop traversal (161 related)

memory CRUD

2/2

write → read → ANN verify → forget

layers (L0-L3)

1/1

248ms, 4 layers

deep reranker

1/1

trained=True

importance scoring

1/1

9-factor composite OK

store internals

3/3

cache cold=0.4ms hot=0.001ms

diary

1/1

write + read OK

events

1/1

logging OK

index I/O

1/1

967 KB on disk, save=4ms load=10ms

config

2/2

ANN config + reload consistent

throughput (Apple Silicon, MLX GPU)

operation

rate

embedding (MLX GPU)

1,879 texts/sec

embedding (CPU)

176 texts/sec

sqlite bulk insert

51,000 rows/sec

ANN insert

5,304 ops/sec

embed + store 100k

~3 min

latency (Apple Silicon)

operation

time

ANN dense search

0.09ms avg

brute-force dense search

0.016ms avg

full pipeline (no rerank)

15.5ms avg

full pipeline (+ cross-encoder)

252ms avg

surprise gate (k-NN)

0.10ms avg

embedding

5.1ms avg

cross-encoder rerank

2.9ms/doc

BM25 / FTS5

3.5ms avg

Hopfield channel

<1ms

ANN index save

4ms

ANN index load

10ms

embedding cache (cold)

0.4ms

embedding cache (hot)

0.001ms

ANN scaling projection

vectors

brute-force

ANN (HNSW)

speedup

1k

0.1ms

0.12ms

1x

10k

0.9ms

0.16ms

5x

100k

8.7ms

0.20ms

45x

500k

43.7ms

0.22ms

198x

1M

87.3ms

0.23ms

377x

recall@10 accuracy: 100% (20/20 queries, ANN vs brute-force exact match)

LongMemEval (ICLR 2025)

LongMemEval — 500 questions testing 5 long-term memory abilities (information extraction, multi-session reasoning, knowledge updates, temporal reasoning, abstention) across ~40 conversation sessions per question (~115k tokens). the standard benchmark for chat assistant memory.

engram uses HNSW + BM25 + RRF fusion against per-question session haystacks. no entity graph or Hopfield (those need persistent memory, not ephemeral per-question corpora). run with benchmarks/longmemeval/run_engram.py.

system

R@5

method

engram v2

98.1%

HNSW + BM25 + assistant BM25 + temporal boost + cross-encoder

MemPalace (raw)

96.6%

ChromaDB cosine, verbatim storage

engram v1

94.7%

HNSW + BM25 + RRF

Emergence AI

86.0%

RAG

MemPalace (AAAK)

84.2%

compressed storage

EverMemOS

83.0%

TiMem

76.9%

temporal hierarchical

per question type (470 non-abstention questions):

type

n

R@5

R@10

knowledge-update

72

100.0%

100.0%

single-session-user

64

100.0%

100.0%

multi-session

121

99.2%

99.2%

temporal-reasoning

127

96.9%

97.6%

single-session-assistant

56

96.4%

96.4%

single-session-preference

30

93.3%

96.7%

v2 adds three channels over v1: assistant-turn BM25 (weight 0.5), timestamp proximity boost, and cross-encoder reranking on top-20 candidates. the assistant channel catches answers in assistant responses without polluting the dense index. the temporal boost favors sessions closer to the question date. the cross-encoder rescores the top candidates jointly against the query.

retrieval quality (synthetic)

tested on synthetic memories with template-varied content (different topics, people, tools). queries use the first line of each memory verbatim — a strict exact-match test.

metric

500 memories

10k memories

100k memories

recall@1

10%

25%

0%

recall@5

55%

75%

20%

recall@10

95%

95%

40%

coverage (top 20)

100%

100%

60%

recall drops at 100k because all synthetic memories use similar templates — finding one exact match among 100k near-duplicates is adversarially hard. real-world diverse content scores much higher.

intent classification

accuracy

90% (9/10 test cases)

query intent (why/when/who/how/what) is classified and used to dynamically weight retrieval signals. "why" boosts graph edges, "when" boosts BM25 date matching, "who" boosts entity lookup.

system health metrics

metric

description

storage quality

fraction of stored memories ever recalled

curation ratio

memories with updates/invalidations vs total

enrichment ratio

memories with keywords+tags+summary metadata

evolution count

memories updated by evolution on neighbor write

confirmation count

memories independently corroborated

run quality_metrics via MCP or the web dashboard health panel.

hooks

engram ships with a shell hook for Claude Code that auto-extracts memories from your conversation sessions.

# hooks/save_hook.sh — run periodically or on session end
ENGRAM_VENV=~/path/to/engram/.venv ./hooks/save_hook.sh

the hook finds recent Claude Code JSONL files, parses exchanges into Q+A pairs, classifies them (decisions get stored in procedural, corrections become error patterns, etc.), and stores them with appropriate importance. it skips files it's already ingested via content hash.

inside the MCP server, engram now also refreshes a structured session handoff automatically as diary entries and memory writes happen. if a new agent session needs orientation, call resume_context to get the latest handoff packet instead of re-reading the whole diary.

you can also wire it into Claude Code's hook system by adding to your settings — check hooks/save_hook.sh for details.

web dashboard

full monitoring UI at http://127.0.0.1:8420. supports optional bearer token auth — set web.auth_token in config.yaml to lock it down.

  • neural map — force-directed entity graph with concentric layer rings (semantic core → procedural → episodic → working). neurons glow and fire impulse particles along synapses when memories are accessed. drag nodes, hover for details, click to inspect. polls the database every 2s so MCP queries show up in real time.

  • search — hybrid search with debug mode showing all 5 retrieval stages. filter chips for layer, importance slider. hint mode toggle returns truncated snippets with reveal buttons for cognitive scaffolding. search history saved to localStorage with dropdown.

  • memories — browse all memories, filter by layer (including codebase). layer-colored left borders, importance bars, slide-in animations. every card has inline actions: edit content, promote/demote, pin/unpin, find similar, explain importance, copy, invalidate, forget. select mode for bulk operations (promote/forget multiple at once). pinned memories show gold glow.

  • entities — entity chips with memory counts. click to open inspector with relationship graph, add aliases, change entity type.

  • timeline — date range queries with memory cards.

  • remember — tabbed forms: general (any layer/importance), decision (with rationale → procedural), error pattern (with prevention → procedural), Q+A interaction (→ episodic). now shows surprise score and adjusted importance after storing.

  • cognition — three tabs for the new memory science features:

    • surprise: paste text to preview novelty score before storing. radial gauge visualization (green=novel, red=duplicate), k-NN distance bars, nearest memory snippet.

    • retention: interactive canvas chart overlaying L2, Huber, and elastic net decay curves. sliders for half-life, huber delta, and L1 ratio — curves redraw client-side in real time.

    • reranker: deep MLP reranker status card, train button with epoch/LR inputs, training results display.

  • bridges — cross-domain synthesis viewer. shows entity pairs connected by the dream cycle's LLM-confirmed bridges, with similarity scores and connection descriptions.

  • analytics — donut chart for layer distribution, bar charts for most recalled memories, top entities by memory count, source type breakdown.

  • context — L0-L3 graduated context viewer with token counts per layer and copy buttons. query input for L3 search-based context.

  • health — system health dashboard with 10 status cards (embedding cache, orphaned entities, stale working memories, FTS index, embedding coverage, db size, etc). plus a memory map showing top entities per layer and full date range.

  • dedup — duplicate detection with adjustable similarity threshold slider. scan to preview duplicate pairs side by side, one-click auto-merge.

  • ingest — file/directory path ingestion with real backend processing, session ingest button, and recent ingestion log.

  • export — download memories as markdown or JSON from sidebar, with optional layer filter.

  • live events — real-time feed of all memory reads/writes across all processes (MCP, CLI, web). deduplicates events within 2-second windows and shows result counts.

  • session diary — quick note-taking input in the sidebar, timestamped entries.

  • inspector panel — right sidebar that shows memory details, entity graphs, similar memories (with similarity percentages), importance factor breakdowns (colored bar chart with 9 weighted factors), annotations with add-note input, and access history.

  • toast notifications — bottom-right toasts for all actions (promote, pin, copy, forget, dedup) with success/error/info styling and auto-dismiss.

  • keyboard shortcuts/ focus search, n neural map, s search, r remember, a analytics, c cognition, b bridges, Esc close inspector.

web API

the dashboard is backed by a full JSON API you can hit directly:

GET  /api/memories                    paginated list, optional ?layer= filter
GET  /api/memories/:id                full memory with hypothetical queries, entities, access history
GET  /api/memories/:id/similar        find similar memories by embedding distance
GET  /api/memories/:id/importance     9-factor importance score breakdown
GET  /api/search?q=...&debug=true     hybrid search with optional debug breakdown
GET  /api/search/filtered?q=...       search with layer, importance, date, source filters
GET  /api/entities                    all entities with memory counts
GET  /api/entities/:id/graph          entity relationship subgraph
GET  /api/entities/:id/timeline       entity memories ordered chronologically
GET  /api/neural                      full graph for neural visualization
GET  /api/neural/fires?since=...      recent access events (lightweight polling)
GET  /api/timeline?start=...&end=...  temporal query
GET  /api/analytics                   layer distribution, top accessed, top entities
GET  /api/health                      system health (cache, orphans, FTS, embeddings)
GET  /api/memory-map                  full system overview with per-layer top entities
GET  /api/context?query=...           L0-L3 graduated context with token counts
GET  /api/duplicates?threshold=0.92   preview near-duplicate memory pairs
GET  /api/stats                       system statistics
GET  /api/events                      recent events from all processes
GET  /api/diary                       session diary entries
GET  /api/ingest/log                  recent file ingestions
GET  /api/pulse                       hourly activity counters + sparkline
GET  /api/heatmap?days=30             github-style activity heatmap
GET  /api/memories/:id/history        importance score over time
GET  /api/retention/curves            L2/Huber/elastic curve data for chart
GET  /api/retention/scatter           real memory age vs retention scatter data
GET  /api/reranker/status             deep reranker training state
GET  /api/bridges                     cross-domain bridge memories
GET  /api/search/hints?q=...          truncated hints for cognitive scaffolding
GET  /api/skills?query=...            task-aware skill selection with confidence scoring
GET  /api/export?format=json          export memories as markdown or JSON
POST /api/remember                    store a memory (with surprise scoring)
POST /api/consolidate                 trigger dream cycle
POST /api/dedup                       auto-merge duplicate memories
POST /api/surprise/preview            compute surprise for text before storing
POST /api/reranker/train              trigger reranker training
POST /api/memories/:id/promote        change memory layer
POST /api/memories/:id/demote         demote to lower layer
POST /api/memories/:id/edit           edit content (re-embeds automatically)
POST /api/memories/:id/annotate       add timestamped note
POST /api/memories/:id/invalidate     mark as no longer true
POST /api/memories/:id/forget         soft-delete
POST /api/memories/:id/pin            pin (prevent forgetting)
POST /api/memories/:id/unpin          unpin
POST /api/memories/bulk               bulk promote/forget/tag/demote
POST /api/entities/:id/alias          add entity alias
POST /api/entities/:id/type           change entity type
POST /api/diary                       append diary entry
POST /api/ingest/path                 ingest a file or directory
POST /api/ingest/sessions             ingest recent Claude Code sessions

architecture

local installs can live in one sqlite file (~/.local/share/engram/memory.db). if you run engram as a concurrent web + MCP service, you can switch the primary store to postgres.

engram/
├── store.py          # sqlite schema, CRUD, FTS5, entity graph (recursive CTEs), ANN lifecycle
├── ann_index.py      # HNSW approximate nearest neighbor index (hnswlib wrapper)
├── embeddings.py     # multi-backend embeddings (mlx, sentence-transformers, voyage, openai, gemini) + cross-encoder
├── retrieval.py      # 5-stage hybrid pipeline (HNSW dense + BM25 + graph → RRF → boost → rerank → deep)
├── extractor.py      # LLM fact extraction + hypothetical query generation
├── entities.py       # regex entity extraction, relationship graph, co-occurrence
├── surprise.py       # k-NN novelty scoring at write time (Titans-inspired surprise gate, ANN-accelerated)
├── deep_retrieval.py # learned MLP reranker trained on access patterns
├── skill_select.py   # task-aware skill selection gate (SkillsBench-inspired)
├── lifecycle.py      # retention regularization (L2/Huber/elastic), 9-factor importance, promotion
├── consolidator.py   # dream cycle (clustering, summarization, peer cards, archival, belief probing)
├── codebase.py       # project scanner — file trees, signatures, deps → codebase layer
├── conversations.py  # claude code session ingest — exchange pairs, classification
├── dedup.py          # semantic deduplication — find and merge near-duplicates
├── layers.py         # L0-L3 graduated context retrieval
├── compress.py       # token-budget compression with entity codes
├── formats.py        # parsers for markdown, JSON chat exports, PDF, slack, email
├── llm.py            # claude CLI + mlx backend abstraction
├── evolution.py      # memory enrichment, evolution, CRUD classification, trust scoring, canonicalization
├── drift.py          # memory drift detection — claim extraction, filesystem verification
├── patterns.py       # session pattern extraction — distill procedural knowledge from work
├── quantize.py       # lifecycle embedding compression (32/8/4/2-bit) with FRQAD
├── communities.py    # label propagation community detection + LLM summaries
├── hopfield.py       # Hopfield associative retrieval — pattern completion via modern Hopfield network
├── benchmark.py      # self-benchmark suite — retrieval quality, latency, throughput
├── mcp_server.py     # 63-tool MCP server (JSON-RPC, stdio)
├── cli.py            # CLI — ingest, search, remember, reembed, watch, export, import, index, serve
├── config.py         # yaml config with env var overrides, auto-dim detection
├── demo.py           # interactive demo walkthrough
└── web/
    ├── app.py        # fastapi with model warmup, bearer token auth
    ├── routes.py     # 70+ REST endpoints
    ├── events.py     # SSE event stream (in-process)
    └── templates/
        └── index.html  # single-page dashboard — neural canvas, 17 panels, keyboard shortcuts

tests/
├── test_store.py       # CRUD, FTS, entities, events, cache
├── test_embeddings.py  # multi-backend, cosine search, dim detection
├── test_ann_index.py   # HNSW build, search, add/remove, persistence, recall
├── test_retrieval.py   # hybrid pipeline, intent classification, debug mode
├── test_surprise.py    # novelty scoring, importance adjustment
└── test_config.py      # config loading, auto-dim, env overrides

benchmarks/
└── longmemeval/
    └── run_engram.py   # LongMemEval benchmark adapter (98.1% R@5)

Dockerfile              # python 3.12-slim, all deps, port 8420
docker-compose.yml      # single service, data volume, config mount

supported formats

engram can ingest these file types:

format

how it's handled

markdown (.md)

split by headers into sections

plaintext (.txt)

treated as single document

claude code (.jsonl)

parsed as conversation exchanges, grouped into Q+A pairs

claude.ai (.json)

parsed from chat_messages array

chatgpt (.json)

parsed from mapping tree structure

slack (.json)

parsed from messages array with user attribution

PDF (.pdf)

text extracted via pymupdf

generic JSON

each item or the whole object as a document

conversation formats get special treatment — exchanges are grouped into Q+A pairs and classified (decisions, corrections, errors, task completions) before storing.

what informed the design

i studied three existing memory systems and six IR papers before building this. took the best parts from each:

systems:

  • neuro-memory — my earlier memory system. atkinson-shiffrin 4-layer model, ebbinghaus forgetting curve, 7-factor importance scoring, procedural memory with pattern templates. engram takes the layer architecture and lifecycle model from here.

  • cmyui/ai-memory — LLM-extracted atomic facts, three-stage hybrid retrieval with RRF, dream cycle

  • mempalace — graduated layers (L0-L3), entity registry with disambiguation, exchange-pair chunking for conversations, AAAK compression

papers:

  • Reciprocal Rank Fusion (Cormack et al. 2009) — the RRF formula and k=60 constant

  • Memory in the Age of AI Agents (Hu et al. 2026) — forms/functions/dynamics taxonomy

  • docTTTTTquery (Nogueira & Lin 2019) — document expansion by query prediction

  • ColBERT-PRF (Wang et al. 2021) — pseudo-relevance feedback for dense retrieval

  • BM25 Query Augmentation (Chen & Wiseman 2023) — learned query expansion

  • Word Embedding GLM (Ganguly et al. 2015) — embedding-based language model for IR

  • Titans (Behrouz et al. 2025) — surprise-based memorization, memory updates proportional to loss gradient

  • Miras (Behrouz et al. 2025) — unifying framework for sequence models, forgetting as retention regularization

  • Your Brain on ChatGPT (Kosmyna et al. 2025) — cognitive scaffolding vs replacement, recall_hints design

  • SkillsBench (Li et al. 2026) — focused skills (+16.2pp) beat comprehensive docs (-2.9pp), get_skills gate design

  • Evaluating AGENTS.md (Gloaguen et al. 2026) — static context files reduce performance, validates dynamic retrieval over flat injection

  • A-Mem (Wu et al. 2025) — zettelkasten memory with enriched embeddings and memory evolution, enrichment doubled multi-hop F1

  • AgeMem (Chen et al. 2026) — RL-trained memory operations, quality_metrics reward decomposition

  • MAGMA (Zhao et al. 2026) — multi-graph architecture with intent-aware adaptive retrieval policy

  • Zep/Graphiti (Preston-Werner et al. 2025) — temporal knowledge graph with three-tier architecture

  • SuperLocalMemory V3.3 (2026) — trust-weighted decay, lifecycle embedding compression, confirmation count

  • Mem0 (Chhablani et al. 2025) — production write-path CRUD classification, temporal marked deletion

  • Memory for Autonomous Agents (2026) — latest comprehensive survey, adversarial belief probing, write-path canonicalization

  • Mem^p (2025) — procedural memory with dual representation and reflection-based updates

  • ACT-R Memory (HAI 2025) — base-level activation, retrieval noise, threshold gating

config

lives at config.yaml or ~/.config/engram/config.yaml. env vars override everything (prefix with ENGRAM_, e.g. ENGRAM_DB_PATH).

db_path: ~/.local/share/engram/memory.db

# embedding model — local or API
# local:  BAAI/bge-small-en-v1.5 (384d, free, default)
# voyage: voyage-3.5 (1024d, $0.18/1M), voyage-3.5-lite (1024d, $0.02/1M)
# openai: text-embedding-3-small (1536d, $0.02/1M), text-embedding-3-large (3072d)
# gemini: gemini-embedding-001 (768d, free tier)
embedding_model: BAAI/bge-small-en-v1.5
cross_encoder_model: cross-encoder/ms-marco-MiniLM-L-6-v2
embedding_backend: auto                   # auto | mlx | sentence_transformers | voyage | openai | gemini
embedding_dim: 384                        # auto-detected from model name if known

retrieval:
  top_k: 10
  rrf_k: 60
  min_confidence: 0.60
  rerank_candidates: 20
  dense_multiplier: 3          # candidates = top_k * multiplier
  bm25_multiplier: 3

lifecycle:
  forgetting_half_life_days: 30
  archive_after_days: 90
  archive_min_importance: 0.3  # below this + age + low access → forget
  archive_min_accesses: 3
  promote_importance: 0.7
  promote_accesses: 5
  cluster_threshold: 0.8
  cluster_min_size: 5
  retention_mode: huber        # l2 | huber | elastic
  huber_delta: 0.5             # transition point for huber (in half-lives)
  elastic_l1_ratio: 0.3        # L1 weight for elastic (0=pure L2, 1=pure L1)

llm:
  backend: claude_cli          # claude_cli | mlx | llamacpp
  model: claude-sonnet-4-20250514
  mlx_model: mlx-community/Qwen2.5-3B-Instruct-4bit

web:
  host: 127.0.0.1
  port: 8420
  auth_token: ""               # set to enable bearer token auth on the web UI

ann:
  enabled: true
  m: 32                        # HNSW graph connectivity (higher = better recall, more memory)
  ef_construction: 200         # build-time search depth (higher = better index quality)
  ef_search: 100               # query-time search depth (higher = better recall, slower)
  max_elements: 500000         # pre-allocated capacity
  index_path: ~/.local/share/engram/hnsw.index

license

MIT

A
license - permissive license
-
quality - not tested
A
maintenance

Maintenance

Maintainers
Response time
1dRelease cycle
8Releases (12mo)
Commit activity

Resources

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/raya-ac/engram'

If you have feedback or need assistance with the MCP directory API, please join our Discord server