# engram-rs Internal Architecture
> Audience: AI auditing/optimizing prompts in `src/prompts.rs`.
> Describes what the code DOES as of the current source. All thresholds from `src/thresholds.rs`.
> All LLM prompts and tool schemas live in `src/prompts.rs`.
> **This document must be updated whenever the architecture changes.**
---
## 1. Memory Model
Each memory has: `id` (UUID), `content` (≤8192 chars), `layer` (1/2/3), `importance` (0.0–1.0),
`created_at`, `last_accessed`, `access_count`, `repetition_count`, `decay_rate`, `source`,
`tags` (JSON array, ≤20 tags, ≤32 chars each), `namespace`, `embedding` (BLOB), `kind`
(`semantic`|`episodic`|`procedural`), `modified_at`.
### Layers
| Layer | Enum | Bias | Semantics |
|-------|------|------|-----------|
| **Buffer** | 1 | −0.1 | Intake. All new memories land here via API. Evicted by epoch-based decay or capacity cap (200). |
| **Working** | 2 | 0.0 | Active knowledge. Promoted from Buffer by consolidation. Never deleted — importance decays by kind. |
| **Core** | 3 | +0.1 | Long-term identity. Promoted from Working through LLM gate. Never deleted. |
**Decay rates by kind** (exponential, epoch-based, per active consolidation cycle):
| Kind | Decay factor | Half-life | Use case |
|------|-------------|-----------|----------|
| `episodic` | ×0.98 | ~35 epochs | Events, experiences, time-bound context |
| `semantic` | ×0.988 | ~58 epochs | Knowledge, preferences, lessons (default) |
| `procedural` | ×0.996 | ~173 epochs | Workflows, instructions, how-to |
Decay only happens during active consolidation — no decay while the agent is idle. Working and Core memories are never deleted regardless of importance. Buffer memories below `decay_drop_threshold` (0.01) are evicted. Exponential decay follows the Ebbinghaus forgetting curve — importance approaches the floor asymptotically but never reaches zero.
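The decay step above can be sketched as a pure function. The factors and floor mirror the table; the function names themselves are illustrative (the real constants live in `src/thresholds.rs`):

```rust
/// One epoch of kind-differentiated decay: multiply by the kind's factor,
/// clamped to the floor so importance never reaches zero.
fn decay_step(importance: f64, factor: f64, floor: f64) -> f64 {
    (importance * factor).max(floor)
}

/// Epochs until importance halves: n = ln(0.5) / ln(factor).
/// For factor 0.98 this gives ~34.3 epochs, matching the "~35" in the table.
fn half_life_epochs(factor: f64) -> f64 {
    (0.5f64).ln() / factor.ln()
}
```

The half-life column follows directly from the factors: `0.98^35 ≈ 0.49`, so an episodic memory loses half its importance after roughly 35 active consolidation cycles.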
**Layer entry:** All API writes enter Buffer regardless of any `layer` parameter. Each layer has a single entry point: Buffer ← API, Working ← consolidation promotion, Core ← LLM gate. The agent cannot choose which layer a memory goes to.
---
## 2. Memory Lifecycle
```
API POST /memories
│
▼
[Validation] → content ≤8192, tags ≤20, source ≤64
│
▼
[Dedup Check] ─── duplicate found ──→ touch() + increment repetition_count → return existing
│ no dup
▼
[Insert into Buffer]
│
▼
[Async: generate embedding, index FTS, update vec index]
│
╔══════════════════════════════════════════════════════════╗
║ CONSOLIDATION CYCLE (every 30 min via cron/API call) ║
╠══════════════════════════════════════════════════════════╣
║ ║
║ 1. cleanup_stale_tags() ║
║ 2. Buffer capacity cap ║
║ 3. Working quality (never delete, never demote) ║
║ 4. Buffer → Working (mechanical: score ≥ 5 OR kind) ║
║ 5. Buffer capacity cap enforcement ║
║ 6. Working → Core candidates collected ║
║ 7. LLM Gate: review candidates → approve/reject ║
║ 8. Reconcile: LLM detects same-topic updates ║
║ 9. Session distillation: 3+ session notes → summary ║
║ 10. Buffer dedup: cosine-based, no LLM ║
║ 11. Buffer triage: LLM evaluates buffer → promote/keep ║
║ 12. Merge: LLM combines near-duplicates (if enabled) ║
║ 13. FTS repair ║
║ 14. Core summary update ║
║ 15. Importance decay (epoch-based, kind-differentiated) ║
║ 16. Drop fully-decayed buffer memories ║
╚══════════════════════════════════════════════════════════╝
```
### 2.1 Insert & Dedup
On `POST /memories`, the API handler:
1. Validates input (content length, tag count).
2. **Layer override:** Ignores any `layer` parameter — all writes go to Buffer.
3. **API-level semantic dedup** (`quick_semantic_dup`): if AI is configured, embeds content and
   checks cosine similarity against stored memories. Threshold depends on source:
   - Proxy extraction: `PROXY_DEDUP_SIM = 0.60`
   - Normal insert: `INSERT_DEDUP_SIM = 0.65`
4. If a duplicate is found: calls `touch()` (bumps `access_count`, updates `last_accessed`)
   and increments `repetition_count`. Returns the existing memory.
5. If no duplicate: calls `db.insert()`.
**DB-level dedup** (`find_near_duplicate` in `db/memory.rs`):
- Runs on insert when `skip_dedup` is not set.
- If embedding is provided: **Jaccard pre-filter** (tokenize both texts, require Jaccard > `DEDUP_JACCARD_PREFILTER = 0.50`) → then **cosine similarity** (> `DEDUP_COSINE_SIM = 0.85`).
- If no embedding: Jaccard-only check.
- On duplicate hit with very high similarity (> `INSERT_MERGE_SIM = 0.80`): calls LLM to merge
content of old + new, using `INSERT_MERGE_PROMPT`. Updates existing memory's content in-place.
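The two-stage DB-level check can be sketched as follows; `jaccard`, `cosine`, and `is_near_duplicate` are hypothetical names, and the thresholds mirror `DEDUP_JACCARD_PREFILTER` and `DEDUP_COSINE_SIM`:

```rust
use std::collections::HashSet;

/// Cheap token-set Jaccard pre-filter: intersection over union of
/// whitespace-split tokens. Avoids the cosine computation for obvious misses.
fn jaccard(a: &str, b: &str) -> f64 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    let inter = ta.intersection(&tb).count() as f64;
    let union = ta.union(&tb).count() as f64;
    if union == 0.0 { 0.0 } else { inter / union }
}

/// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Two-stage duplicate check: Jaccard > 0.50, then cosine > 0.85.
fn is_near_duplicate(a: &str, b: &str, ea: &[f32], eb: &[f32]) -> bool {
    jaccard(a, b) > 0.50 && cosine(ea, eb) > 0.85
}
```

The pre-filter matters because cosine requires embeddings for both sides; Jaccard runs on raw text and discards most candidate pairs before any vector math.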
### 2.2 Buffer → Working Promotion (Mechanical)
In `consolidate_sync()`, two paths:
**By reinforcement score:**
- `reinforcement_score = access_count + repetition_count × 2.5`
- Promote if score ≥ `buffer_threshold` (default: `max(promote_threshold, 5)` = 5).
**By kind (cooldown-gated):**
- Lessons (`tag: lesson`) or procedurals (`kind: procedural`) auto-promote after 4 consolidation
epochs in buffer (~2h of active consolidation). Distilled sessions excluded.
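A minimal sketch of the reinforcement-score path (function names hypothetical; the 2.5 repetition weight and threshold of 5 come from the defaults above):

```rust
/// Reinforcement score: repetitions count 2.5x more than recalls.
fn reinforcement_score(access_count: u32, repetition_count: u32) -> f64 {
    access_count as f64 + repetition_count as f64 * 2.5
}

/// Mechanical Buffer → Working check against the default buffer_threshold of 5.
fn promotes_by_score(access_count: u32, repetition_count: u32) -> bool {
    reinforcement_score(access_count, repetition_count) >= 5.0
}
```

So two repetitions alone (score 5.0) clear the bar, while four recalls without restatement (score 4.0) do not.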
### 2.3 Buffer Triage (LLM)
After mechanical promotion, the LLM evaluates remaining buffer memories:
- **Eligibility:** >10 min old, not `distilled`/`ephemeral`, not already promoted.
- **Batch:** up to 20 per consolidation cycle, grouped by namespace.
- **LLM call:** `gate` model, `TRIAGE_SYSTEM` prompt. Decides `promote` or `keep` per memory.
- Promoted memories move to Working; can also set `kind: procedural`.
- `keep` decisions add `_triaged` tag (prevents re-triage on next cycle).
### 2.4 Buffer Eviction
Buffer eviction is epoch-based, consistent with the decay model — no wall-clock TTL.
**Eviction by decay:** Buffer memories decay through epoch-based importance reduction during active
consolidation cycles. When importance falls below `decay_drop_threshold` (0.01), the memory is deleted.
Idle periods cause zero decay, so memories survive agent downtime intact.
**Capacity cap** (default 200, env `ENGRAM_BUFFER_CAP`): when buffer exceeds cap, lowest-weight
entries are evicted. This prevents unbounded growth when the agent writes faster than consolidation
promotes.
### 2.5 Working → Core Promotion (LLM Gate)
**Candidate selection** (in `consolidate_sync`):
- `reinforcement_score ≥ promote_threshold (default 3)` AND `importance ≥ 0.6`, OR
- Age > `working_age_promote_secs (default 7 days)` AND score > 0 AND importance ≥ 0.6.
- Session notes, ephemeral, auto-distilled, and distilled memories are blocked.
- `gate-rejected-final` memories never retry.
- `gate-rejected` retries after 48 consolidation epochs; `gate-rejected-2` after 144 epochs.
Epoch-based: idle time does not count toward cooldown.
**LLM Gate** (`llm_promotion_gate`):
- Model: `gate` tier.
- Prompt: `GATE_SYSTEM` — **whitelist approach**: only 4 categories are allowed into Core:
1. **LESSON** — hard-won insights from mistakes or experience
2. **IDENTITY** — who the user/agent is, preferences, constraints
3. **CONSTRAINT** — unconditional rules (if it has "for now" / "temporary", reject)
4. **DECISION RATIONALE** — why a permanent architectural/design choice was made
- Everything outside these categories is rejected by default.
- Context injected: high access count (≥30) or repetition count (≥2) noted.
- Returns `approve` (with kind) or `reject`.
- Rejection tagging escalates: `gate-rejected` → `gate-rejected-2` → `gate-rejected-final`.
**Working layer protection** (run before main consolidation):
- Working memories are never deleted or demoted. Gate-rejected Working memories only lose importance
through natural decay — they are not deleted, demoted, or removed on any schedule.
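The escalation ladder can be sketched as a state transition. The tag names and epoch counts are as documented above; the function itself is hypothetical:

```rust
/// On each gate rejection, advance the rejection tag and return the
/// epoch-based cooldown before the next retry (None = never retried).
fn next_rejection_state(current: Option<&str>) -> (&'static str, Option<u32>) {
    match current {
        None => ("gate-rejected", Some(48)),               // first rejection
        Some("gate-rejected") => ("gate-rejected-2", Some(144)),
        _ => ("gate-rejected-final", None),                // terminal state
    }
}
```

Because the cooldowns are counted in consolidation epochs rather than wall-clock time, an idle agent accrues no progress toward a retry.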
### 2.6 Demotion
- Session notes (`source: session` or `tag: session`) in Core → demoted to Working.
- `ephemeral` tagged Core memories → demoted to Working.
- Working memories are never demoted to Buffer. Demotion only applies to Core → Working.
- Audit can demote via LLM decision (see §5).
### 2.7 Importance Decay
Every consolidation cycle (epoch-based, not time-based): `decay_importance(cycle_start, factor=0.02, floor=0.01)`.
All kinds decay exponentially at different rates: episodic ×0.98, semantic ×0.988, procedural ×0.996 per epoch. Decay only happens during active consolidation cycles — idle periods cause zero decay. Floor of 0.01 — memories asymptotically approach but never reach zero, remaining retrievable under precise queries. Activation boost on recall (+0.03) lets frequently-used memories self-reinforce.
---
## 3. Consolidation Pipeline
Triggered by `POST /consolidate` or cron. Runs `consolidate()`:
### Phase 1: Synchronous (`consolidate_sync`)
1. **Tag cleanup:** Remove stale `gate-rejected` tags from Working after sufficient epochs without access.
Remove orphaned `promotion` tags from Working/Core.
2. **Buffer capacity cap:** Evict lowest-weight entries if >200 buffer entries.
3. **Working quality:** Working memories are never deleted or demoted.
4. **Core hygiene:** Demote session notes and ephemeral from Core.
5. **Core overlap scan:** O(n²) pairwise cosine on Core embeddings. Flag pairs with
similarity > `CORE_OVERLAP_SIM = 0.70`. Stored in `engram_meta` to avoid re-flagging.
6. **Working → Core candidates:** Collect based on reinforcement score + importance.
7. **Buffer → Working:** Mechanical promotion (score ≥ 5, or lesson/procedural after 4 epochs ≈ 2h).
8. **Buffer capacity cap:** Evict lowest-weight entries if >200 buffer entries.
9. **Drop decayed:** Delete Buffer memories below `decay_drop_threshold = 0.01`. Working/Core are never deleted.
10. **Importance decay:** exponential, kind-differentiated (×0.98 episodic, ×0.988 semantic, ×0.996 procedural), floor 0.01.
### Phase 2: Async (LLM-dependent)
11. **LLM Gate:** Review Working→Core candidates. Approve or reject with escalating epoch-based cooldowns.
12. **Reconcile:** Detect same-topic updates across all layers (see §3.2).
13. **Session distillation:** Synthesize 3+ session notes into project status snapshot (see §3.3).
14. **Buffer dedup:** Cosine-based duplicate removal within buffer (no LLM).
15. **Buffer triage:** LLM evaluates buffer memories for promotion.
16. **Merge** (if `merge=true`): LLM combines near-duplicate memories (see §3.1).
17. **FTS repair:** Fix orphaned FTS entries after merges/deletes.
18. **Core summary:** Update compressed summary in `engram_meta` for resume.
### 3.1 Merge (`merge_similar`)
In `consolidate/merge.rs`:
1. Fetch all memories with embeddings.
2. Group by namespace. Within each namespace, find cosine-similar pairs:
- Threshold: `MERGE_SIM = 0.78`
3. For each pair, call LLM (`merge` model) with `MERGE_SYSTEM` prompt.
4. Winner (more important or more accessed) absorbs loser:
- Winner's content replaced with merged text.
- Tags merged (union, cap 20).
- Importance = max of both.
- Access counts summed.
- Loser deleted.
5. Cross-layer merge: result goes to the **higher** layer.
6. Dedup key stored in `engram_meta` to prevent re-merging same pair.
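The winner-absorbs-loser bookkeeping (steps 4–5) might look like this sketch; the merged text itself comes from the LLM and is passed in, and the struct is a simplified stand-in for `db::Memory`:

```rust
struct Mem {
    content: String,
    importance: f64,
    access_count: u32,
    layer: u8,
    tags: Vec<String>,
}

/// Winner absorbs loser: merged text, max importance, summed access counts,
/// tag union capped at 20, and the result lands on the higher layer.
fn absorb(mut winner: Mem, loser: Mem, merged_text: String) -> Mem {
    winner.content = merged_text;
    winner.importance = winner.importance.max(loser.importance);
    winner.access_count += loser.access_count;
    winner.layer = winner.layer.max(loser.layer); // cross-layer merge goes up
    for t in loser.tags {
        if !winner.tags.contains(&t) && winner.tags.len() < 20 {
            winner.tags.push(t);
        }
    }
    winner
}
```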
### 3.2 Reconcile (`reconcile_updates`)
Detects when a newer memory supersedes an older one on the same topic.
1. Scan Working + Core memories with embeddings.
2. Find pairs in similarity window: `RECONCILE_MIN_SIM (0.55) < sim < RECONCILE_MAX_SIM (0.78)`.
3. Require time gap: newer must be ≥1h newer than older.
4. **Single LLM call** (`merge` model) with `RECONCILE_PROMPT` — returns both decision and merged content:
- `update` + `merged_content`: older is stale → write merged content to newer, delete older.
- `absorb` + `merged_content`: overlap detected → write merged content to newer, delete older.
- `keep_both`: different aspects → no action.
5. Merged content preserves ALL specific details from both entries (names, numbers, constraints,
reinforcement language). Falls back to newer's content if `merged_content` is empty.
6. Winner inherits: max importance, summed access counts, union of tags.
7. Dedup key stored per pair to prevent re-evaluation.
### 3.3 Session Distillation (`distill_sessions`)
Groups undistilled session notes by namespace. When 3+ notes accumulate:
1. Take latest 10 notes.
2. LLM (`gate` model) synthesizes a project status snapshot (≤250 chars).
3. Check for near-duplicate existing distillation (threshold: `TRIAGE_DEDUP_SIM = 0.75`).
4. Insert result as Working memory with tags `[project-status, auto-distilled]`.
5. Source notes tagged `distilled` to prevent reprocessing.
6. `auto-distilled` tag blocks Core promotion (project status is too volatile).
---
## 4. Recall & Scoring
### 4.1 Hybrid Retrieval (`recall()` in `src/recall.rs`)
Three retrieval channels run in parallel:
**FTS5 keyword search:**
- Pre-processed with jieba (Chinese segmentation) + bigrams (Japanese/Korean).
- Returns BM25-scored results.
**Semantic search:**
- Query embedded via `text-embedding-3-small`.
- If enough FTS+fact candidates exist (≥ `fetch_limit × 2`): restrict semantic search to
those candidate IDs only (filtered mode). Otherwise: full corpus scan.
- Floor: `DEFAULT_SIM_FLOOR = 0.30` (or `min_score` if set).
**Fact graph:**
- Query entities against `facts` table (subject/object FTS).
- Multi-hop traversal (up to 2 hops) for richer context.
- Fact-linked memories get `relevance = 1.0`.
### 4.2 Scoring Formula
```
raw = relevance × (1 + 0.4 × weight + 0.2 × recency)
score = sigmoid(raw) = 2/(1 + e^(-2×raw)) - 1
```
Where:
- `weight = importance + rep_bonus + access_bonus + kind_bias + layer_bias` (see `memory_weight()`)
- `rep_bonus = 0.17 × ln(1 + rep_count)`, capped at 0.7
- `access_bonus = 0.12 × ln(1 + access_count)`, capped at 0.55
- `kind_bias`: procedural=+0.15, semantic=0, episodic=−0.1
- `layer_bias`: core=+0.1, working=0, buffer=−0.1
- `recency = exp(-decay_rate × age_hours / 168)` (168 h = one-week time constant)
- Score compressed via sigmoid — approaches 1.0 asymptotically, preserving ranking discrimination in high-score regions (no hard cap).
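Translated directly into code, the formula and bonuses look like this (a sketch; the real `memory_weight()` lives in the source, and `2/(1+e^(-2x)) - 1` is algebraically `tanh(x)`):

```rust
/// Weight: importance plus log-scaled repetition/access bonuses plus biases.
fn memory_weight(importance: f64, rep: u32, access: u32,
                 kind_bias: f64, layer_bias: f64) -> f64 {
    let rep_bonus = (0.17 * (1.0 + rep as f64).ln()).min(0.7);
    let access_bonus = (0.12 * (1.0 + access as f64).ln()).min(0.55);
    importance + rep_bonus + access_bonus + kind_bias + layer_bias
}

/// Final score: raw blend squashed by the sigmoid so it approaches 1.0
/// asymptotically without a hard cap.
fn score(relevance: f64, weight: f64, recency: f64) -> f64 {
    let raw = relevance * (1.0 + 0.4 * weight + 0.2 * recency);
    2.0 / (1.0 + (-2.0 * raw).exp()) - 1.0 // == tanh(raw)
}
```

Because the sigmoid is strictly monotonic, compression never reorders results; it only spaces high scores more tightly below 1.0.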
### 4.3 Dual-Hit Boost
Memories found by **both** semantic and FTS get boosted:
- Normal queries: `relevance *= (1 + fts_rel × 0.3 × sem_gate)` where `sem_gate = min(1, relevance/0.45)`.
- Short CJK queries (<10 chars): `fts_boost = 0.6`, higher FTS floor (`0.50 + fts_rel × 0.35`).
This compensates for `text-embedding-3-small`'s poor discrimination on short Chinese text.
**FTS-only hits** (no semantic confirmation):
- Normal: capped at `fts_rel × 0.25`.
- Short CJK: capped at `fts_rel × 0.40`.
**Keyword affinity penalty:**
Semantic-only hits (no FTS match) that lack any query terms in content get `relevance *= 0.7`.
### 4.4 Query Expansion
**Explicit** (`expand=true`): LLM generates 4–6 alternative search phrases via `EXPAND_PROMPT`.
Bilingual (Chinese + English) expansion required.
**Auto-expand**: If `expand` not set AND top result has `relevance < AUTO_EXPAND_THRESHOLD (0.25)`,
automatically expands and retries. Uses expanded result only if it beats original.
### 4.5 Touch
Only memories with `relevance > 0.5` get `touch()`-ed. Dry queries (`dry=true`) skip touch.
---
## 5. Topic Distillation (Audit)
### 5.1 Overview
When a topic accumulates too many memories (≥ `DISTILL_THRESHOLD = 10`), the distillation system
condenses overlapping memories into fewer, richer entries. This replaces the old global-scan audit.
One topic per LLM call, up to `DISTILL_MAX_PER_CYCLE = 2` topics per consolidation cycle.
### 5.2 Flow (`distill_topics` in `consolidate/audit.rs`)
1. Load the cached topiary tree (`topiary_tree` in `engram_meta`).
2. Walk leaf nodes, find those with ≥ `DISTILL_THRESHOLD` members. Sort by size descending.
3. For each bloated topic (up to 2 per cycle):
a. Load full memories for all members.
b. Build prompt listing each memory with `[index] id layer kind importance content`.
c. LLM call (`audit` model, `DISTILL_TOPIC_SYSTEM` prompt, `distill_topic_schema()`).
d. LLM returns groups of overlapping memories merged into distilled entries.
4. For each distilled entry:
- Must replace 2+ source memories (single-source "merges" are skipped).
- Insert distilled memory with tag `distilled`, source `distill`, importance 0.7.
- Layer set to max of source layers.
- Source memories deleted via `supersedes`.
### 5.3 LLM Rules
- **Preserve ALL specific details** — names, numbers, IDs, dates, commands. Never generalize.
- **Only merge genuinely overlapping memories** — same subject, redundant info.
- **If nothing overlaps, return empty array** — don't force merges.
- **Consolidate, don't summarize** — merged text should be as long as the longest source.
- **Same language as input** — don't translate.
---
## 6. Topiary — Topic Clustering
### 6.1 Overview
Topiary builds a hierarchical topic tree from memory embeddings. It organizes all memories (Core,
Working, and Buffer) into named topic clusters, providing an index for resume and drill-down recall
via `POST /topic`.
### 6.2 Architecture
```
memory write → embed queue → 500ms batch → embeddings stored
→ topiary trigger → 5s debounce → rebuild tree
→ name dirty topics (LLM) → cache in engram_meta
consolidation completes → topiary trigger → same flow
```
**Module structure:** `src/topiary/`
| File | Lines | Purpose |
|------|-------|---------|
| `mod.rs` | ~600 | TopicNode, TopicTree, insert, consolidate, hierarchy, helpers |
| `cluster.rs` | ~560 | k-means (spherical, seeded), split/merge passes, enforce_budget |
| `worker.rs` | ~220 | Debounced async background worker |
| `naming.rs` | ~210 | LLM batch naming via `ai::llm_tool_call` |
### 6.3 Data Flow
1. **Entry bridge:** engram `db::Memory` → `topiary::Entry { id, text, embedding: Vec<f32> }`.
Only memories with embeddings are included.
2. **Insert:** Each entry assigned to closest leaf (cosine > `assign_threshold`) or creates a new
singleton leaf. Centroid updated incrementally with L2 normalization.
3. **Consolidate:** Up to 10 split/merge cycles until stable, then `enforce_budget` (k-means if
leaves exceed `LEAF_BUDGET=256`), hierarchy construction, small-leaf absorption, single-child
pruning. Leaf IDs reassigned to sequential `kb1, kb2, ...`.
4. **Naming:** Dirty leaves (new/changed) batched in groups of 30, sent to LLM via `ai::llm_tool_call`
with `TOPIC_NAMING_SYSTEM` prompt. Existing names provided as dedup context.
5. **Storage:** Two forms cached in `engram_meta`:
- `topiary_tree`: JSON summary (topic IDs, names, member counts, sample texts)
- `topiary_tree_full`: Full serialized `TopicTree` (with centroids, for future incremental updates)
- `topiary_entry_ids`: Ordered entry ID list for index→ID resolution
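Step 2's incremental assignment can be sketched as a nearest-centroid lookup gated by `assign_threshold` (the function name is hypothetical; centroid updating and normalization are omitted):

```rust
/// Find the closest leaf centroid above the similarity threshold.
/// Returns None when nothing qualifies, in which case the caller
/// creates a new singleton leaf for the entry.
fn assign(entry: &[f32], centroids: &[Vec<f32>], threshold: f32) -> Option<usize> {
    let cos = |a: &[f32], b: &[f32]| -> f32 {
        let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
        let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
        let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
        if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
    };
    centroids
        .iter()
        .enumerate()
        .map(|(i, c)| (i, cos(entry, c)))
        .filter(|&(_, s)| s > threshold)
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|(i, _)| i)
}
```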
### 6.4 Worker
`topiary::worker::spawn_worker(db, ai, trigger_rx)` — tokio task.
- **Trigger sources:** EmbedQueue flush completion, post-consolidation
- **Debounce:** 5 second quiet window (drains signals, resets on each new signal)
- **Startup:** If no `topiary_tree` in meta, immediately rebuilds
- **Input:** All Core + Working + Buffer memories with embeddings
- **Output:** Cached tree in `engram_meta`
- **Safety net:** If any topic remains unnamed after LLM naming, the tree is not saved — the
previous cached tree is preserved instead. This prevents partially-named trees from replacing
good cached state.
### 6.5 Parameters
| Parameter | Value | Description |
|-----------|-------|-------------|
| `assign_threshold` | 0.30 | Min cosine similarity to assign entry to existing leaf |
| `merge_threshold` | 0.55 | Min similarity between leaf centroids to merge |
| `max_leaf_size` | 8 | Split threshold for oversized leaves |
| `min_internal_sim` | 0.35 | Minimum intra-cluster similarity |
| `LEAF_BUDGET` | 256 | Max leaf count before k-means enforcement |
| `ABSORB_THRESHOLD` | 0.40 | Min similarity to absorb small (≤2 member) leaves |
| Debounce | 5s | Quiet window before rebuild |
| Naming batch | 30 | Max topics per LLM naming call |
### 6.6 POST /topic Endpoint
```
POST /topic
Body: {"ids": ["kb1", "kb3"]}
Response: {
"kb1": {"name": "Memory architecture", "memories": [...]},
"kb3": {"name": "User preferences", "memories": [...]}
}
```
Loads `topiary_tree_full` from meta, finds leaf by ID, resolves member indices to entry IDs
via `topiary_entry_ids`, fetches full memories from DB.
---
## 7. LLM Call Sites
**LLM Level** (`ENGRAM_LLM_LEVEL`, default `auto`): Controls when consolidation calls LLMs.
In `auto` mode, high-confidence decisions use heuristics; only uncertain cases invoke LLMs.
In `off` mode, triage/gate use pure heuristics and merge/reconcile are skipped.
Every place the system calls an LLM, with model tier and purpose:
| Call Site | Model Tier | Component Log | Purpose | Prompt |
|-----------|-----------|---------------|---------|--------|
| **Triage** | `gate` | `triage` | Evaluate buffer memories → promote/keep | `TRIAGE_SYSTEM` |
| **Gate** | `gate` | `gate` | Approve/reject Working→Core promotion | `GATE_SYSTEM` |
| **Merge** | `merge` | `merge` | Combine near-duplicate content | `MERGE_SYSTEM` |
| **Reconcile** | `merge` | `reconcile` | Judge UPDATE/ABSORB/KEEP + merge content in one call | `RECONCILE_PROMPT` |
| **Audit** | `audit` (→`gate`) | `audit` | Review all Core+Working, propose ops | `AUDIT_SYSTEM` |
| **Expand** | `expand` | `query_expand` | Generate alternative search queries | `EXPAND_PROMPT` |
| **Extract** | `extract` | `extract` | Extract memories from conversation text | `EXTRACT_SYSTEM_PROMPT` |
| **Fact extract** | `extract` | `fact_extract` | Extract (subject, predicate, object) triples | `FACT_EXTRACT_PROMPT` (disabled) |
| **Resume compress** | `resume` (→default) | `resume_compress` | Summarize section for context budget | Inline system prompt |
| **Insert merge** | `merge` | `insert_merge` | Merge on insert when high similarity | `INSERT_MERGE_PROMPT` |
| **Distill** | `audit` (→`gate`) | `distill` | Condense bloated topic members | `DISTILL_TOPIC_SYSTEM` |
| **Naming** | `naming` (→default) | `naming` | Name topic clusters for resume index | `TOPIC_NAMING_SYSTEM` |
Model tier resolution (`AiConfig::model_for`): each component checks its env var
(e.g. `ENGRAM_GATE_MODEL`), falls back to `ENGRAM_LLM_MODEL`, defaults to `gpt-4o-mini`.
`audit` falls back to `gate` model if no dedicated audit model.
All LLM calls are logged in `llm_usage` table with component, model, token counts, and duration.
---
## 8. Resume System
`GET /resume` provides session recovery context. Four sections, fixed budget.
### Namespace Merging
When a project namespace is set (e.g. `ns=engram-rs`), resume fetches from **both** the project
namespace and `default`. This ensures cross-project knowledge (user identity, preferences, universal
lessons stored in `default`) is always available alongside project-specific context.
Recall follows the same rule: queries with a namespace filter also include `default` results.
**Directional merge rule:** Consolidation merge and reconcile allow cross-namespace operations only
when one side is `default`. The merged result always stays in `default` — project-level memories can
be absorbed into `default`, but `default` memories are never pulled into a project namespace.
### Output Format
```
=== Core (N) ===
[full text, no truncation, up to ~2k tokens]
=== Recent (Nh) ===
[recently modified/created non-Core memories, time descending, up to ~1k tokens]
=== Topics (Working: N, Buffer: N) ===
kb1: "Topic name" [8]
kb2: "Another topic" [5]
...
Triggers: git-push, deploy, memory-store, ...
```
### Section Details
| Section | Content | Budget | Sort |
|---------|---------|--------|------|
| **Core** | Permanent rules/identity, full text | ~2k tokens | `memory_weight()` (importance + rep/access bonuses + kind/layer biases) |
| **Recent** | Non-Core memories modified in last N consolidation epochs | ~1k tokens | Time descending |
| **Topics** | Flat leaf index from cached topiary tree | ~300-500 tokens | Member count descending |
| **Triggers** | All `trigger:*` tags | ~100-200 tokens | Access count descending |
**Core sorting:** Uses `memory_weight()` — additive kind bias (+0.15 procedural, −0.1 episodic) and layer bias (+0.1 core). Procedural memories naturally rank higher because they decay slowest AND get a +0.15 bias.
**Topics:** Read from `topiary_tree` in `engram_meta`. If no tree cached, section is omitted.
Agent can drill into any topic via `POST /topic {"ids": ["kb1", "kb3"]}`.
**Triggers:** Collected from all memories with tags matching `trigger:*` pattern, deduplicated,
sorted by aggregate access count.
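The trigger collection might be sketched as follows, with memories reduced to (tags, access_count) pairs for illustration:

```rust
use std::collections::HashMap;

/// Gather trigger:* tags, deduplicate, and sort by aggregate access count
/// (descending), with an alphabetical tiebreak for determinism.
fn collect_triggers(memories: &[(Vec<String>, u32)]) -> Vec<String> {
    let mut counts: HashMap<String, u32> = HashMap::new();
    for (tags, access) in memories {
        for t in tags {
            if let Some(name) = t.strip_prefix("trigger:") {
                *counts.entry(name.to_string()).or_insert(0) += access;
            }
        }
    }
    let mut out: Vec<(String, u32)> = counts.into_iter().collect();
    out.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    out.into_iter().map(|(n, _)| n).collect()
}
```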
---
## 9. Algorithms & Thresholds Reference
### All Cosine Similarity Thresholds
| Constant | Value | Used In |
|----------|-------|---------|
| `PROXY_DEDUP_SIM` | 0.60 | Proxy extraction dedup |
| `INSERT_DEDUP_SIM` | 0.65 | API insert dedup |
| `RECONCILE_MIN_SIM` | 0.55 | Reconcile: lower bound (related but not duplicate) |
| `AUDIT_MERGE_MIN_SIM` | 0.65 | Audit: merge suggestion lower bound |
| `CORE_OVERLAP_SIM` | 0.70 | Core overlap detection |
| `TRIAGE_DEDUP_SIM` | 0.75 | Triage + distill dedup, buffer dedup |
| `MERGE_SIM` | 0.78 | Auto-merge threshold, reconcile upper bound, audit merge upper bound |
| `RECONCILE_MAX_SIM` | 0.78 | = MERGE_SIM |
| `AUDIT_MERGE_MAX_SIM` | 0.78 | = MERGE_SIM |
| `INSERT_MERGE_SIM` | 0.80 | Insert-time content merge |
| `DEDUP_COSINE_SIM` | 0.85 | DB-level last-resort dedup |
| `RECALL_DEDUP_SIM` | 0.78 | = MERGE_SIM, used in quick_semantic_dup |
### Other Thresholds
| Constant | Value | Used In |
|----------|-------|---------|
| `DEDUP_JACCARD_PREFILTER` | 0.50 | Pre-filter before cosine in DB dedup |
| `BUFFER_CAP` | 200 (env) | Max buffer entries before capacity-based eviction |
| `DISTILL_THRESHOLD` | 10 | Min topic members before distillation kicks in |
| `DISTILL_MAX_PER_CYCLE` | 2 | Max topics to distill per consolidation cycle |
| `RESUME_HIGH_RELEVANCE` | 0.25 | Identity/constraint boost in resume |
| `RESUME_LOW_RELEVANCE` | 0.10 | Lesson/procedural boost in resume |
| `RESUME_CORE_THRESHOLD` | 0.35 | Min relevance to include Core in resume |
| `RESUME_WORKING_THRESHOLD` | 0.20 | Min relevance to include Working in resume |
### Recall Scoring Weights
| Weight | Value |
|--------|-------|
| `WEIGHT_RELEVANCE` | 0.60 |
| `WEIGHT_IMPORTANCE` | 0.20 |
| `WEIGHT_RECENCY` | 0.20 |
| `AUTO_EXPAND_THRESHOLD` | 0.25 |
| `DEFAULT_SIM_FLOOR` | 0.30 |
### Consolidation Defaults
| Parameter | Default | Description |
|-----------|---------|-------------|
| `promote_threshold` | 3 | Reinforcement score for Working→Core candidacy |
| `promote_min_importance` | 0.6 | Min importance for any promotion |
| `decay_drop_threshold` | 0.01 | Below this → delete (Buffer only; Working/Core never deleted) |
| `BUFFER_CAP` | 200 (env) | Max buffer entries before capacity-based eviction |
| `working_age_promote_secs` | 604800 (7d) | Age-based Working→Core candidacy |
| `buffer_threshold` | max(promote_threshold, 5) = 5 | Reinforcement score for Buffer→Working |
| `importance_decay` | ×0.98/epoch (exponential), floor 0.01 | Epoch-based importance reduction (per active consolidation cycle) |
| `activation_boost` | +0.03 per recall | Importance bump when a memory is retrieved |
---
## 10. Indexing
### FTS5
- SQLite FTS5 with `unicode61` tokenizer.
- CJK text pre-processed: Chinese via jieba segmentation, Japanese/Korean via bigrams.
- Content and tags indexed. Rebuilt on startup.
- Manual insert/delete/update (no triggers).
### Vector Index (HNSW)
- In-memory `VecIndex` using `instant-distance` HNSW.
- Loaded from SQLite embedding BLOBs on startup.
- Supports filtered search (`search_semantic_by_ids`) and full scan (`search_semantic_ns`).
- Embeddings stored as f32 arrays serialized to bytes in SQLite BLOB column.
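A round-trip sketch of the embedding BLOB layout. Little-endian byte order is an assumption here; the source only states that f32 arrays are serialized to bytes:

```rust
/// Serialize an embedding to the byte layout stored in the BLOB column.
fn to_blob(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

/// Deserialize a BLOB back into an f32 vector (4 bytes per component).
fn from_blob(b: &[u8]) -> Vec<f32> {
    b.chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```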
### Fact Graph
- `facts` table: `(subject, predicate, object, memory_id)`.
- FTS on subject/object for entity lookup.
- Multi-hop traversal (`query_multihop`) follows entity→fact→entity chains up to N hops.
- Fact extraction currently disabled in consolidation (low quality), but API available.
---
## 11. Key Invariants
1. **Memories only move up layers through LLM review** (triage or gate). Mechanical promotion
exists for Buffer→Working only (high reinforcement score or lesson/procedural kind).
2. **Topic distillation condenses, never deletes** — overlapping memories within a bloated topic
are merged into richer entries. Source memories are superseded (deleted), but the information is preserved
in the distilled output. Natural lifecycle (epoch-based decay, capacity cap) handles cleanup of unmerged memories.
3. **Demotion applies only to Core→Working** (see §2.6): Working memories are never demoted to Buffer, and demotions never skip a layer.
4. **Working memories are never deleted** — importance decays at different rates by kind, but memories
remain searchable even at zero importance. Only Buffer entries can be evicted.
5. **Repetition > access for importance signal** — `REPETITION_WEIGHT = 2.5` means being restated
counts 2.5× more than being recalled.
6. **Session notes never promote to Core** — they're episodic by nature. Instead, they're distilled
into project status snapshots that can promote normally.
7. **All LLM decisions are logged** — component, model, token usage, and duration tracked in `llm_usage`.
8. **Gate rejections escalate** — 3 chances with increasing epoch-based cooldowns (48, 144, never),
preventing infinite retry loops while ensuring idle time doesn't auto-grant retries.
9. **Resume doesn't touch memories** — only recall (query-driven) increments access counts.
10. **Merge direction is always upward** — cross-layer merge result goes to the highest source layer.
11. **Topiary is eventually consistent** — tree rebuilds are debounced and async. Resume reads a
cached snapshot; it may lag a few seconds behind the latest writes. This is intentional — resume
must be fast, and the tree is an index, not a source of truth.