engram-rs

README.md•10.8 KiB

# engram-rs [![CI](https://img.shields.io/github/actions/workflow/status/kael-bit/engram-rs/ci.yml?style=flat-square&logo=github&label=CI)](https://github.com/kael-bit/engram-rs/actions/workflows/ci.yml) [![License: MIT](https://img.shields.io/badge/license-MIT-blue?style=flat-square)](LICENSE) [![Rust](https://img.shields.io/badge/rust-1.75%2B-orange?style=flat-square&logo=rust)](https://www.rust-lang.org) [![GitHub stars](https://img.shields.io/github/stars/kael-bit/engram-rs?style=flat-square&color=yellow)](https://github.com/kael-bit/engram-rs) [![Docker](https://img.shields.io/badge/docker-ghcr.io-blue?style=flat-square&logo=docker)](https://ghcr.io/kael-bit/engram-rs) Memory engine for AI agents. Two axes: **time** (three-layer decay & promotion) and **space** (self-organizing topic tree). Important memories get promoted, noise fades, related knowledge clusters automatically. Most agent memory is a flat store — dump everything in, keyword search to get it back. No forgetting, no organization, no lifecycle. engram-rs adds the part that makes memory actually useful: the ability to forget what doesn't matter and surface what does. <p align="center"> <img src="docs/engram-quickstart.gif" alt="engram demo — store, context reset, recall" width="720"> </p> Single Rust binary, one SQLite file, zero external dependencies. No Python, no Redis, no vector DB — `curl | bash` and it runs. ~10 MB binary, ~100 MB RSS, single-digit ms search latency. ## Quick Start ```bash # Install (interactive — will prompt for embedding provider config) curl -fsSL https://raw.githubusercontent.com/kael-bit/engram-rs/main/install.sh | bash # Store a memory curl -X POST http://localhost:3917/memories \ -d '{"content": "Always run tests before deploying", "tags": ["deploy"]}' # Recall by meaning curl -X POST http://localhost:3917/recall \ -d '{"query": "deployment checklist"}' # Restore full context (session start) curl http://localhost:3917/resume ``` ## What It Does ### Three-Layer Lifecycle Inspired by the [Atkinson–Shiffrin memory model](https://en.wikipedia.org/wiki/Atkinson%E2%80%93Shiffrin_memory_model), memories are managed across three layers by importance: ``` Buffer (short-term) → Working (active knowledge) → Core (long-term identity) ↓ ↓ ↑ eviction importance decay LLM quality gate ``` - **Buffer**: Entry point for all new memories. Temporary staging — evicted when below threshold - **Working**: Promoted via consolidation. Never deleted, importance decays at different rates by kind - **Core**: Promoted through LLM quality gate. Never deleted ### LLM Quality Gate Promotion isn't rule-based guesswork — an LLM evaluates each memory in context and decides whether it genuinely warrants long-term retention. ``` Buffer → [LLM gate: "Is this a decision, lesson, or preference?"] → Working Working → [sustained access + LLM gate] → Core ``` ### Automatic Decay Decay is activity-driven — it only fires during active consolidation cycles, not wall-clock time. If the system is idle, memories stay intact. **Exponential decay** follows the [Ebbinghaus forgetting curve](https://en.wikipedia.org/wiki/Forgetting_curve) — fast at first, then long-tail. Memories never fully vanish (floor = 0.01), remaining retrievable under precise queries. When a memory is recalled, it gets an **activation boost**, strengthening frequently-used knowledge. | Kind | Decay rate | Half-life | Use case | |------|-----------|-----------|----------| | `episodic` | Fastest | ~35 epochs | Events, experiences, time-bound context | | `semantic` | Medium | ~58 epochs | Knowledge, preferences, lessons (default) | | `procedural` | Slowest | ~173 epochs | Workflows, instructions, how-to | ### Algorithm Visualizations | Chart | What it shows | |-------|---------------| | <img src="docs/images/chart_scoring.png" width="600"> | **Sigmoid score compression.** Raw scores are mapped through a sigmoid function, approaching 1.0 asymptotically. High-relevance results remain distinguishable instead of being crushed into the same value. | | <img src="docs/images/chart_decay.png" width="600"> | **Ebbinghaus forgetting curve.** Exponential decay with kind-differentiated rates — episodic memories fade fastest, procedural slowest. Floor at 0.01 means memories never fully vanish; they remain retrievable under precise queries. | | <img src="docs/images/chart_bias.png" width="600"> | **Kind × layer weight bias.** Additive biases adjust memory weight by type and layer. Procedural+core memories rank highest, episodic+buffer lowest — but the spread stays bounded so no single combination dominates. | | <img src="docs/images/chart_reinforcement.png" width="600"> | **Reinforcement signals.** Repetition and access bonuses follow logarithmic saturation. Early interactions matter most; later ones contribute diminishing returns, discriminating between "used occasionally" and "used daily". | | <img src="docs/images/chart_lifecycle.png" width="600"> | **Use it or lose it.** Left: a memory that's never recalled decays into the buffer layer. Right: periodic recall triggers activation boosts that keep the memory in the working layer. Dashed line shows the unrecalled trajectory for comparison. | ### Semantic Dedup & Merge Two memories saying the same thing in different words? Detected and merged automatically: ``` "use PostgreSQL for auth" + "auth service runs on Postgres" → Merged into one, preserving context from both ``` ### Self-Organizing Topic Tree Vector clustering groups related memories together, LLM names the clusters. No manual tagging required: ``` Memory Architecture ├── Three-layer lifecycle [4] ├── Embedding pipeline [3] └── Consolidation logic [5] Deploy & Ops ├── CI/CD procedures [3] └── Production incidents [2] User Preferences [6] ``` The problem this solves: vector search requires asking the right question. Topic trees let agents **browse by subject** — scan the directory, drill into the right branch. ### Triggers Tag a memory with `trigger:deploy`, and the agent can recall all deployment lessons before executing: ```bash curl -X POST http://localhost:3917/memories \ -d '{"content": "LESSON: always backup DB before migration", "tags": ["trigger:deploy", "lesson"]}' # Pre-deployment check curl http://localhost:3917/triggers/deploy ``` ## Session Recovery Agent wakes up, calls `GET /resume`, gets full context back. No file scanning needed: ``` === Core (24) === deploy: test → build → stop → start (procedural) LESSON: never force-push to main ... === Recent === switched auth to OAuth2 published API docs === Topics (Core: 24, Working: 57, Buffer: 7) === kb1: "Deploy Procedures" [5] kb2: "Auth Architecture" [3] kb3: "Memory Design" [8] ... Triggers: deploy, git-push, database-migration ``` | Section | Content | Purpose | |---------|---------|---------| | **Core** | Full text of permanent rules and identity | The unforgettable stuff | | **Recent** | Recently changed memories | Short-term continuity | | **Topics** | Topic index (table of contents) | Drill in on demand, no full load | | **Triggers** | Pre-action tags | Auto-recall lessons before risky ops | Agent reads the directory, finds relevant topics, calls `POST /topic` to expand on demand. ## Search & Retrieval Semantic embeddings + BM25 keyword search with CJK tokenization ([jieba](https://github.com/messense/jieba-rs)). IDF-weighted scoring — rare terms get boosted, common terms auto-downweighted. No stopword lists to maintain. ```bash # Semantic search curl -X POST http://localhost:3917/recall \ -d '{"query": "how do we handle auth", "budget_tokens": 2000}' # Note: min_score defaults to 0.30. Use "min_score": 0.0 to get all results. # Topic drill-down curl -X POST http://localhost:3917/topic \ -d '{"ids": ["kb3"]}' ``` ## Background Maintenance Fully automatic, activity-driven — no writes means the cycle is skipped: **Consolidation (every 30 minutes)** 1. **Decay** — reduce importance of unaccessed memories 2. **Dedup** — merge near-identical memories (cosine > 0.78) 3. **Triage** — LLM categorizes new Buffer memories 4. **Gate** — LLM batch-evaluates promotion candidates 5. **Reconcile** — resolve ambiguous similar pairs (results cached) 6. **Topic tree rebuild** — re-cluster and name **Topic Distillation** — when a topic grows too large (10+ memories), overlapping content is condensed into fewer, richer entries. ## Namespace Isolation Single instance, multiple projects. Use `X-Namespace` to isolate: ```bash curl -X POST http://localhost:3917/memories \ -H "X-Namespace: my-project" \ -d '{"content": "API uses OAuth2 bearer tokens"}' ``` ## Install ```bash # macOS / Linux curl -fsSL https://raw.githubusercontent.com/kael-bit/engram-rs/main/install.sh | bash # Windows (PowerShell) irm https://raw.githubusercontent.com/kael-bit/engram-rs/main/install.ps1 | iex ``` ### Docker ```bash docker run -d --name engram-rs \ -p 3917:3917 \ -v engram-data:/data \ -e ENGRAM_LLM_URL=https://api.openai.com/v1 \ -e ENGRAM_LLM_KEY=sk-... \ ghcr.io/kael-bit/engram-rs:latest ``` ### LLM Configuration engram-rs works without an LLM — keyword search and rule-based lifecycle still function. Adding an LLM unlocks intelligent features (triage, gate, merge, topic naming): ```bash export ENGRAM_LLM_URL=https://api.openai.com/v1 export ENGRAM_LLM_KEY=sk-... # Any OpenAI-compatible API (Ollama, LM Studio, etc.) export ENGRAM_LLM_URL=http://localhost:11434/v1 # Anthropic native export ENGRAM_LLM_PROVIDER=anthropic export ENGRAM_LLM_KEY=sk-ant-... ``` Two-tier model config — strong model for judgment, lightweight for text processing: ```bash ENGRAM_GATE_MODEL=gpt-4o # quality decisions ENGRAM_LLM_MODEL=gpt-4o-mini # text processing ``` ### For AI Agents Add to your agent's system prompt: ``` Set up engram-rs (persistent memory) by following the guide at: https://raw.githubusercontent.com/kael-bit/engram-rs/main/docs/SETUP.md ``` ## Integration Works with Claude Code, Cursor, Windsurf, OpenClaw, and any MCP-compatible tool. 17 MCP tools ([MCP docs](docs/MCP.md)) | Full HTTP API ([Setup guide](docs/SETUP.md)) ```bash # MCP (Claude Code) npx engram-rs-mcp # MCP (Cursor / Windsurf / generic) {"mcpServers": {"engram": {"command": "npx", "args": ["-y", "engram-rs-mcp"]}}} ``` ### Web Dashboard Built-in web UI at `http://localhost:3917/ui` for browsing memories, viewing the topic tree, and monitoring LLM usage. ## Specs | | | |---|---| | Binary | ~10 MB | | Memory | ~100 MB RSS in production | | Storage | SQLite, no external database | | Language | Rust | | Platforms | Linux, macOS, Windows (x86_64 + aarch64) | | License | MIT | ## License MIT <a href="https://glama.ai/mcp/servers/@kael-bit/engram-rs"> <img width="380" height="200" src="https://glama.ai/mcp/servers/@kael-bit/engram-rs/badge" /> </a>

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kael-bit/engram-rs'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

README.md•10.8 KiB