# Memory MCP Architecture
This document describes the design decisions, architecture patterns, and rationale behind Memory MCP.
## Core Design Principle
**The Engram Insight**: Frequently used patterns should be instantly available, not searched for every time.
This principle drives every architectural decision. We optimize for the common case, where a small body of project knowledge is accessed repeatedly.
## Two-Tier Memory Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                        Claude / LLM                          │
└─────────────────────────────────────────────────────────────┘
          │                                      │
          ▼ (auto-injected)                      ▼ (tool call)
┌───────────────────┐                ┌────────────────────────┐
│     Hot Cache     │                │      Cold Storage      │
│   MCP Resource    │                │    Vector Database     │
│                   │                │                        │
│ - 20 items max    │◄──────────────►│ - Unlimited capacity   │
│ - Instant recall  │   promotion    │ - Semantic search      │
│ - Pinnable        │   / demotion   │ - ~50ms lookup         │
└───────────────────┘                └────────────────────────┘
          ▲                                      ▲
          │                                      │
          └───────────── Mining Pipeline ────────┘
```
### Why Two Tiers?
| Problem | Solution |
|---------|----------|
| Tool calls add latency | Hot cache injected via MCP resource (0ms) |
| Large context wastes tokens | Only 20 high-value items in hot cache |
| Manual memory management is tedious | Automatic promotion/demotion based on access |
### Hot Cache
- **Implementation**: MCP resource at `memory://hot-cache`
- **Capacity**: 20 items (configurable via `HOT_CACHE_MAX_ITEMS`)
- **Eviction**: LRU with composite scoring (access count + recency)
- **Pinning**: Items can be pinned to prevent eviction
**Scoring Formula**:
```
hot_score = (access_weight × access_count) + (recency_weight × recency_boost)
recency_boost = 0.5^(days_since_access / halflife_days)
```
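A minimal sketch of that score in Python; the function name and default weight values are illustrative assumptions, not values taken from the codebase:

```python
from datetime import datetime, timezone

def hot_score(access_count: int, last_access: datetime,
              access_weight: float = 1.0, recency_weight: float = 2.0,
              halflife_days: float = 7.0) -> float:
    """Composite score; the lowest-scoring unpinned item is the eviction candidate."""
    days_since_access = (datetime.now(timezone.utc) - last_access).total_seconds() / 86400
    recency_boost = 0.5 ** (days_since_access / halflife_days)
    return access_weight * access_count + recency_weight * recency_boost
```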
### Cold Storage
- **Implementation**: SQLite with sqlite-vec extension
- **Vector Dimension**: 384 (all-MiniLM-L6-v2 default)
- **Search**: Cosine similarity with composite ranking
**Ranking Formula**:
```
composite_score = (similarity_weight × similarity) +
                  (recency_weight × recency_score) +
                  (access_weight × normalized_access)
```
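Sketched below with illustrative weights (the actual defaults live in the configuration, not here):

```python
def composite_score(similarity: float, recency_score: float, normalized_access: float,
                    similarity_weight: float = 0.6, recency_weight: float = 0.2,
                    access_weight: float = 0.2) -> float:
    """Re-rank vector-search candidates by blending similarity, recency, and usage."""
    return (similarity_weight * similarity
            + recency_weight * recency_score
            + access_weight * normalized_access)

# results = sorted(candidates, key=lambda c: composite_score(c.sim, c.recency, c.access), reverse=True)
```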
## Data Flow
```mermaid
graph TB
subgraph Input["Input Layer"]
direction LR
R[remember tool]
L[log_output tool]
end
subgraph Process["Processing"]
direction TB
subgraph Store["Store Path"]
S[Store Memory] --> E[Embed]
end
subgraph Mine["Mining Path"]
O[Output Log] --> MP[Extract Patterns]
MP --> P{Confidence?}
end
end
subgraph Persist["Persistence Layer"]
direction LR
V[(Vectors)]
M[(Metadata)]
end
subgraph Retrieve["Retrieval"]
direction TB
Q[recall tool] --> VS[Vector Search]
VS --> RK[Composite Rank]
end
subgraph Cache["Hot Cache Lifecycle"]
direction LR
PR[Promote] --> H[Hot Cache]
H --> DE[Demote]
end
R --> S
L --> O
E --> V & M
P -->|≥0.8| S
P -->|<0.8| A[approve_candidate]
A --> S
RK --> V & M
RK -->|access 3+| PR
DE -->|14 days stale| V
```
## Module Structure
```
src/memory_mcp/
├── server/                       # MCP server package
│   ├── __init__.py               # Exports main(), mcp app
│   ├── app.py                    # FastMCP setup, resources, lifespan
│   └── tools/                    # Tool implementations by domain
│       ├── cold_storage.py       # remember, recall, forget, list
│       ├── hot_cache.py          # promote, demote, pin, unpin
│       ├── mining.py             # log_output, run_mining, review
│       ├── seeding.py            # seed_from_text, bootstrap_project
│       ├── trust.py              # validate_memory, invalidate_memory
│       ├── relationships.py      # link/unlink_memories, get_related
│       ├── contradictions.py     # find/mark/resolve contradictions
│       ├── sessions.py           # session lifecycle tools
│       ├── maintenance.py        # db_maintenance, cleanup, info
│       ├── predictions.py        # access_patterns, predict_next
│       ├── retrieval.py          # mark_memory_used, quality stats
│       └── consolidation.py      # preview/run consolidation
├── storage/                      # Storage package
│   ├── __init__.py               # Exports Storage class
│   ├── core.py                   # Storage class, transactions, schema
│   ├── memory_crud.py            # store_memory, delete_memory, get_memory
│   ├── search.py                 # recall, vector search, scoring
│   ├── hot_cache.py              # promote, demote, eviction logic
│   ├── trust.py                  # adjust_trust, get_trust_history
│   ├── relationships.py          # link_memories, knowledge graph
│   ├── contradictions.py         # find/resolve contradictions
│   ├── sessions.py               # session lifecycle, episodic memory
│   ├── mining_store.py           # pattern storage operations
│   ├── consolidation.py          # memory merging logic
│   ├── retrieval.py              # RAG quality tracking
│   ├── maintenance.py            # cleanup, rebuild, vacuum
│   ├── audit.py                  # audit history logging
│   ├── seeding.py                # bootstrap, seed operations
│   ├── output_logging.py         # output log for mining
│   └── salience.py               # salience score computation
├── mining.py                     # Pattern extraction algorithms
├── config.py                     # Settings with environment variable loading
├── cli.py                        # CLI commands for hooks and administration
├── embeddings.py                 # Embedding providers (sentence-transformers, MLX)
├── responses.py                  # Pydantic response models for MCP tools
├── models.py                     # Enums and dataclasses (domain models)
├── migrations.py                 # Database schema and version migrations
├── helpers.py                    # Helper functions for tools
├── text_parsing.py               # Content chunking for seeding
├── logging.py                    # Structured logging configuration
└── metrics.py                    # Metrics collection and observability
```
### Module Dependencies
```mermaid
graph TB
subgraph API["API Layer"]
direction LR
server[server/]
cli[cli.py]
end
subgraph Core["Core Layer"]
direction LR
storage[storage/]
mining[mining.py]
end
subgraph Support["Support Layer"]
direction LR
helpers[helpers.py]
responses[responses.py]
embeddings[embeddings.py]
end
subgraph Foundation["Foundation Layer"]
direction LR
models[models.py]
config[config.py]
migrations[migrations.py]
logging[logging.py]
metrics[metrics.py]
end
server --> storage & helpers & responses & logging
cli --> storage & config
storage --> models & migrations & embeddings & config
mining --> storage
helpers --> models & responses
responses --> models
metrics --> logging
```
### Layer Responsibilities
| Module | Responsibility |
|--------|----------------|
| `server/` | MCP tools/resources, request validation, response construction |
| `storage/` | Data persistence, vector operations, business logic |
| `responses.py` | Pydantic models for tool return types |
| `models.py` | Domain enums (MemoryType, TrustReason) and dataclasses (Memory, Session) |
| `migrations.py` | Database schema definition and version migrations |
| `mining.py` | Pattern extraction algorithms |
| `embeddings.py` | Embedding abstraction (MLX on Apple Silicon, sentence-transformers otherwise) |
| `logging.py` | Loguru configuration, stderr output for MCP compatibility |
| `metrics.py` | Counters, gauges, and metric recording helpers |
### Why This Structure?
1. **Single responsibility**: Each mixin handles one domain (trust, search, hot cache, etc.)
2. **Thin server layer**: Tool modules adapt storage operations to the MCP protocol (see the sketch after this list)
3. **Swappable embeddings**: Backend detection allows hardware optimization
4. **Backwards-compatible imports**: Package `__init__.py` re-exports preserve existing import paths
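To illustrate the thin server layer, a tool is typically a small adapter around a storage call. The names and signatures below are assumptions for illustration, not the project's actual API:

```python
from fastmcp import FastMCP
from memory_mcp.storage import Storage  # package __init__ exports the Storage class

mcp = FastMCP("memory-mcp")
storage = Storage()  # constructor arguments omitted for brevity

@mcp.tool()
def remember(content: str, tags: list[str] | None = None) -> dict:
    """Thin adapter: validate input, delegate to storage, shape the response."""
    memory = storage.store_memory(content=content, tags=tags or [])
    return {"id": memory.id, "stored": True}
```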
## Key Design Decisions
### 1. SQLite + sqlite-vec Over Vector Databases
**Chose**: SQLite with sqlite-vec extension
**Alternatives considered**: Pinecone, Weaviate, Milvus, Chroma
**Rationale**:
- **Local-first**: No network dependency, instant startup
- **Simplicity**: Single file database, easy backup/restore
- **Scale**: Handles thousands of memories efficiently
- **Portability**: Works everywhere Python works
**Trade-off**: Limited to single-user, single-machine scenarios
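A rough sketch of what this looks like with the sqlite-vec Python bindings (table and column names are illustrative):

```python
import sqlite3
import sqlite_vec

db = sqlite3.connect("memory.db")
db.enable_load_extension(True)
sqlite_vec.load(db)                  # load the vec0 extension
db.enable_load_extension(False)

# A virtual table for 384-dimensional embeddings plus a plain table for metadata.
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS vec_memories USING vec0(embedding float[384])")
db.execute("CREATE TABLE IF NOT EXISTS memories (id INTEGER PRIMARY KEY, content TEXT)")
```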
### 2. Sentence Transformers for Embeddings
**Chose**: all-MiniLM-L6-v2 (384 dimensions)
**Alternatives considered**: OpenAI embeddings, larger BERT models
**Rationale**:
- **Local**: No API calls, no costs, offline capable
- **Fast**: ~50ms per embedding on CPU
- **Small**: 90MB model, 384 dimensions
- **Quality**: Good semantic similarity for short text
**Trade-off**: Less powerful than larger models, but sufficient for memory retrieval
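The embedding path reduces to a couple of lines, all running locally (the sample sentence is illustrative):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")            # ~90MB, downloaded once
vector = model.encode("This project uses pytest for testing")
print(vector.shape)                                        # (384,)
```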
### 3. Automatic Promotion/Demotion
**Chose**: Frequency-based with configurable thresholds
**Design**:
- Promote after 3 accesses (configurable)
- Demote after 14 days without access (configurable)
- Evict the lowest-scored unpinned item when the cache is full
**Rationale**: Reduces manual intervention while ensuring hot cache reflects actual usage
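A simplified sketch of the lifecycle checks; the constant names and the tie to pinning are assumptions drawn from the description above:

```python
PROMOTE_AFTER_ACCESSES = 3      # configurable
DEMOTE_AFTER_STALE_DAYS = 14    # configurable
HOT_CACHE_MAX_ITEMS = 20

def should_promote(access_count: int) -> bool:
    return access_count >= PROMOTE_AFTER_ACCESSES

def should_demote(days_since_access: float) -> bool:
    return days_since_access >= DEMOTE_AFTER_STALE_DAYS

def evict_if_full(items: list) -> list:
    """Keep pinned items, then the highest-scored, up to the cache capacity."""
    keep = sorted(items, key=lambda i: (i.pinned, i.hot_score), reverse=True)
    return keep[:HOT_CACHE_MAX_ITEMS]
```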
### 4. Pattern Mining Pipeline
**Chose**: Regex-based extraction with occurrence counting
**Extracted patterns**:
- Python imports
- Shell commands (npm, git, docker, etc.)
- Project facts ("This project uses X")
- Code blocks from markdown
**Auto-approval**: Patterns with confidence ≥ 0.8 and occurrences ≥ 3 are approved and stored automatically; lower-confidence candidates are queued for `approve_candidate`
**Rationale**: Low overhead, high precision for common patterns
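A sketch of that extraction pass; the regexes below are simplified stand-ins for the actual patterns in `mining.py`:

```python
import re
from collections import Counter

IMPORT_RE = re.compile(r"^(?:from\s+[\w.]+\s+)?import\s+[^\n]+", re.MULTILINE)
COMMAND_RE = re.compile(r"^\$?\s*((?:npm|git|docker)\s+[^\n]+)", re.MULTILINE)

def extract_candidates(log_text: str) -> Counter:
    """Count how often each candidate pattern appears in the output log."""
    hits = IMPORT_RE.findall(log_text) + COMMAND_RE.findall(log_text)
    return Counter(hit.strip() for hit in hits)
```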
### 5. Trust Score System
**Chose**: Decay-based trust with explicit validation/invalidation
**Design**:
- Manual memories start at 1.0 trust
- Mined memories start at 0.7 trust
- Trust decays over time (type-specific half-lives)
- Explicit validation/invalidation adjusts trust
**Rationale**: Older information naturally becomes less reliable; explicit feedback accelerates correction
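The decay itself has the same half-life form used for recency; the half-life value below is illustrative, since the real ones are type-specific:

```python
def decayed_trust(base_trust: float, age_days: float, halflife_days: float = 90.0) -> float:
    """Trust halves every halflife_days unless the memory is explicitly re-validated."""
    return base_trust * 0.5 ** (age_days / halflife_days)

decayed_trust(1.0, 90)   # manual memory after one half-life -> 0.5
decayed_trust(0.7, 90)   # mined memory after one half-life  -> 0.35
```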
### 6. Knowledge Graph
**Chose**: Simple typed relationships stored in SQLite
**Relation types**:
- `relates_to`: General association
- `depends_on`: Prerequisite
- `supersedes`: Replaces older info
- `refines`: More specific version
- `contradicts`: Conflicting information
- `elaborates`: More detail
**Rationale**: Enables context expansion during recall without complex graph database
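A plausible shape for this in SQLite; column names are assumptions, and the real schema lives in `migrations.py`:

```python
import sqlite3

db = sqlite3.connect("memory.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS memory_relations (
        source_id INTEGER NOT NULL,
        target_id INTEGER NOT NULL,
        relation  TEXT    NOT NULL,   -- relates_to, depends_on, supersedes, ...
        PRIMARY KEY (source_id, target_id, relation)
    )
""")

def related_ids(memory_id: int) -> list[tuple[int, str]]:
    """Context expansion during recall: follow one hop of outgoing relations."""
    rows = db.execute(
        "SELECT target_id, relation FROM memory_relations WHERE source_id = ?",
        (memory_id,),
    )
    return rows.fetchall()
```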
### 7. Predictive Hot Cache
**Chose**: Markov chain of access sequences
**Design**:
- Record (memory_A, memory_B) when accessed together
- Build transition probabilities
- Pre-warm cache with predicted next memories after each recall
**Rationale**: Learns actual usage patterns without explicit configuration
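Conceptually this is a first-order transition table; the structure below is an illustration, not the stored representation:

```python
from collections import Counter, defaultdict

transitions: defaultdict[int, Counter] = defaultdict(Counter)

def record_transition(previous_id: int | None, current_id: int) -> None:
    """Called on each recall; counts A -> B access pairs."""
    if previous_id is not None:
        transitions[previous_id][current_id] += 1

def predict_next(memory_id: int, k: int = 3) -> list[int]:
    """Most likely follow-up memories; candidates for pre-warming the hot cache."""
    return [mid for mid, _ in transitions[memory_id].most_common(k)]
```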
## Configuration Philosophy
**Principle**: Maximum value with minimal configuration
Default settings are optimized for immediate value:
- Predictive cache: Enabled by default
- Auto-approve mining: Enabled with conservative thresholds
- Auto-promote/demote: Enabled
Power users can tune via environment variables, but sensible defaults mean most users never need to.
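For example, the hot-cache size mentioned earlier is read from the environment; the other variable names below are assumptions for illustration:

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    hot_cache_max_items: int = int(os.environ.get("HOT_CACHE_MAX_ITEMS", "20"))
    auto_promote: bool = os.environ.get("AUTO_PROMOTE", "true").lower() == "true"
    auto_approve_mining: bool = os.environ.get("AUTO_APPROVE_MINING", "true").lower() == "true"

settings = Settings()   # defaults apply when no environment variables are set
```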
## Error Handling Strategy
1. **Graceful degradation**: If embedding fails, the memory is still stored without a vector (sketched after this list)
2. **Idempotent operations**: Duplicate stores merge tags and increment access
3. **Audit logging**: Destructive operations logged for recovery
4. **No silent failures**: Errors returned in response, not swallowed
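A sketch of the degradation path in point 1; the function names are assumptions:

```python
from loguru import logger

def store_with_fallback(storage, content: str):
    """If embedding fails, persist the memory anyway and backfill the vector later."""
    vector = None
    try:
        vector = storage.embed(content)
    except Exception as exc:
        logger.warning("Embedding failed, storing without vector: {}", exc)
    return storage.store_memory(content=content, embedding=vector)
```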
## Performance Characteristics
| Operation | Typical Latency | Notes |
|-----------|-----------------|-------|
| Hot cache read | 0ms | Auto-injected by MCP |
| remember() | 50-100ms | Embedding dominates |
| recall() | 50-150ms | Vector search + ranking |
| Mining (24h) | 1-5s | Depends on log volume |
| Bootstrap | 2-10s | Depends on file count |
## Future Considerations
### Considered but Deferred
1. **Multi-user support**: Would require auth, permissions, conflict resolution
2. **Distributed storage**: Current single-file model sufficient for local use
3. **Custom embedding models**: Current model adequate; adding complexity not justified
4. **LLM-based extraction**: Regex patterns sufficient; LLM adds latency and cost
### Potential Enhancements
1. **Semantic deduplication**: Merge near-duplicate memories automatically
2. **Memory aging**: Archive very old, unused memories
3. **Cross-project sharing**: Share patterns across projects
4. **Embedding model migration**: Tools to rebuild vectors when changing models
## References
- [Engram](https://github.com/AnswerDotAI/engram) - Inspiration for hot cache concept
- [sqlite-vec](https://github.com/asg017/sqlite-vec) - Vector search extension
- [FastMCP](https://github.com/jlowin/fastmcp) - MCP server framework
- [sentence-transformers](https://www.sbert.net/) - Embedding models