# Recall - Long-term Memory System for MCP Clients
## Project Overview
Recall is an MCP (Model Context Protocol) server that provides persistent memory storage and retrieval with semantic search capabilities. It enables MCP-compatible clients (Claude Code, Claude Desktop, Cursor, etc.) to remember user preferences, decisions, patterns, and session context across conversations.
## Quick Start
```bash
# Run as MCP server
uv run python -m recall
# Show help
uv run python -m recall --help
# Run tests
uv run pytest tests/
```
## Architecture
### Storage Layer (Hybrid Storage)
- **SQLite**: Structured metadata, memory records, relationships (edges)
- **ChromaDB**: Vector embeddings for semantic search
- **Embedding Factory**: Pluggable embedding backends (MLX for Apple Silicon, Ollama fallback)
### Core Components
```
src/recall/
├── __init__.py # Package entry point
├── __main__.py # MCP server with FastMCP tools
├── config.py # Pydantic Settings configuration
├── validation.py # Contradiction detection, auto-supersede
├── memory/
│ ├── types.py # Memory, Edge, MemoryType, RelationType, confidence
│ └── operations.py # store, recall, relate, context, forget, validate, apply, outcome
├── storage/
│ ├── sqlite.py # SQLite store implementation
│ ├── chromadb.py # ChromaDB store implementation
│ └── hybrid.py # Coordinated SQLite + ChromaDB layer
└── embedding/
├── provider.py # Abstract embedding provider interface
├── factory.py # Backend selection (MLX vs Ollama)
├── mlx_provider.py # MLX embeddings (Apple Silicon, ~5-10x faster)
└── ollama.py # Ollama embedding client (fallback)
```
### Data Types
**MemoryType** (enum):
- `preference` - User preferences or settings
- `decision` - Design or implementation decisions
- `pattern` - Recognized patterns or recurring behaviors
- `session` - Session-related information
- `file_context` - File activity tracking (what files were touched)
- `golden_rule` - High-confidence memories (constitutional principles)
**RelationType** (enum):
- `relates_to` - General relationship
- `supersedes` - One memory replaces another
- `caused_by` - Causal relationship
- `contradicts` - Conflicting information
**Namespace** format:
- `global` - Cross-project memories
- `project:{name}` - Project-scoped memories
**Confidence Score**:
- Range: 0.0 to 1.0 (default: 0.3)
- Validated through usage via the validation loop
- Success increases confidence, failure decreases it
- Memories at confidence >= 0.9 are golden rules
### Golden Rules
Golden rules are high-confidence memories that represent validated, constitutional principles:
- **Auto-promotion**: Memories with confidence >= 0.9 are automatically promoted to `golden_rule` type
- **Eligible types**: Only `preference`, `decision`, and `pattern` can be promoted
- **Protected**: Golden rules cannot be deleted unless `force=True` is specified
- **Always visible**: Appear in context regardless of token budget or recency
- **Original type preserved**: Stored in `metadata.promoted_from`
## RFC 2119 Requirement Language
Memories in Recall follow [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119) semantics for requirement keywords. When writing memories, use these keywords with their precise meanings:
| Keyword | Meaning | Use For |
|---------|---------|---------|
| **MUST** / **REQUIRED** / **SHALL** | Absolute requirement | Hard rules that cannot be ignored |
| **MUST NOT** / **SHALL NOT** | Absolute prohibition | Actions that are never allowed |
| **SHOULD** / **RECOMMENDED** | Strong suggestion, exceptions require justification | Preferences with valid escape hatches |
| **SHOULD NOT** / **NOT RECOMMENDED** | Strong discouragement | Generally avoid unless justified |
| **MAY** / **OPTIONAL** | Truly optional | Nice-to-haves |
### Writing Effective Memories
**Weak (avoid):**
```
User prefers pytest over unittest
```
**Strong (preferred):**
```
You MUST use pytest for all tests. You MUST NOT use unittest.
```
### Why This Matters
Soft language ("prefers", "wants", "likes") gets rationalized and ignored under time pressure. RFC 2119 keywords have defined semantics that are harder to bypass.
The `memory_context_tool` automatically injects an RFC 2119 preamble so the consuming LLM knows these keywords have precise meanings.
## MCP Tools
The server exposes 17 tools via FastMCP:
### memory_store_tool
Store a new memory with semantic indexing and deduplication.
### memory_recall_tool
Recall memories using semantic search with optional graph expansion.
### memory_relate_tool
Create a relationship between two memories.
### memory_context_tool
Fetch and format relevant memories for context injection.
### memory_forget_tool
Delete memories by ID or semantic search. Golden rules are protected from deletion.
### memory_validate_tool
Validate a memory and adjust its confidence score based on success/failure.
### memory_apply_tool
Record that a memory is being applied (TRY phase of validation loop).
### memory_outcome_tool
Record the outcome of applying a memory (LEARN phase of validation loop).
### memory_count_tool
Count memories with optional namespace and type filters.
### memory_list_tool
List memories with filtering and pagination support.
### validation_history_tool
Get validation event history for a memory to understand confidence changes.
### file_activity_add
Record file activity events (used by PostToolUse hooks).
### file_activity_recent
Get recently accessed files with aggregated activity.
### daemon_status_tool
Check if the recall daemon is running and healthy. Returns daemon status including uptime, queue stats, and cache stats.
## Daemon Integration (Fast Path)
The MCP server integrates with an optional background daemon for fast memory storage:
```
┌─────────────────────────────────────────────────────────────────┐
│ memory_store_tool() │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ Daemon socket exists?│ │
│ └──────────┬──────────┘ │
│ YES │ NO │
│ │ └────────────────────┐ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ FAST PATH │ │ SYNC PATH │ │
│ │ Queue to daemon │ │ MLX: <100ms │ │
│ │ <10ms response │ │ Ollama: 10-60s │ │
│ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
### Daemon Components
- **Unix Socket**: `/tmp/recall-daemon.sock` for fast IPC
- **StoreQueue**: SQLite-backed queue for pending memories
- **EmbedWorker**: Background embedding processing (MLX on Apple Silicon, Ollama fallback)
- **ClassificationWorker**: Async LLM classification for edge types
### launchd Service (macOS)
The daemon runs as a launchd service for automatic startup.
**Quick Install:**
```bash
./hooks/install-daemon.sh
```
**Manual Install:**
```bash
# Install plist with path substitution
sed "s|{{HOME}}|$HOME|g; s|{{RECALL_DIR}}|$(pwd)|g" \
hooks/com.recall.daemon.plist.template > ~/Library/LaunchAgents/com.recall.daemon.plist
# Load the daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist
```
**Service Control:**
```bash
# Service location
~/Library/LaunchAgents/com.recall.daemon.plist
# Manual control
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist # Start
launchctl unload ~/Library/LaunchAgents/com.recall.daemon.plist # Stop
launchctl kickstart -k gui/$(id -u)/com.recall.daemon # Restart
```
### Performance
| Path | Response Time | Use Case |
|------|---------------|----------|
| Fast (daemon) | <10ms | Normal operation when daemon running |
| Sync + MLX | <100ms | Apple Silicon without daemon |
| Sync + Ollama | 10-60s | Non-Apple platforms without daemon |
### Memory Management
The daemon includes automatic memory management to prevent leaks:
| Component | Interval | Purpose |
|-----------|----------|---------|
| **GC Loop** | 5 min | Forces `gc.collect()` and clears MLX metal cache |
| **Memory Watchdog** | 5 min | Triggers graceful restart if RSS > 500MB |
| **Queue Cleanup** | 1 hour | Removes completed queue entries older than 24h |
| **Cache Size Limit** | On write | LRU eviction when cache exceeds 50 namespaces |
**Configuration** (in `recall-daemon.py`):
```python
GC_INTERVAL_SECONDS = 300 # 5 minutes
MEMORY_WATCHDOG_INTERVAL_SECONDS = 300
MEMORY_WATCHDOG_MAX_RSS_MB = 500 # Restart threshold
QUEUE_CLEANUP_INTERVAL_SECONDS = 3600
QUEUE_CLEANUP_AGE_HOURS = 24
MAX_CACHE_ENTRIES = 50 # LRU cache limit
```
**MLX Memory**: On Apple Silicon, the daemon clears MLX's metal cache after each embedding batch to prevent GPU memory accumulation.
## Validation Loop (ELF-Inspired)
Recall implements a validation loop to build confidence in memories through practical use:
```
TRY → BREAK → ANALYZE → LEARN
↑ ↓
└────────────────────────┘
```
### Workflow
1. **TRY** - Apply a memory using `memory_apply_tool`
- Records an "applied" validation event
- Updates access timestamp
2. **BREAK** - Memory application either succeeds or fails
- Success: Memory was useful and correct
- Failure: Memory led to errors or was rejected
3. **ANALYZE** - Evaluate what happened
- Collect error messages and context
- Determine if memory was helpful
4. **LEARN** - Update confidence using `memory_outcome_tool`
- Success: `confidence += 0.1` (max 1.0)
- Failure: `confidence -= 0.15` (min 0.0)
- Auto-promote to golden rule at confidence >= 0.9
### Example Usage
```python
# 1. TRY: Apply a memory
result = await memory_apply_tool(
memory_id="mem_123",
context="Using dark mode preference for UI settings",
session_id="session_456"
)
# 2. BREAK: Attempt to use the memory
# ... application code ...
# 3. ANALYZE: Check if it worked
success = (error_count == 0)
# 4. LEARN: Record the outcome
outcome = await memory_outcome_tool(
memory_id="mem_123",
success=success,
error_msg="User rejected setting" if not success else None,
session_id="session_456"
)
# Check if promoted to golden rule
if outcome["promoted"]:
print(f"Memory promoted to golden rule! Confidence: {outcome['new_confidence']}")
```
### Contradiction Detection
The validation system also detects contradictions between memories:
- **Semantic similarity**: ChromaDB finds similar memories (threshold: 0.7)
- **LLM reasoning**: Ollama determines if they actually contradict
- **Edge creation**: `CONTRADICTS` edges link conflicting memories
- **Auto-supersede**: Better-performing memories replace worse ones
### File Activity Tracking
Files accessed during tool operations are automatically tracked:
- **Action types**: `read`, `write`, `edit`, `multiedit`
- **Project context**: Grouped by project root directory
- **File type detection**: Automatically inferred from extension
- **Recent files**: Query by project and time window
## Configuration
Settings are loaded via Pydantic Settings with `RECALL_` prefix:
| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| RECALL_SQLITE_PATH | ~/.recall/recall.db | SQLite database path |
| RECALL_CHROMA_PATH | ~/.recall/chroma_db | ChromaDB storage path |
| RECALL_COLLECTION_NAME | memories | ChromaDB collection name |
| RECALL_EMBEDDING_BACKEND | ollama | Embedding backend: `mlx` or `ollama` |
| RECALL_MLX_MODEL | mlx-community/mxbai-embed-large-v1 | MLX embedding model |
| RECALL_OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
| RECALL_OLLAMA_MODEL | mxbai-embed-large | Ollama embedding model |
| RECALL_OLLAMA_TIMEOUT | 30 | Request timeout (seconds) |
| RECALL_LOG_LEVEL | INFO | Logging level |
| RECALL_DEFAULT_NAMESPACE | global | Default namespace |
| RECALL_DEFAULT_IMPORTANCE | 0.5 | Default importance score |
| RECALL_DEFAULT_CONFIDENCE | 0.3 | Default confidence score |
| RECALL_DEFAULT_TOKEN_BUDGET | 4000 | Default token budget |
## Testing
```bash
# Run all tests
uv run pytest tests/
# Run with coverage
uv run pytest tests/ --cov=recall
# Run specific test file
uv run pytest tests/integration/test_mcp_server.py
# Run specific test class
uv run pytest tests/integration/test_mcp_server.py::TestMCPToolHandlers
```
## Development Patterns
### Async Operations
All memory operations (store, recall, forget, context, validate, apply, outcome) are async. Use `await` when calling.
### Error Handling
Operations return result objects with `success` boolean and optional `error` message.
### Deduplication
Content is hashed (SHA-256, truncated to 16 chars) for deduplication within namespaces.
### Graph Expansion
Set `include_related=True` in recall to follow relationship edges.
### Confidence Building
- Memories start at confidence 0.3 by default
- Use the validation loop to build confidence through practical application
- Golden rules (confidence >= 0.9) gain special protection and visibility
- Low-confidence memories (< 0.15) are candidates for deletion
## Important Notes
- **STDIO Transport**: MCP uses stdio - all logging goes to stderr, never stdout
- **Embedding Backends**:
- **MLX** (Apple Silicon): Uses `mlx-embeddings` package for ~5-10x faster embeddings
- **Ollama** (fallback): Requires `mxbai-embed-large` model
- `llama3.2` model for contradiction detection and auto-supersede (optional)
- **Signal Handling**: SIGINT/SIGTERM trigger graceful shutdown
- **Python 3.13+**: Requires Python 3.13 or later
- **Golden Rule Protection**: Golden rules cannot be deleted without `force=True` flag