Delia

Overview Schema Related Servers Score Discussions

007-conversation-compaction.md•4.41 KiB

# ADR-007: Conversation Compaction System ## Status Accepted ## Context Long-running Delia sessions accumulate conversation history that can exceed model context windows, leading to: - Degraded response quality as important context gets truncated - Increased latency from processing large contexts - Memory pressure from storing extensive message histories - Loss of early task context when truncation occurs Claude Code handles this elegantly with its `/compact` command, which summarizes older messages while preserving recent context and key information. ## Decision Implement a conversation compaction system for Delia that: 1. **LLM-Based Summarization**: Use the `quick` model tier to generate concise summaries of older messages, preserving: - Main task/goal being worked on - Key decisions and their reasoning - File/code modifications made - Problems encountered and solutions - Current state and progress 2. **Key Information Extraction**: Automatically detect and preserve: - Tool calls (file operations, commands, etc.) - File modifications - Important decisions 3. **Threshold-Based Triggers**: - Warning at 12,000 tokens (configurable) - Auto-compaction at 18,000 tokens (1.5x threshold) 4. **Message Preservation**: Always keep the last N message pairs (default: 6 pairs = 12 messages) untouched for immediate context 5. **MCP Tools**: - `session_compact`: Manual compaction trigger - `session_stats`: View compaction statistics ## Implementation ### Core Components - **`src/delia/semantic/compression.py`** - Main compaction logic: - `ConversationCompactor` class with configurable thresholds - `CompactionResult` dataclass for operation results - `CompactionMetadata` for tracking compaction history - Key info extraction via regex patterns **SessionManager Integration**: - `SessionState.needs_compaction()` - Check if session needs compaction - `SessionState.get_compaction_stats()` - Get token counts and status - `SessionManager.compact_session()` - Async compaction method **Orchestration Integration** (`service.py`): - `_check_auto_compaction()` - Check and optionally auto-compact - Warning events for large contexts - Automatic compaction above critical threshold ### Compaction Algorithm ``` 1. Count total tokens in session messages 2. If tokens > threshold: a. Identify messages to compact (all except last N pairs) b. Extract key information (tool calls, file mods, decisions) c. Generate LLM summary of compactable messages d. Replace compacted messages with single summary message e. Store compaction metadata in session 3. Save updated session to disk ``` ### Key Information Patterns ```python TOOL_CALL_PATTERNS = [ r"\b(created|wrote|edited)\s+file\s+...", r"\b(running|executed)\s+command\s+...", r"\b(pip|npm|git|docker)\s+\w+", ] DECISION_PATTERNS = [ r"(?:decided|choosing|will use)\s+...", r"(?:approach|strategy|plan):\s*...", ] ``` ## Consequences ### Positive - Prevents context overflow in long sessions - Preserves important information across compaction - Matches Claude Code's user-friendly `/compact` workflow - Automatic protection against context overflow - Configurable thresholds for different use cases ### Negative - Summarization may lose some nuance from original messages - Adds latency during compaction (LLM call required) - Storage overhead for compaction metadata ### Trade-offs - Quality vs. compression: Higher compression ratios may lose more detail - Threshold tuning: Too low = frequent compaction, too high = context issues ## Configuration ```python DEFAULT_COMPACTION_THRESHOLD_TOKENS = 12000 # Warning threshold DEFAULT_PRESERVE_RECENT_MESSAGES = 6 # Message pairs to keep DEFAULT_SUMMARY_MAX_TOKENS = 2000 # Max summary length ``` ## Usage Examples ### Manual Compaction (MCP) ```python # Check if compaction is needed session_stats(session_id="abc-123") # Returns: {"total_tokens": 15000, "needs_compaction": true, ...} # Compact the session session_compact(session_id="abc-123") # Returns: {"success": true, "messages_compacted": 20, "tokens_saved": 8000, ...} ``` ### Automatic Compaction When processing a message and tokens exceed 1.5x threshold: ``` StreamEvent(type="compaction", message="Session auto-compacted...") ``` ## Related ADRs - ADR-001: Singleton Architecture (SessionManager singleton) - ADR-002: MCP-Native Paradigm (session tools) - ADR-003: Centralized LLM Calling (summarization via call_llm)

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/zbrdc/delia'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

007-conversation-compaction.md•4.41 KiB