# RLM (Recursive Language Model) Skill
## Overview
The RLM pattern enables processing massive contexts (10M+ tokens) that exceed Claude's context window by recursively chunking, processing, and aggregating results. Instead of failing on large files, use RLM to break them into manageable pieces.
## When to Use RLM
Use RLM when you encounter:
- **Large files**: Any file >100KB or >2000 lines (a quick size check is sketched after these lists)
- **Multi-file analysis**: Processing multiple files together (combined size matters)
- **Context exceeded**: User asks to analyze content that won't fit in context window
- **Aggregation tasks**: Summarizing logs, finding patterns across large datasets, counting/filtering operations
- **Deep codebase analysis**: Understanding architecture across many files
- **Document processing**: Analyzing reports, research papers, documentation sets
**Don't use RLM for:**
- Small files (<100KB)
- Single-pass tasks that fit in context
- Interactive editing (use standard tools)
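A quick way to apply the size thresholds above before reaching for RLM. This is a minimal sketch, not part of the RLM toolset; the cutoffs mirror the guidelines in this section and `should_use_rlm` is a hypothetical helper name:
```python
import os

# Hypothetical helper: checks combined size of candidate files against the
# >100KB / >2000-line guidelines from this section.
def should_use_rlm(paths: list[str], max_bytes: int = 100_000, max_lines: int = 2_000) -> bool:
    total_bytes = sum(os.path.getsize(p) for p in paths)
    total_lines = 0
    for p in paths:
        with open(p, encoding="utf-8", errors="replace") as f:
            total_lines += sum(1 for _ in f)
    # Combined size matters for multi-file analysis.
    return total_bytes > max_bytes or total_lines > max_lines
```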
## The RLM Pattern
The core workflow is: **Load → Inspect → Chunk → Sub-Query → Aggregate**
### Step 1: Load Context
```python
# Load large content into RLM memory
rlm_load_context(
    name="codebase",
    content=file_contents  # Full file content
)
```
Returns: `{name, size_bytes, size_chars, line_count, loaded: true}`
### Step 2: Inspect Context
```python
# Understand structure without loading into prompt
rlm_inspect_context(
    name="codebase",
    preview_chars=500  # Optional preview
)
```
Returns: Metadata + preview (first N chars)
### Step 3: Chunk Context
```python
# Break into manageable pieces
rlm_chunk_context(
    name="codebase",
    strategy="lines",  # or "chars" or "paragraphs"
    size=100  # Lines per chunk (or chars if strategy="chars")
)
```
**Chunking Strategies:**
- `lines` (default): Split by line count - best for code, logs, structured data
- `chars`: Split by character count - best for prose, unstructured text
- `paragraphs`: Split by blank lines - best for documents, markdown
Returns: `{name, chunk_count, strategy, size_per_chunk}`
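If it helps to picture what each strategy produces, here is a rough local approximation in plain Python (illustrative only; the tool's actual splitting rules may differ in edge cases):
```python
# Rough approximation of the three chunking strategies, for intuition only.
def approx_chunks(text: str, strategy: str = "lines", size: int = 100) -> list[str]:
    if strategy == "lines":
        lines = text.splitlines()
        return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]
    if strategy == "chars":
        return [text[i:i + size] for i in range(0, len(text), size)]
    if strategy == "paragraphs":
        paras = [p for p in text.split("\n\n") if p.strip()]
        return ["\n\n".join(paras[i:i + size]) for i in range(0, len(paras), size)]
    raise ValueError(f"unknown strategy: {strategy}")
```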
### Step 4: Sub-Query (Process Chunks)
**Single chunk:**
```python
rlm_sub_query(
    context_name="codebase",
    chunk_index=0,
    query="Extract all function names",
    provider="claude-sdk"  # or "ollama"
)
```
**Batch processing (parallel):**
```python
rlm_sub_query_batch(
    context_name="codebase",
    chunk_indices=[0, 1, 2, 3],
    query="Extract all function names",
    provider="claude-sdk",
    concurrency=4  # Max parallel requests (max: 8)
)
```
Returns: Array of results, one per chunk
### Step 5: Store Results (Optional)
```python
# Store intermediate results for later aggregation
rlm_store_result(
    name="function_names",
    result=sub_query_response,
    metadata={"chunk": 0}  # Optional
)
```
### Step 6: Aggregate
```python
# Retrieve all stored results
rlm_get_results(name="function_names")
```
Then synthesize final answer from all chunk results.
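For simple extraction tasks, that synthesis can be as mechanical as merging and de-duplicating the per-chunk outputs. A minimal sketch, assuming each stored result comes back as plain text with one extracted item per line:
```python
# Assumes rlm_get_results returns a list of per-chunk text blocks,
# one extracted item per line; adapt if results are structured objects.
def merge_unique(results: list[str]) -> list[str]:
    seen, merged = set(), []
    for block in results:
        for line in block.splitlines():
            item = line.strip()
            if item and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged

function_names = merge_unique(rlm_get_results(name="function_names"))
```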
## Provider Options
### claude-sdk (Default - Recommended)
- Model: **Haiku 4.5** (fast, accurate, cost-effective)
- Cost: ~$0.25 per 1M input tokens
- Best for: Most tasks requiring accuracy
- Usage: `provider="claude-sdk"`
### ollama (Local, Free)
- Model: User's local Ollama instance
- Cost: Free (runs on your hardware)
- Best for: Experimentation, privacy-sensitive data, budget constraints
- Usage: `provider="ollama"`
**Choosing a Provider:**
- Default to `claude-sdk` for production tasks
- Use `ollama` when cost/privacy is primary concern
- Haiku 4.5 is fast enough for batch processing
## Example Workflows
### Workflow 1: Analyze Large Codebase
**Task:** Find all TODO comments across 50 Python files
```python
# 1. Read all files and combine (newline-joined so files don't run together)
all_code = "\n".join(read_file(f) for f in python_files)

# 2. Load into RLM
rlm_load_context(name="codebase", content=all_code)

# 3. Inspect to understand size
metadata = rlm_inspect_context(name="codebase")
# Shows: 15,000 lines, 500KB

# 4. Chunk by lines (code is line-oriented)
chunk_info = rlm_chunk_context(
    name="codebase",
    strategy="lines",
    size=200  # 200 lines per chunk
)
num_chunks = chunk_info["chunk_count"]  # Result: 75 chunks

# 5. Process in batches of 4
for batch_start in range(0, num_chunks, 4):
    batch_indices = list(range(batch_start, min(batch_start + 4, num_chunks)))
    results = rlm_sub_query_batch(
        context_name="codebase",
        chunk_indices=batch_indices,
        query="Extract all TODO comments with line context",
        concurrency=4
    )

    # 6. Store results with chunk metadata
    for i, result in enumerate(results):
        rlm_store_result(
            name="todos",
            result=result,
            metadata={"chunk": batch_start + i}
        )

# 7. Aggregate
all_results = rlm_get_results(name="todos")
# Synthesize final TODO list
```
### Workflow 2: Process Large Log File
**Task:** Summarize errors from 10MB log file
```python
# 1. Load logs
logs = read_file("/var/log/app.log")
rlm_load_context(name="logs", content=logs)

# 2. Filter to error lines only (filter before chunking to cut volume)
rlm_filter_context(
    name="logs",
    output_name="errors",
    pattern=r"ERROR|CRITICAL",
    mode="keep"
)

# 3. Chunk the filtered context by lines (logs are line-oriented)
chunk_info = rlm_chunk_context(name="errors", strategy="lines", size=100)
num_chunks = chunk_info["chunk_count"]

# 4. Batch process the error chunks
results = rlm_sub_query_batch(
    context_name="errors",
    chunk_indices=list(range(num_chunks)),
    query="Group errors by type and count occurrences",
    concurrency=8
)

# 5. Aggregate error summary
# Synthesize from results array
```
### Workflow 3: Multi-Document Q&A
**Task:** Answer questions from 20 research papers
```python
# 1. Load all papers
combined_docs = "\n\n=== DOCUMENT BREAK ===\n\n".join(papers)
rlm_load_context(name="research", content=combined_docs)

# 2. Chunk by paragraphs (prose is paragraph-oriented)
chunk_info = rlm_chunk_context(name="research", strategy="paragraphs", size=50)

# 3. Ask the question across all chunks
results = rlm_sub_query_batch(
    context_name="research",
    chunk_indices=list(range(chunk_info["chunk_count"])),
    query="Does this section mention climate change impacts on agriculture? If yes, summarize key points.",
    concurrency=8
)

# 4. Keep only the chunks that answered yes
relevant = [r for r in results if "yes" in r.lower()]

# 5. Final synthesis
# Combine relevant excerpts into answer
```
## Tool Reference
### rlm_load_context
Load large content into RLM memory without consuming context window.
- `name`: Identifier for this context
- `content`: Full text content to load
### rlm_inspect_context
Get metadata and preview without loading full content.
- `name`: Context identifier
- `preview_chars`: Number of characters to preview (default: 500)
### rlm_chunk_context
Split context into manageable chunks.
- `name`: Context identifier
- `strategy`: `"lines"`, `"chars"`, or `"paragraphs"`
- `size`: Chunk size (meaning depends on strategy)
### rlm_get_chunk
Retrieve specific chunk by index.
- `name`: Context identifier
- `chunk_index`: Zero-based chunk index
### rlm_filter_context
Filter context using regex, creates new filtered context.
- `name`: Source context identifier
- `output_name`: Name for filtered context
- `pattern`: Regex pattern to match
- `mode`: `"keep"` (keep matches) or `"remove"` (remove matches)
### rlm_sub_query
Process single chunk with sub-LLM call.
- `context_name`: Context identifier
- `query`: Question/instruction for sub-call
- `chunk_index`: Optional specific chunk (otherwise uses whole context)
- `provider`: `"claude-sdk"` or `"ollama"`
- `model`: Optional model override
### rlm_sub_query_batch
Process multiple chunks in parallel (recommended).
- `context_name`: Context identifier
- `query`: Question/instruction for each chunk
- `chunk_indices`: Array of chunk indices to process
- `provider`: `"claude-sdk"` or `"ollama"`
- `concurrency`: Max parallel requests (default: 4, max: 8)
### rlm_store_result
Store sub-call result for later aggregation.
- `name`: Result set identifier
- `result`: Result content to store
- `metadata`: Optional metadata about result
### rlm_get_results
Retrieve all stored results for aggregation.
- `name`: Result set identifier
### rlm_list_contexts
List all loaded contexts and their metadata.
## Best Practices
### Chunking Strategy Selection
- **Code/logs/CSV**: Use `lines` (structured, line-oriented)
- **Prose/articles**: Use `paragraphs` (semantic boundaries)
- **Unstructured text**: Use `chars` (uniform distribution)
### Chunk Size Guidelines
- **Lines**: 100-500 (balance between context and granularity)
- **Chars**: 2000-10000 (roughly 500-2500 tokens)
- **Paragraphs**: 20-100 (depends on paragraph length)
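One way to pick a value inside these ranges is to work backwards from a per-chunk token budget. A rough sketch using the documented load metadata (`size_chars`, `line_count`) and the common ~4 characters-per-token heuristic; the 1500-token target is an arbitrary assumption:
```python
def suggest_lines_per_chunk(size_chars: int, line_count: int,
                            target_tokens: int = 1500) -> int:
    # ~4 characters per token is a rough heuristic for English text.
    avg_chars_per_line = size_chars / max(line_count, 1)
    lines = int((target_tokens * 4) / max(avg_chars_per_line, 1))
    # Clamp to the guideline range above.
    return max(100, min(lines, 500))
```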
### Efficient Processing
1. **Use batch processing**: `rlm_sub_query_batch` is much faster than sequential calls
2. **Set appropriate concurrency**: 4-8 parallel requests balances speed and resource usage
3. **Filter before chunking**: Use `rlm_filter_context` to reduce data volume
4. **Inspect first**: Always check context size before chunking
### Cost Optimization
- Use `claude-sdk` (Haiku 4.5) for most tasks - fast and cheap
- Use `ollama` for experimentation or when processing very large volumes
- Filter contexts before processing to reduce token usage
- Chunk at appropriate granularity (bigger chunks = fewer calls)
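As a back-of-the-envelope check using the ~$0.25 per 1M input tokens figure above and the same ~4 chars/token heuristic (output tokens and any per-request overhead are ignored here):
```python
def estimate_input_cost_usd(total_chars: int, passes: int = 1,
                            usd_per_million_tokens: float = 0.25) -> float:
    tokens = (total_chars / 4) * passes  # rough: ~4 chars per input token
    return tokens / 1_000_000 * usd_per_million_tokens

# A 10 MB log processed once: ~2.5M input tokens, roughly $0.63
print(estimate_input_cost_usd(10_000_000))
```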
## Common Patterns
### Map-Reduce Pattern
```python
# Map: Process each chunk
results = rlm_sub_query_batch(
    context_name="data",
    chunk_indices=all_indices,
    query="Extract key information",
    concurrency=8
)
# Reduce: Aggregate results
final = synthesize(results)
```
### Filter-Process Pattern
```python
# Filter to relevant content
rlm_filter_context(
    name="all_logs",
    output_name="errors",
    pattern="ERROR",
    mode="keep"
)
# Process filtered content
results = rlm_sub_query_batch(
    context_name="errors",
    chunk_indices=all_indices,
    query="Categorize error type"
)
```
### Hierarchical Processing Pattern
```python
# First pass: Summarize each chunk
summaries = rlm_sub_query_batch(
    context_name="docs",
    chunk_indices=all_indices,
    query="Summarize key points"
)
# Second pass: Aggregate summaries
rlm_load_context(name="summaries", content="\n".join(summaries))
final = rlm_sub_query(
    context_name="summaries",
    query="Create overall summary from chunk summaries"
)
```
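If the joined summaries are still too large for a single sub-query, the same pattern can be applied recursively until the context fits. A hedged sketch: the 20,000-character threshold, the paragraph chunking, and the assumption that sub-query results come back as plain strings are all illustrative choices, not tool requirements:
```python
def recursive_summarize(name: str, level: int = 0, max_chars: int = 20_000) -> str:
    meta = rlm_inspect_context(name=name)
    if meta["size_chars"] <= max_chars:
        # Small enough: one sub-query over the whole context.
        return rlm_sub_query(context_name=name, query="Create an overall summary")
    info = rlm_chunk_context(name=name, strategy="paragraphs", size=50)
    summaries = rlm_sub_query_batch(
        context_name=name,
        chunk_indices=list(range(info["chunk_count"])),
        query="Summarize key points",
        concurrency=8
    )
    next_name = f"{name}_summary_l{level + 1}"
    rlm_load_context(name=next_name, content="\n\n".join(summaries))
    return recursive_summarize(next_name, level + 1, max_chars)
```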
## Troubleshooting
### "Context too large" errors
- You're trying to process chunks that are still too big
- Solution: Reduce chunk size or filter content first
### Slow processing
- Sequential sub-queries instead of batch
- Solution: Use `rlm_sub_query_batch` with appropriate concurrency
### Poor aggregation quality
- Chunks too small (losing context)
- Solution: Increase chunk size to maintain semantic coherence
### High costs
- Using wrong provider or inefficient chunking
- Solution: Use Haiku 4.5 (`claude-sdk`) and filter before processing
## Summary
RLM unlocks massive context processing for Claude Code:
- ✅ Handle files >100KB easily
- ✅ Process multiple files together
- ✅ Parallelize for speed (batch processing)
- ✅ Cost-effective with Haiku 4.5
- ✅ Flexible chunking strategies
- ✅ Map-reduce pattern for aggregation
**Default workflow:** Load → Inspect → Chunk (lines, 200) → Batch Sub-Query (claude-sdk, concurrency=4) → Aggregate