# Multi-LLM Setup Guide
This guide explains how to use the Memory Forensics MCP Server with different LLMs (Large Language Models), including Claude, Llama (via Ollama), GPT-4, and others.
## Overview
The Memory Forensics MCP Server is **LLM-agnostic** - it communicates via the Model Context Protocol (MCP), which any compatible client can use. The server doesn't care which LLM you use; it just exposes memory forensics tools via a standard interface.
**What this means:**
- Same server code works with all LLMs
- Switch between LLMs by changing the client
- Use Claude for production, Llama for offline/confidential work
- Optimize tool descriptions per LLM capability
## Supported LLM Profiles
The server supports different LLM profiles that optimize tool descriptions:
| Profile | Use Case | Output Format | Description Length |
|---------|----------|---------------|-------------------|
| `claude` | Claude Opus/Sonnet | Markdown | Detailed (500 chars) |
| `llama70b` | Llama 3.1 70B+ | Markdown | Moderate (300 chars) |
| `llama13b` | Llama 13B-70B | JSON | Concise (150 chars) |
| `gpt4` | GPT-4/GPT-4 Turbo | Markdown | Detailed (500 chars) |
| `minimal` | Any small model | JSON | Minimal (100 chars) |
Set the profile via environment variable:
```bash
export MCP_LLM_PROFILE=llama70b
python server.py
```
---
## Option 1: Claude (via Claude Code)
**Best for:** Production use, highest quality analysis
### Setup
1. **Install Claude Code:**
```bash
# Download from https://claude.com/claude-code
```
2. **Configure MCP Server:**
Edit `~/.claude/mcp.json`:
```json
{
  "mcpServers": {
    "memory-forensics": {
      "command": "/home/robb/tools/memory-forensics-mcp/venv/bin/python",
      "args": ["/home/robb/tools/memory-forensics-mcp/server.py"]
    }
  }
}
```
3. **Start Claude Code:**
```bash
claude
```
4. **Use Tools:**
```
You: "List available memory dumps"
Claude: [calls list_dumps tool automatically]
You: "Analyze Win11Dump for signs of compromise"
Claude: [calls process_dump, detect_anomalies, generate_timeline, etc.]
```
### Profile Configuration
Claude uses the `claude` profile by default (detailed descriptions).
No additional configuration needed.
---
## Option 2: Llama (via Ollama) - Local/Offline
**Best for:** Confidential investigations, offline/local environments, cost savings
### Prerequisites
1. **Install Ollama:**
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
2. **Pull a Llama model:**
```bash
# Recommended: Llama 3.1 70B (best quality)
ollama pull llama3.1:70b
# Or: Llama 3.2 (smaller, faster)
ollama pull llama3.2:latest
# Or: Llama 3.1 8B (minimal resources)
ollama pull llama3.1:8b
```
3. **Start Ollama server:**
```bash
ollama serve
```
### Setup Option A: Using Example Client (Recommended)
1. **Install client dependencies:**
```bash
cd /home/robb/tools/memory-forensics-mcp/examples
pip install -r requirements.txt
```
2. **Configure LLM profile:**
```bash
# For Llama 3.1 70B or larger
export MCP_LLM_PROFILE=llama70b
# For Llama 13B-70B
export MCP_LLM_PROFILE=llama13b
# For Llama 8B or smaller
export MCP_LLM_PROFILE=minimal
```
3. **Run the client:**
```bash
python ollama_client.py
```
4. **Specify model (optional):**
```bash
OLLAMA_MODEL=llama3.2:latest python ollama_client.py
```
### Setup Option B: Custom Integration
If you want to build your own MCP client:
1. **Install MCP Python SDK:**
```bash
pip install mcp
```
2. **Create your client:**
```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    server_params = StdioServerParameters(
        command="/path/to/venv/bin/python",
        args=["/path/to/server.py"]
    )
    # Launch the server as a subprocess and communicate over stdio
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize the MCP session
            await session.initialize()
            # List tools
            tools = await session.list_tools()
            # Call a tool
            result = await session.call_tool("list_dumps", {})
            print(result)

asyncio.run(main())
```
3. **Integrate with Ollama:**
- Send user prompt + tool descriptions to Ollama
- Parse tool calls from Ollama response
- Execute tools via MCP client
- Return results to Ollama for analysis
See `examples/ollama_client.py` for a complete implementation; a condensed sketch of the loop appears below.
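The sketch assumes an already-initialized MCP `session` (as in the snippet above) plus the `requests` library, and targets Ollama's `/api/chat` endpoint with tool definitions. The schema conversion is naive, error handling is omitted, and the helper name `chat_once` is purely illustrative:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

async def chat_once(session, model, user_prompt):
    # Expose MCP tool metadata to Ollama as function-style tools
    mcp_tools = await session.list_tools()
    tools = [{
        "type": "function",
        "function": {
            "name": t.name,
            "description": t.description,
            "parameters": t.inputSchema,
        },
    } for t in mcp_tools.tools]

    messages = [{"role": "user", "content": user_prompt}]
    reply = requests.post(OLLAMA_URL, json={
        "model": model,
        "messages": messages,
        "tools": tools,
        "stream": False,
    }).json()["message"]

    # No tool calls requested: the model answered directly
    if not reply.get("tool_calls"):
        return reply["content"]

    # Execute each requested tool via the MCP session and feed results back
    messages.append(reply)
    for call in reply["tool_calls"]:
        fn = call["function"]
        result = await session.call_tool(fn["name"], fn.get("arguments", {}))
        messages.append({"role": "tool", "content": str(result)})

    # Ask the model for its final analysis given the tool output
    final = requests.post(OLLAMA_URL, json={
        "model": model,
        "messages": messages,
        "stream": False,
    }).json()
    return final["message"]["content"]
```

A production client would stream responses and keep looping until the model stops requesting tools; `examples/ollama_client.py` shows the full version.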
### Performance Notes
**Model Recommendations:**
- **Llama 3.1 70B**: Best quality, comparable to Claude for most tasks
- **13B-class models**: Good balance of speed/quality
- **Llama 3.2**: Faster, lighter, good for basic analysis
- **Llama 3.1 8B**: Minimal resources, basic capabilities
**Hardware Requirements:**
- 70B models: 48GB+ RAM or equivalent GPU VRAM
- 13B models: 16GB+ RAM
- 8B models: 8GB+ RAM
---
## Option 3: GPT-4 (via OpenAI API)
**Best for:** OpenAI ecosystem users
### Setup
1. **Install OpenAI SDK:**
```bash
pip install openai mcp
```
2. **Set API key:**
```bash
export OPENAI_API_KEY=sk-...
```
3. **Configure MCP server:**
```bash
export MCP_LLM_PROFILE=gpt4
```
4. **Create client:**
```python
import openai
from mcp import ClientSession
# Similar pattern to Ollama client
# Replace Ollama API calls with OpenAI API calls
```
**Note:** No official OpenAI MCP client exists yet. You'll need to write a client, similar to the Ollama example (a rough sketch follows this list), that:
- Connects to MCP server via stdio
- Sends prompts to OpenAI API
- Parses function calls
- Executes via MCP
- Returns results
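As a rough illustration (not an official integration), such a bridge could map MCP tools onto OpenAI function calling as follows. It assumes an initialized MCP `session` as in the earlier snippet and the `openai` v1 SDK; the helper name `run_turn` and the default model are illustrative:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

async def run_turn(session, user_prompt, model="gpt-4-turbo"):
    # Expose MCP tools to the model as OpenAI function tools
    mcp_tools = await session.list_tools()
    tools = [{
        "type": "function",
        "function": {
            "name": t.name,
            "description": t.description,
            "parameters": t.inputSchema,
        },
    } for t in mcp_tools.tools]

    messages = [{"role": "user", "content": user_prompt}]
    response = client.chat.completions.create(
        model=model, messages=messages, tools=tools)
    msg = response.choices[0].message

    if not msg.tool_calls:
        return msg.content

    # Execute each requested tool via MCP and feed the results back
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments or "{}")
        result = await session.call_tool(call.function.name, args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": str(result),
        })

    followup = client.chat.completions.create(model=model, messages=messages)
    return followup.choices[0].message.content
```

The control flow mirrors the Ollama client; the main differences are the API calls and that OpenAI returns tool arguments as a JSON string rather than an object.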
---
## Option 4: Custom/Local LLMs
**Examples:** Mistral, Phi, CodeLlama, custom fine-tuned models
### Via Ollama
If your model is available in Ollama:
```bash
ollama pull mistral:latest
OLLAMA_MODEL=mistral:latest MCP_LLM_PROFILE=llama13b python examples/ollama_client.py
```
### Via Other Runtimes
If using llama.cpp, vLLM, or other runtimes:
1. **Start your LLM server** (with API endpoint)
2. **Modify `examples/ollama_client.py`:**
- Change `ollama_url` to your server URL
- Adjust the API call format if needed (see the sketch after this list)
3. **Set appropriate profile:**
```bash
export MCP_LLM_PROFILE=minimal # for small models
export MCP_LLM_PROFILE=llama70b # for large models
```
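Many local runtimes, including vLLM and llama.cpp's server, expose an OpenAI-compatible chat endpoint, so instead of rewriting the Ollama request code you can often point an OpenAI-style client at them. A minimal sketch, assuming a vLLM server on port 8000; the URL and model name are placeholders for whatever your runtime is serving:

```python
from openai import OpenAI

# Point the OpenAI SDK at a local OpenAI-compatible endpoint
client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM / llama.cpp server address
    api_key="not-needed",                 # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",  # whatever model the runtime serves
    messages=[{"role": "user", "content": "List the memory forensics tools available."}],
)
print(response.choices[0].message.content)
```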
---
## Profile Configuration Reference
### Environment Variables
```bash
# Set LLM profile
export MCP_LLM_PROFILE=llama70b
# Override individual settings (optional)
export MCP_OUTPUT_FORMAT=json
export MCP_VERBOSITY=concise
```
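The actual implementation lives in `server.py`, but conceptually the server resolves these variables roughly as follows; the class and function names here are illustrative, not the server's real internals:

```python
import os
from dataclasses import dataclass

# Illustrative only: the real profile table in server.py may differ
@dataclass
class LLMProfile:
    output_format: str       # "markdown" or "json"
    description_chars: int   # tool description length budget
    include_examples: bool

PROFILES = {
    "claude":   LLMProfile("markdown", 500, True),
    "llama70b": LLMProfile("markdown", 300, True),
    "llama13b": LLMProfile("json",     150, False),
    "gpt4":     LLMProfile("markdown", 500, True),
    "minimal":  LLMProfile("json",     100, False),
}

def resolve_profile():
    profile = PROFILES.get(os.environ.get("MCP_LLM_PROFILE", "claude"),
                           PROFILES["claude"])
    # Individual overrides take precedence over the profile defaults
    output_format = os.environ.get("MCP_OUTPUT_FORMAT", profile.output_format)
    verbosity = os.environ.get("MCP_VERBOSITY", "detailed")
    return profile, output_format, verbosity
```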
### Profile Details
**Claude Profile** (`claude`):
- Markdown output with rich formatting
- Detailed 500-character tool descriptions
- Includes usage examples
- Optimized for Claude's context understanding
**Llama 70B Profile** (`llama70b`):
- Markdown output
- Moderate 300-character descriptions
- Includes examples
- Good for Llama 3.1 70B and similar large models
**Llama 13B Profile** (`llama13b`):
- JSON-preferred output for structured data
- Concise 150-character descriptions
- No examples (to save tokens)
- Good for Llama 13B-70B
**Minimal Profile** (`minimal`):
- JSON output only
- Minimal 100-character descriptions
- No examples
- For small models (7B-13B) or limited context
---
## Comparison: Claude vs Llama
| Aspect | Claude (via Claude Code) | Llama (via Ollama) |
|--------|--------------------------|-------------------|
| **Setup Complexity** | Easy (official client) | Moderate (custom client) |
| **Cost** | Paid API | Free (local) |
| **Quality** | Highest | Very good (70B) to moderate (8B) |
| **Speed** | Fast (cloud) | Varies (local hardware) |
| **Privacy** | Cloud-based | Fully local/offline |
| **Context Window** | 200K tokens | 128K tokens (Llama 3.1) |
| **Tool Calling** | Native support | Manual parsing needed |
| **Best For** | Production, highest quality | Confidential, offline environments |
---
## Workflow Examples
### Claude Code Workflow
```
$ claude
You: "Analyze Win11Dump memory dump"
Claude: Let me analyze that dump for you.
[automatically calls list_dumps]
[automatically calls process_dump Win11Dump]
[automatically calls detect_anomalies Win11Dump]
I found several suspicious indicators:
1. cmd.exe spawned by winword.exe (PID 2048) [CRITICAL]
This indicates a possible macro exploit...
[automatically calls generate_timeline Win11Dump]
[automatically calls export_data Win11Dump html]
I've created a complete HTML report at: exports/Win11Dump_report_20241211.html
```
### Ollama Workflow
```bash
$ python examples/ollama_client.py
You: Analyze Win11Dump memory dump
[Calling tool: list_dumps]
[Calling tool: process_dump]
[Calling tool: detect_anomalies]
Assistant: Analysis complete!
Found 3 critical anomalies:
- cmd.exe spawned by winword.exe (macro exploit)
- svch0st.exe process (typosquatting svchost.exe)
- Process running from AppData\Temp (unusual location)
Timeline shows:
1. Document opened at 14:22
2. cmd.exe spawned 30s later
3. Network connection to 192.168.1.100:4444
I recommend immediate incident response.
```
---
## Troubleshooting
### "Ollama not responding"
```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags
# If not, start it
ollama serve
```
### "MCP server connection failed"
1. **Check virtual environment:**
```bash
ls /home/robb/tools/memory-forensics-mcp/venv/bin/python
```
2. **Test server directly:**
```bash
cd /home/robb/tools/memory-forensics-mcp
./venv/bin/python server.py
# Should wait for input (stdio mode)
```
3. **Check dependencies:**
```bash
./venv/bin/python -c "import mcp, aiosqlite, volatility3; print('OK')"
```
### "Tool calls not working with Ollama"
- Smaller models (8B, 13B) struggle with complex tool calling
- Try Llama 3.1 70B for best results
- Use `MCP_LLM_PROFILE=llama70b` for optimized descriptions
---
## Security Considerations
### Offline/Isolated Environments
For maximum security:
1. **Use Ollama** (fully local, no network calls)
2. **Disable network** on analysis machine if required
3. **Transfer dumps** via USB/physical media
4. **Export results** to JSON/CSV for offline review
### Data Privacy
| LLM Backend | Data Sent To | Recommendation |
|-------------|-------------|----------------|
| Claude (Cloud) | Anthropic servers | OK for non-confidential |
| Ollama (Local) | Localhost only | OK for confidential |
| GPT-4 (Cloud) | OpenAI servers | OK for non-confidential |
**For confidential/classified investigations:** Use Ollama exclusively.
---
## Performance Tuning
### For Ollama/Llama
**GPU Acceleration:**
```bash
# Ollama will automatically use CUDA if available
ollama run llama3.1:70b
```
**CPU-Only (slower):**
```bash
# Hide GPUs from the Ollama server process to force CPU inference
CUDA_VISIBLE_DEVICES="" ollama serve
# Then, in another terminal:
ollama run llama3.1:8b
```
**Memory limits:**
Ollama does not take a per-run memory flag; memory use is determined by model size and quantization. If you are RAM-constrained, pull a smaller model (for example `llama3.1:8b`) rather than a larger one.
### For Claude Code
- No tuning needed (cloud-based)
- Response times depend on network latency
---
## Summary
**Quick Reference:**
| Use Case | Recommended Setup |
|----------|------------------|
| Production forensics | Claude Code (best quality) |
| Confidential investigations | Ollama + Llama 3.1 70B |
| Offline/local analysis | Ollama + Llama 3.1 70B |
| Resource-constrained | Ollama + Llama 3.2 |
| Cost-sensitive | Ollama + any Llama model |
**Getting Started:**
1. For Claude: Use this guide to set up Claude Code (easiest)
2. For Llama: Install Ollama, pull a model, run `python examples/ollama_client.py`
3. For other LLMs: Adapt the Ollama client example
The MCP server is the same for all - only the client changes!