claude-recall


openrouter-provider.mdx•10.7 KiB

---
title: "OpenRouter Provider"
description: "Access 100+ AI models through OpenRouter's unified API, including free models for cost-effective observation extraction"
---

# OpenRouter Provider

claude-recall supports [OpenRouter](https://openrouter.ai) as an alternative provider for observation extraction. OpenRouter provides a unified API to access 100+ models from different providers including Google, Meta, Mistral, DeepSeek, and many others, often with generous free tiers.

<Tip>
**Free Models Available**: OpenRouter offers several completely free models, making it an excellent choice for reducing observation extraction costs to zero while maintaining quality.
</Tip>

## Why Use OpenRouter?

- **Access to 100+ models**: Choose from models across multiple providers through one API
- **Free tier options**: Several high-quality models are completely free to use
- **Cost flexibility**: Pay-as-you-go pricing on premium models with no commitments
- **Seamless fallback**: Automatically falls back to Claude if OpenRouter is unavailable
- **Hot-swappable**: Switch providers without restarting the worker
- **Multi-turn conversations**: Full conversation history maintained across API calls

## Free Models on OpenRouter

OpenRouter actively supports democratizing AI access by offering free models. These are production-ready models suitable for observation extraction.

### Featured Free Models

| Model | ID | Parameters | Context | Best For |
|-------|------|------------|---------|----------|
| **Xiaomi MiMo-V2-Flash** | `xiaomi/mimo-v2-flash:free` | 309B (15B active, MoE) | 256K | Reasoning, coding, agents |
| **Gemini 2.0 Flash** | `google/gemini-2.0-flash-exp:free` | n/a | 1M | General purpose |
| **Gemini 2.5 Flash** | `google/gemini-2.5-flash-preview:free` | n/a | 1M | Latest capabilities |
| **DeepSeek R1** | `deepseek/deepseek-r1:free` | 671B | 64K | Reasoning, analysis |
| **Llama 3.1 70B** | `meta-llama/llama-3.1-70b-instruct:free` | 70B | 128K | General purpose |
| **Llama 3.1 8B** | `meta-llama/llama-3.1-8b-instruct:free` | 8B | 128K | Fast, lightweight |
| **Mistral Nemo** | `mistralai/mistral-nemo:free` | 12B | 128K | Efficient performance |

<Note>
**Default Model**: claude-recall uses `xiaomi/mimo-v2-flash:free` by default. It is a 309B-parameter mixture-of-experts model that ranks #1 on SWE-bench Verified and excels at coding and reasoning tasks.
</Note>

### Free Model Considerations

- **Rate limits**: Free models may have stricter rate limits than paid models
- **Availability**: Free capacity depends on provider partnerships and demand
- **Queue times**: During peak usage, requests may be queued briefly
- **Max tokens**: Most free models support 65,536 completion tokens

All free models support:

- Tool use and function calling
- Temperature and sampling controls
- Stop sequences
- Streaming responses

## Getting an API Key

1. Go to [OpenRouter](https://openrouter.ai)
2. Sign in with Google, GitHub, or email
3. Navigate to [API Keys](https://openrouter.ai/keys)
4. Click **Create Key**
5. Copy and securely store your API key

<Tip>
**Free to start**: No credit card required to create an account or use free models. Add credits only if you want to use premium models.
</Tip>

## Configuration

### Settings

| Setting | Values | Default | Description |
|---------|--------|---------|-------------|
| `CLAUDE_RECALL_PROVIDER` | `claude`, `gemini`, `openrouter` | `claude` | AI provider for observation extraction |
| `CLAUDE_RECALL_OPENROUTER_API_KEY` | string | (none) | Your OpenRouter API key |
| `CLAUDE_RECALL_OPENROUTER_MODEL` | string | `xiaomi/mimo-v2-flash:free` | Model identifier (see list above) |
| `CLAUDE_RECALL_OPENROUTER_MAX_CONTEXT_MESSAGES` | number | `20` | Max messages in conversation history |
| `CLAUDE_RECALL_OPENROUTER_MAX_TOKENS` | number | `100000` | Token budget safety limit |
| `CLAUDE_RECALL_OPENROUTER_SITE_URL` | string | (none) | Optional: URL for analytics attribution |
| `CLAUDE_RECALL_OPENROUTER_APP_NAME` | string | `claude-recall` | Optional: App name for analytics |

### Using the Settings UI

1. Open the viewer at http://localhost:37777
2. Click the **gear icon** to open Settings
3. Under **AI Provider**, select **OpenRouter**
4. Enter your OpenRouter API key
5. Optionally select a different model

Settings are applied immediately; no restart is required.

### Manual Configuration

Edit `~/.claude-recall/settings.json`:

```json
{
  "CLAUDE_RECALL_PROVIDER": "openrouter",
  "CLAUDE_RECALL_OPENROUTER_API_KEY": "sk-or-v1-your-key-here",
  "CLAUDE_RECALL_OPENROUTER_MODEL": "xiaomi/mimo-v2-flash:free"
}
```

Alternatively, set the API key via environment variable:

```bash
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
```

The settings file takes precedence over the environment variable.

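The precedence rule above can be illustrated with a small sketch. This is not claude-recall's actual code: the `resolveOpenRouterConfig` helper and the `RecallSettings` type are hypothetical names, and treating an empty string as "unset" is an assumption. Only the setting names, the environment variable, and the default model come from this page.

```typescript
// Hypothetical shape of the relevant keys in ~/.claude-recall/settings.json.
interface RecallSettings {
  CLAUDE_RECALL_OPENROUTER_API_KEY?: string;
  CLAUDE_RECALL_OPENROUTER_MODEL?: string;
}

// Default model as documented in the Settings table.
const DEFAULT_MODEL = "xiaomi/mimo-v2-flash:free";

// Hypothetical helper: resolves the key and model the way this page describes.
// The environment is passed in explicitly so the function stays easy to test.
function resolveOpenRouterConfig(
  settings: RecallSettings,
  env: Record<string, string | undefined>,
): { apiKey: string | undefined; model: string } {
  // The settings file wins; the environment variable is only a fallback.
  // Assumption: an empty string counts as "not set".
  const apiKey =
    settings.CLAUDE_RECALL_OPENROUTER_API_KEY || env["OPENROUTER_API_KEY"];
  const model = settings.CLAUDE_RECALL_OPENROUTER_MODEL || DEFAULT_MODEL;
  return { apiKey, model };
}
```

With both sources present, the settings-file key is returned. If neither is set, `apiKey` is `undefined`, which corresponds to the "API key not configured" case covered under Troubleshooting.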
## Model Selection Guide

### For Free Usage (No Cost)

**Recommended**: `xiaomi/mimo-v2-flash:free`

- Best-in-class performance on coding benchmarks
- 256K context window handles large observations
- 65K max completion tokens
- Mixture-of-experts architecture (15B active parameters)

**Alternatives**:

- `google/gemini-2.0-flash-exp:free` - 1M context, Google's flagship
- `deepseek/deepseek-r1:free` - Excellent reasoning capabilities
- `meta-llama/llama-3.1-70b-instruct:free` - Strong general purpose

### For Paid Usage (Higher Quality/Speed)

| Model | Price (per 1M tokens) | Best For |
|-------|----------------------|----------|
| `anthropic/claude-3.5-sonnet` | $3 in / $15 out | Highest quality observations |
| `google/gemini-2.0-flash` | $0.075 in / $0.30 out | Fast, cost-effective |
| `openai/gpt-4o` | $2.50 in / $10 out | GPT-4 quality |

## Context Window Management

The OpenRouter agent bounds the size of the conversation context to prevent runaway costs:

### Automatic Truncation

The agent uses a sliding window strategy:

1. Checks if the message count exceeds `CLAUDE_RECALL_OPENROUTER_MAX_CONTEXT_MESSAGES` (default: 20)
2. Checks if the estimated token count exceeds `CLAUDE_RECALL_OPENROUTER_MAX_TOKENS` (default: 100,000)
3. If either limit is exceeded, keeps only the most recent messages
4. Logs warnings with dropped message counts

### Token Estimation

- Conservative estimate: 1 token ≈ 4 characters
- Used for proactive context management
- Actual usage logged from API response

### Cost Tracking

Logs include detailed usage information:

```
OpenRouter API usage: {
  model: "xiaomi/mimo-v2-flash:free",
  inputTokens: 2500,
  outputTokens: 1200,
  totalTokens: 3700,
  estimatedCostUSD: "0.00",
  messagesInContext: 8
}
```

## Provider Switching

You can switch between providers at any time:

- **No restart required**: Changes take effect on the next observation
- **Conversation history preserved**: When switching mid-session, the new provider sees the full conversation context
- **Seamless transition**: All providers use the same observation format

### Switching via UI

1. Open Settings in the viewer
2. Change the **AI Provider** dropdown
3. The next observation will use the new provider

### Switching via Settings File

```json
{
  "CLAUDE_RECALL_PROVIDER": "openrouter"
}
```

## Fallback Behavior

If OpenRouter encounters errors, claude-recall automatically falls back to the Claude Agent SDK:

**Triggers fallback:**

- Rate limiting (HTTP 429)
- Server errors (HTTP 500, 502, 503)
- Network issues (connection refused, timeout)
- Generic fetch failures

**Does not trigger fallback:**

- Missing API key (logs warning, uses Claude from start)
- Invalid API key (fails with error)

When fallback occurs:

1. A warning is logged
2. Any in-progress messages are reset to pending
3. Claude SDK takes over with the full conversation context

<Note>
**Fallback is transparent**: Your observations continue processing without interruption. The fallback preserves all conversation context.
</Note>

## Multi-Turn Conversation Support

The OpenRouter agent maintains full conversation history across API calls:

```
Session Created
  ↓
Load Pending Messages (observations from queue)
  ↓
For each message:
  → Add to conversation history
  → Call OpenRouter API with FULL history
  → Parse XML response
  → Store observations in database
  → Sync to Chroma vector DB
  ↓
Session complete
```

This enables:

- Coherent multi-turn exchanges
- Context preservation across observations
- Seamless provider switching mid-session

## Troubleshooting

### "OpenRouter API key not configured"

Either:

- Set `CLAUDE_RECALL_OPENROUTER_API_KEY` in `~/.claude-recall/settings.json`, or
- Set the `OPENROUTER_API_KEY` environment variable

### Rate Limiting

Free models may have rate limits during peak usage. If you hit rate limits:

- claude-recall automatically falls back to Claude SDK
- Consider switching to a different free model
- Add credits for premium model access

### Model Not Found

Verify the model ID is correct:

- Check [OpenRouter Models](https://openrouter.ai/models) for current availability
- Use the `:free` suffix for free model variants
- Model IDs are case-sensitive

### High Token Usage Warning

If you see warnings about high token usage (>50,000 per request):

- Reduce `CLAUDE_RECALL_OPENROUTER_MAX_CONTEXT_MESSAGES`
- Reduce `CLAUDE_RECALL_OPENROUTER_MAX_TOKENS`
- Consider a model with a larger context window

### Connection Errors

If you see connection errors:

- Check your internet connection
- Verify OpenRouter service status at [status.openrouter.ai](https://status.openrouter.ai)
- The agent will automatically fall back to Claude

## API Details

OpenRouter uses an OpenAI-compatible REST API:

**Endpoint**: `https://openrouter.ai/api/v1/chat/completions`

**Headers**:

```
Authorization: Bearer {apiKey}
HTTP-Referer: https://github.com/nhevers/claude-recall
X-Title: claude-recall
Content-Type: application/json
```

**Request Format**:

```json
{
  "model": "xiaomi/mimo-v2-flash:free",
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."}
  ],
  "temperature": 0.3,
  "max_tokens": 4096
}
```

## Comparing Providers

| Feature | Claude (SDK) | Gemini | OpenRouter |
|---------|-------------|--------|------------|
| **Cost** | Pay per token | Free tier + paid | Free models + paid |
| **Models** | Claude only | Gemini only | 100+ models |
| **Quality** | Highest | High | Varies by model |
| **Rate limits** | Based on tier | 5-4000 RPM | Varies by model |
| **Fallback** | N/A (primary) | → Claude | → Claude |
| **Setup** | Automatic | API key required | API key required |

<Tip>
**Recommendation**: Start with OpenRouter's free `xiaomi/mimo-v2-flash:free` model for zero-cost observation extraction. If you need higher quality or encounter rate limits, switch to Claude or add OpenRouter credits for premium models.
</Tip>

## Next Steps

- [Configuration](/configuration) - Full settings reference
- [Gemini Provider](/usage/gemini-provider) - Alternative free provider
- [Getting Started](/usage/getting-started) - Basic usage guide
- [Troubleshooting](/troubleshooting) - Common issues
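
As a closing illustration, the sliding-window truncation described under Context Window Management could look roughly like the sketch below. It is not the agent's actual implementation: `ChatMessage`, `estimateTokens`, and `truncateHistory` are hypothetical names, and always keeping the newest message even when it alone exceeds the budget is an assumption. The 4-characters-per-token estimate and the default limits (20 messages, 100,000 tokens) are taken from this page.

```typescript
// Hypothetical message shape, mirroring the roles in the request format.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough estimate documented on this page: 1 token ≈ 4 characters.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical helper: keeps the most recent messages that fit both limits
// and reports how many older messages were dropped.
function truncateHistory(
  history: ChatMessage[],
  maxMessages = 20,
  maxTokens = 100_000,
): { kept: ChatMessage[]; dropped: number } {
  const kept: ChatMessage[] = [];
  let tokens = 0;

  // Walk from newest to oldest so the most recent context survives.
  for (const message of [...history].reverse()) {
    const cost = estimateTokens(message.content);
    const overLimit = kept.length >= maxMessages || tokens + cost > maxTokens;
    // Assumption: the newest message is always kept, even if it is oversized.
    if (kept.length > 0 && overLimit) break;
    kept.unshift(message); // restore chronological order
    tokens += cost;
  }

  return { kept, dropped: history.length - kept.length };
}
```

A caller could log a warning whenever `dropped` is greater than zero, which matches the last step of the truncation list.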
