# Integration Checklist
This checklist covers the minimal steps to integrate RLM model selection and cost optimization into your workflow.
## Quick Setup
### 1. Choose Your Provider
- [ ] **OpenRouter** (recommended for cloud usage)
- Get API key from [OpenRouter](https://openrouter.ai/keys)
- Set `OPENROUTER_API_KEY` environment variable
- Base URL: `https://openrouter.ai/api/v1`
- [ ] **Local Ollama**
- Install [Ollama](https://ollama.com/)
- Pull models: `ollama pull qwen2.5-coder:7b` and `ollama pull qwen2.5-coder:3b`
- Start server: `ollama serve`
- Base URL: `http://localhost:11434/v1` (OpenAI-compatible)
- [ ] **Local vLLM**
- Install [vLLM](https://docs.vllm.ai/)
- Start server: `vllm serve Qwen/Qwen2.5-Coder-7B-Instruct --api-key your-key`
- Base URL: `http://localhost:8000/v1` (default)
- [ ] **LiteLLM Proxy**
- Install and configure [LiteLLM](https://docs.litellm.ai/)
- Start proxy server
- Configure routing to your preferred providers
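All four options expose an OpenAI-compatible chat-completions endpoint, so one client sketch covers them. The snippet below is a minimal sketch using the `openai` Python package against the local Ollama endpoint from the list above; swapping in OpenRouter, vLLM, or LiteLLM only changes `base_url`, `api_key`, and `model`.

```python
import os

from openai import OpenAI  # pip install openai

# Targets the local Ollama server from the checklist; point base_url/api_key/model
# at OpenRouter, vLLM, or a LiteLLM proxy instead if that is your provider.
client = OpenAI(
    base_url="http://localhost:11434/v1",                      # Ollama's OpenAI-compatible endpoint
    api_key=os.environ.get("OPENROUTER_API_KEY", "ollama"),    # local servers accept any non-empty key
)

response = client.chat.completions.create(
    model="qwen2.5-coder:7b",  # pulled in the Ollama step above
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
    max_tokens=8,
)
print(response.choices[0].message.content)
```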
### 2. Configure Environment
- [ ] Copy appropriate `.env.example` from `configs/env/`
- [ ] Set required environment variables
- [ ] Test connectivity to your provider
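A cheap way to test connectivity before running anything heavier is to list the models the endpoint advertises. The sketch below assumes an OpenAI-compatible provider; the environment variable names are placeholders mirroring the OpenRouter `.env` example, so substitute whatever your chosen `.env` file defines.

```python
import os

from openai import OpenAI

# Connectivity smoke test: list whatever models the endpoint advertises.
client = OpenAI(
    base_url=os.environ.get("OPENAI_BASE_URL", "https://openrouter.ai/api/v1"),
    api_key=os.environ.get("OPENROUTER_API_KEY", "ollama"),  # local servers accept any value
)

for model in client.models.list().data:
    print(model.id)
```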
### 3. Test Basic Functionality
- [ ] Run `bench/bench_tokens.py` to verify RLM token savings
- [ ] Test with a simple preset from `configs/presets/`
- [ ] Verify MCP server responds to `rlm.solve` requests
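MCP tool invocations travel as JSON-RPC 2.0 `tools/call` messages (after the usual `initialize` handshake), so a first sanity check is simply that the server accepts a well-formed request for the `rlm.solve` tool. The argument names below are hypothetical placeholders, not the server's actual schema; use `tools/list` to discover the input schema the server really declares.

```python
import json

# Shape of a JSON-RPC 2.0 "tools/call" request for the rlm.solve tool.
# The "arguments" keys are placeholders; consult the tool's declared schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "rlm.solve",
        "arguments": {
            "task": "Summarize the attached document",  # placeholder field name
        },
    },
}
print(json.dumps(request, indent=2))
```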
## Model Selection Strategy
### Cost Optimization
- [ ] Understand the "strong root + cheap recursion" pattern (sketched after this list)
- [ ] Choose models that fit your budget/quality trade-off
- [ ] Set a reasonable `max_iterations` (8-15 recommended)
- [ ] Limit `other_backend_kwargs.max_tokens` (256-512 for recursion)
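The pattern pairs a stronger (more expensive) model at the root with a cheaper model for the recursive calls, then caps how long and how verbosely the recursion can run. Below is an illustrative sketch of what such a preset might contain; the key names are guesses rather than the project's actual schema, so mirror an existing file from `configs/presets/` when writing your own.

```python
import json

# Illustrative "strong root + cheap recursion" preset. Key names are guesses;
# copy the structure of an existing file in configs/presets/ instead.
preset = {
    "root_model": "qwen2.5-coder:7b",       # stronger model handles the top-level call
    "recursive_model": "qwen2.5-coder:3b",  # cheaper model handles recursive sub-calls
    "max_iterations": 10,                   # inside the recommended 8-15 range
    "other_backend_kwargs": {
        "max_tokens": 384,                  # 256-512 keeps recursive replies short
        "temperature": 0.2,                 # lower temperature for recursive calls
    },
}
print(json.dumps(preset, indent=2))
```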
### Performance Tuning
- [ ] Benchmark different model combinations
- [ ] Adjust `temperature` settings (lower for recursion)
- [ ] Set appropriate `timeout_sec` for your use case
- [ ] Monitor actual vs expected token usage
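One way to compare actual against expected token usage is to read the `usage` block that OpenAI-compatible endpoints return with each completion. A minimal sketch, again pointed at the local Ollama endpoint (OpenRouter's native accounting, covered under provider-specific tuning, is more precise for billing):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # or your cloud settings

response = client.chat.completions.create(
    model="qwen2.5-coder:7b",
    messages=[{"role": "user", "content": "Explain recursion in one sentence."}],
    max_tokens=64,
)

# OpenAI-compatible servers report per-request token counts in the usage block.
usage = response.usage
print(f"prompt={usage.prompt_tokens} completion={usage.completion_tokens} total={usage.total_tokens}")
```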
## Security and Best Practices
### API Key Management
- [ ] Never commit real API keys to version control
- [ ] Use `.mcp.json.example` as template
- [ ] Ensure `.mcp.json` is in `.gitignore`
- [ ] Rotate keys regularly if using cloud providers
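To keep keys out of the repository entirely, load them from the environment at runtime and fail fast when they are missing; a minimal sketch:

```python
import os
import sys

# Read the key from the environment (populated from your untracked .env / .mcp.json),
# never from a value hard-coded in source or committed configuration.
api_key = os.environ.get("OPENROUTER_API_KEY")
if not api_key:
    sys.exit("OPENROUTER_API_KEY is not set; copy configs/env/.env.example and fill it in.")
```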
### Cost Monitoring
- [ ] Set up usage alerts with your provider
- [ ] Track token consumption patterns
- [ ] Implement rate limiting if needed
- [ ] Have a fallback to local models for cost control
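A simple way to keep spend bounded is to fall back to a local endpoint when the cloud call fails or is being throttled. The sketch below assumes the OpenRouter and Ollama settings from the provider checklist; the OpenRouter model ID is illustrative.

```python
import os

from openai import OpenAI

def solve_with_fallback(prompt: str) -> str:
    """Try OpenRouter first; fall back to local Ollama if the cloud call fails."""
    backends = [
        ("https://openrouter.ai/api/v1", os.environ.get("OPENROUTER_API_KEY", ""), "qwen/qwen-2.5-coder-32b-instruct"),
        ("http://localhost:11434/v1", "ollama", "qwen2.5-coder:7b"),
    ]
    last_error = None
    for base_url, api_key, model in backends:
        try:
            client = OpenAI(base_url=base_url, api_key=api_key)
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=256,
            )
            return response.choices[0].message.content
        except Exception as exc:  # any backend failure triggers the fallback
            last_error = exc
    raise RuntimeError(f"All backends failed: {last_error}")
```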
## Advanced Configuration
### Custom Presets
- [ ] Create custom JSON presets for specific use cases
- [ ] Test presets with `bench/bench_tokens.py`
- [ ] Document your custom configurations
- [ ] Share presets with your team
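Before benchmarking or sharing a preset, it helps to confirm that the file parses and contains the knobs you care about. A sketch, reusing the hypothetical key names from the cost-optimization example above (the real schema lives in the existing files under `configs/presets/`):

```python
import json
from pathlib import Path

# Hypothetical preset path and keys -- match them to the real files in configs/presets/.
preset_path = Path("configs/presets/my-custom-preset.json")
preset = json.loads(preset_path.read_text())

for key in ("root_model", "recursive_model", "max_iterations"):
    if key not in preset:
        raise KeyError(f"{preset_path} is missing expected key: {key}")
print(f"{preset_path.name}: max_iterations={preset['max_iterations']}")
```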
### Provider-Specific Tuning
- [ ] Understand OpenRouter's native token counting
- [ ] Use `/api/v1/generation` for precise cost accounting (see the example after this list)
- [ ] Configure OpenRouter headers if needed (`HTTP-Referer`, `X-Title`)
- [ ] Test local provider compatibility thoroughly
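OpenRouter lets you attach attribution headers to each request and look a finished generation up by ID for native token counts and cost. The sketch below calls the `/api/v1/generation` endpoint with `requests`; the model ID and header values are illustrative, and the response fields vary, so inspect the returned JSON rather than assuming a fixed shape.

```python
import os

import requests
from openai import OpenAI

api_key = os.environ["OPENROUTER_API_KEY"]

# Optional OpenRouter attribution headers from the checklist item above.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=api_key,
    default_headers={"HTTP-Referer": "https://example.com", "X-Title": "RLM integration"},
)

response = client.chat.completions.create(
    model="qwen/qwen-2.5-coder-32b-instruct",  # illustrative OpenRouter model ID
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=8,
)

# Look the generation up by ID for native token counts and cost; the record
# may take a moment to become available after the completion returns.
details = requests.get(
    "https://openrouter.ai/api/v1/generation",
    params={"id": response.id},
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=30,
)
print(details.json())
```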
## Troubleshooting
### Common Issues
- [ ] **Connection errors**: Verify base URLs and API keys
- [ ] **Model not found**: Check model availability with your provider
- [ ] **Rate limits**: Implement exponential backoff (example after this list)
- [ ] **Cost surprises**: Monitor usage and adjust iteration limits
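For rate limits, a small retry loop with exponential backoff and jitter is usually enough. A generic sketch; the `call` argument stands in for any provider request:

```python
import random
import time

def with_backoff(call, max_attempts: int = 5):
    """Retry call() with exponential backoff and jitter on any exception."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Sleep 1s, 2s, 4s, 8s ... plus jitter to avoid synchronized retries.
            time.sleep(2 ** attempt + random.random())
```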
### Debugging Tools
- [ ] Use `scripts/openrouter_model_picker.py` to find available models
- [ ] Check provider documentation for model IDs and pricing
- [ ] Test with minimal payloads first
- [ ] Verify JSON schema compliance
## References
* [OpenRouter Documentation](https://openrouter.ai/docs)
* [Ollama OpenAI Compatibility](https://docs.ollama.com/api/openai-compatibility)
* [vLLM OpenAI-Compatible Server](https://docs.vllm.ai/en/v0.8.1/serving/openai_compatible_server.html)
* [LiteLLM Proxy](https://docs.litellm.ai/docs/providers/litellm_proxy)