# Model Presets
This document explains how to use the preset JSON payloads for `rlm.solve` requests. These presets provide ready-to-use configurations for different cost/performance trade-offs.
## Using Presets
Presets are located in `configs/presets/` and contain complete `rlm.solve` request payloads. You can:
1. Load a preset JSON file
2. Modify specific fields as needed (e.g., change prompts, adjust iteration limits)
3. Send the request to your MCP server (a minimal sketch follows this list)
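
A minimal Python sketch of this workflow. The file path comes from `configs/presets/`, but the endpoint URL and the HTTP transport are assumptions; substitute whatever client your MCP server setup actually uses:

```python
import json
import urllib.request

# 1. Load a preset JSON file (path relative to the repo root).
with open("configs/presets/openrouter_balanced.json") as f:
    payload = json.load(f)

# 2. Modify specific fields as needed.
payload["id"] = "demo-001"
payload["request"]["rlm"]["max_iterations"] = 8
payload["request"]["inputs"]["prompt"] = "Summarize the repository layout."
payload["request"]["inputs"]["root_prompt"] = "You are the root solver."

# 3. Send the request to your MCP server.
# NOTE: the URL below is a placeholder; how the payload is delivered
# (HTTP, stdio, an MCP client library) depends on your deployment.
req = urllib.request.Request(
    "http://localhost:8000/rlm.solve",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```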
## Available Presets
### openrouter_ultra_cheap.json
* **Root model**: `qwen/qwen-2.5-coder-32b-instruct`
* **Other model**: `qwen/qwen-2.5-coder-7b-instruct`
* **Use case**: Aggressive cost minimization where occasional misses are acceptable
* **Cost**: Very low (sub-$0.10/M tokens)
### openrouter_balanced.json
* **Root model**: `google/gemini-2.0-flash-001`
* **Other model**: `qwen/qwen-2.5-coder-7b-instruct`
* **Use case**: Good default for agentic workflows: quick iterations and a large context window
* **Cost**: Moderate (sub-$0.40/M tokens)
### openrouter_conservative.json
* **Root model**: `openai/gpt-4o-mini`
* **Other model**: `qwen/qwen-2.5-coder-7b-instruct`
* **Use case**: More predictable results while keeping cost low
* **Cost**: Moderate (sub-$0.70/M tokens)
### ollama_local.json
* **Root model**: `qwen2.5-coder:7b`
* **Other model**: `qwen2.5-coder:3b`
* **Use case**: Local inference, no API costs
* **Requirements**: Ollama running with compatible models
### vllm_local.json
* **Root model**: `qwen2.5-coder-7b-instruct`
* **Other model**: `qwen2.5-coder-3b-instruct`
* **Use case**: Local inference via vLLM server, no API costs
* **Requirements**: vLLM server running
### litellm_proxy.json
* **Root model**: Configurable via LiteLLM proxy
* **Other model**: Configurable via LiteLLM proxy
* **Use case**: Route through LiteLLM proxy for unified API management
* **Requirements**: LiteLLM proxy server running
## Request Payload Structure
All presets follow the nested request format:
```json
{
  "v": 1,
  "id": "your-request-id",
  "request": {
    "provider": {
      "provider_preset": "provider_name"
    },
    "rlm": {
      "backend": "openai_compatible",
      "environment": "docker",
      "model_name": "root-model-id",
      "other_model_name": "recursion-model-id",
      "max_iterations": 12,
      "timeout_sec": 90,
      "backend_kwargs": {
        "temperature": 0.2,
        "max_tokens": 1200
      },
      "other_backend_kwargs": {
        "temperature": 0.0,
        "max_tokens": 384
      }
    },
    "inputs": {
      "prompt": "your task context",
      "root_prompt": "your root instruction"
    }
  }
}
```
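
A quick sanity check in Python can catch missing fields before a request is sent. The "required" keys below are inferred from the example payload above, not from a formal schema, so treat them as an assumption:

```python
import json

REQUIRED_TOP = {"v", "id", "request"}
REQUIRED_REQUEST = {"provider", "rlm", "inputs"}
REQUIRED_RLM = {"backend", "environment", "model_name", "other_model_name"}

def check_payload(path: str) -> list[str]:
    """Return the missing keys; an empty list means the basic shape looks right."""
    with open(path) as f:
        payload = json.load(f)
    missing = [k for k in REQUIRED_TOP if k not in payload]
    request = payload.get("request", {})
    missing += [f"request.{k}" for k in REQUIRED_REQUEST if k not in request]
    rlm = request.get("rlm", {})
    missing += [f"request.rlm.{k}" for k in REQUIRED_RLM if k not in rlm]
    return missing

print(check_payload("configs/presets/openrouter_balanced.json"))
```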
## Environment Variables
Make sure to set the appropriate environment variables for your chosen provider:
* **OpenRouter**: `OPENROUTER_API_KEY`
* **vLLM**: `VLLM_API_KEY` (optional)
* **Ollama**: Usually no key required
* **LiteLLM Proxy**: Configure via proxy settings
See `configs/env/` for example environment files.
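
Before sending a request, it can help to confirm that the key for your chosen provider is actually set. The variable names follow the list above; how your server reads them is up to your deployment:

```python
import os

# Map provider presets to the env vars they typically need.
REQUIRED_KEYS = {
    "openrouter": ["OPENROUTER_API_KEY"],
    "vllm": [],    # VLLM_API_KEY only if your vLLM server enforces auth
    "ollama": [],  # usually no key required
}

def check_env(provider: str) -> None:
    for var in REQUIRED_KEYS.get(provider, []):
        if not os.environ.get(var):
            raise RuntimeError(f"{var} is not set; export it before calling the server.")

check_env("openrouter")
```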
## Customizing Presets
You can modify presets (a combined sketch follows this list) by:
1. Loading the JSON
2. Updating model names, iteration limits, or backend parameters
3. Adjusting prompts in the `inputs` section
4. Adding provider-specific configuration
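
As an illustration of steps 1-4, the snippet below loads a preset and rewrites the fields most commonly changed; the override values and the output filename are arbitrary examples:

```python
import json

# Step 1: load the JSON.
with open("configs/presets/openrouter_conservative.json") as f:
    preset = json.load(f)

rlm = preset["request"]["rlm"]

# Step 2: model names, iteration limits, backend parameters.
rlm["model_name"] = "openai/gpt-4o-mini"
rlm["other_model_name"] = "qwen/qwen-2.5-coder-7b-instruct"
rlm["max_iterations"] = 16
rlm["backend_kwargs"]["temperature"] = 0.1

# Step 3: prompts in the inputs section.
preset["request"]["inputs"]["prompt"] = "Refactor utils.py for readability."
preset["request"]["inputs"]["root_prompt"] = "Plan first, then delegate subtasks."

# Step 4: provider-specific configuration.
preset["request"]["provider"]["provider_preset"] = "openrouter"

with open("configs/presets/my_custom.json", "w") as f:
    json.dump(preset, f, indent=2)
```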
## Cost Considerations
* Monitor your usage via OpenRouter's dashboard when using the OpenRouter presets
* Use `bench/bench_tokens.py` to measure token consumption
* Consider the "strong root, cheap recursion" pattern for cost optimization: pair a capable root model with a cheaper model for recursive calls (sketched below)
* Local options (Ollama/vLLM) eliminate API costs but require GPU resources
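
A sketch of the "strong root, cheap recursion" pattern, reusing model IDs already listed in the presets above: the more capable root model drives the overall solve while the cheaper model handles recursive calls. The `max_tokens` cap is an illustrative choice, not a recommended value:

```python
import json

with open("configs/presets/openrouter_ultra_cheap.json") as f:
    preset = json.load(f)

rlm = preset["request"]["rlm"]
rlm["model_name"] = "openai/gpt-4o-mini"                      # stronger root model
rlm["other_model_name"] = "qwen/qwen-2.5-coder-7b-instruct"   # cheap recursion model
rlm["other_backend_kwargs"]["max_tokens"] = 256               # keep recursive calls short

print(json.dumps(rlm, indent=2))
```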