# RLM Benchmarks
## One-shot vs. Recursive Probing
Standard MCP tool calls often stuff as much context as possible into a single prompt. This leads to:
1. Higher costs (attention scales quadratically with prompt length).
2. "Lost in the middle" effects, where mid-prompt content is effectively ignored.
3. Hard context-window limits.
RLM instead probes recursively, pulling in *exactly* the context a query needs, as the sketch below illustrates.
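The difference is easiest to see in code. Here is a minimal sketch, not the RLM implementation: `llm()` stands in for any chat-completion call, and the `READ`/`ANSWER` prompt protocol is invented for illustration.

```python
# Minimal sketch of one-shot vs. recursive probing (not the RLM implementation).
from pathlib import Path

def llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (e.g., an OpenRouter request)."""
    raise NotImplementedError

def one_shot(query: str, files: list[Path]) -> str:
    # Baseline: concatenate every file into one giant prompt.
    context = "\n\n".join(f.read_text() for f in files)
    return llm(f"{context}\n\nQuestion: {query}")

def recursive(query: str, files: list[Path], max_steps: int = 10) -> str:
    # RLM-style: show only the file listing, then let the model pull in
    # one file per step until it has enough context to answer.
    listing = "\n".join(str(f) for f in files)
    notes = ""
    for _ in range(max_steps):
        step = llm(
            f"Files:\n{listing}\n\nNotes so far:\n{notes}\n\n"
            f"Question: {query}\n"
            "Reply `READ <path>` to inspect a file, or `ANSWER <text>` when done."
        )
        if step.startswith("ANSWER"):
            return step.removeprefix("ANSWER").strip()
        path = Path(step.removeprefix("READ").strip())
        notes += f"\n--- {path} ---\n{path.read_text()}"
    return notes  # step budget exhausted; return what was gathered
```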
### Benchmark Harness
You can run the included benchmark script to see the difference for your own codebase:
```bash
uv run python bench/bench_tokens.py \
--query "Detailed explanation of security boundaries" \
--globs "**/*.py" \
--provider_preset openrouter \
--model anthropic/claude-3-sonnet
```
### Typical Results
Illustrative numbers for a large-context task:
- **Vanilla (one-shot)**: ~120k input tokens, ~$0.36, ~15s latency.
- **RLM**: ~12k input tokens, ~$0.04, ~30s latency (sequential recursive steps).

**Win**: ~90% reduction in input tokens and cost for large-context tasks.
**Trade-off**: higher latency, since the recursive steps run sequentially; the small-codebase tests below show 10-100x wall-clock blowups.
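These headline numbers are plain arithmetic. The sketch below reproduces them, assuming a flat $3.00 per 1M input tokens (the rate implied by the $0.36 figure); real bills also include output tokens, which is why RLM's $0.04 sits slightly above its pure input cost.

```python
# Reproducing the "Typical Results" arithmetic above.
# Assumes a flat $3.00 per 1M input tokens, consistent with the $0.36 figure.
RATE_PER_M = 3.00

one_shot_tokens = 120_000
rlm_tokens = 12_000

one_shot_cost = one_shot_tokens / 1_000_000 * RATE_PER_M  # $0.36
rlm_input_cost = rlm_tokens / 1_000_000 * RATE_PER_M      # $0.036 (plus output tokens -> ~$0.04)

token_reduction = 1 - rlm_tokens / one_shot_tokens  # 0.90 -> 90%
cost_reduction = 1 - 0.04 / 0.36                    # ~0.89 -> ~89%
print(f"{token_reduction:.0%} fewer input tokens, {cost_reduction:.0%} cheaper")
```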
## Quick Tests
### Test 1 `qwen/qwen-2.5-coder-32b-instruct`
A quick test that runs for under $0.01 USD:
```bash
uv run python bench/bench_tokens.py \
--query "What are the core components of the RLM MCP server? Return a JSON list." \
--globs "rlm_mcp_server/*.py" \
--provider_preset openrouter \
--model qwen/qwen-2.5-coder-32b-instruct \
--dump-dir ./examples/benchmark_tests/qwen-2.5-coder-32b-instruct
```
RLM consumed more tokens on this test (and roughly 10x the cost), but produced a much more comprehensive and useful [answer](examples/benchmark_test/rlm_answer.txt) than the baseline [answer](examples/benchmark_test/baseline_answer.txt).
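To eyeball that difference yourself, something like the sketch below works, assuming the `--dump-dir` directory contains the two answer files linked above (adjust the path to wherever you pointed `--dump-dir`):

```python
# Quick side-by-side of the two answers written by --dump-dir.
from pathlib import Path

dump = Path("examples/benchmark_tests/qwen-2.5-coder-32b-instruct")
for name in ("baseline_answer.txt", "rlm_answer.txt"):
    text = (dump / name).read_text()
    print(f"=== {name} ({len(text)} chars) ===")
    print(text[:500])  # first 500 chars of each answer
```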
```text
Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric ┃ Baseline (One-Shot) ┃ RLM (Recursive) ┃ Delta ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Answer Score │ 1.0 │ 1.0 │ +0.0 │
│ Cost ($) │ $0.0004 │ $0.0042 │ $+0.0038 │
│ Time (sec) │ 1.93 │ 132.64 │ 130.71 │
│ Peak Prompt Tokens │ 5801 │ 0 (No sub-calls) │ N/A │
│ Total Input Tokens │ 5801 │ 44452 │ +38651 │
│ Total Output Tokens │ 30 │ 6883 │ +6853 │
│ Total Tokens │ 5831 │ 51335 │ +45504 │
└─────────────────────┴─────────────────────┴──────────────────┴──────────┘
Context Analysis
Raw Ingested Context Size: ~24151 bytes
Approx. Context Tokens: ~6037
RLM Recursive Steps: 9
```
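(The "Approx. Context Tokens" figure appears to be the raw byte count divided by 4, a common characters-per-token heuristic: 24151 / 4 ≈ 6037.)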
### Test 2 `google/gemini-2.0-flash-001`
A quick test that runs for under $0.01 USD:
```bash
uv run python bench/bench_tokens.py \
--query "What are the core components of the RLM MCP server? Return a JSON list." \
--globs "rlm_mcp_server/*.py" \
--provider_preset openrouter \
--model google/gemini-2.0-flash-001 \
--dump-dir ./examples/benchmark_tests/gemini-2.0-flash-001
```
```text
Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric ┃ Baseline (One-Shot) ┃ RLM (Recursive) ┃ Delta ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Answer Score │ 1.0 │ 1.0 │ +0.0 │
│ Cost ($) │ $0.0007 │ $0.0055 │ $+0.0048 │
│ Time (sec) │ 1.18 │ 45.07 │ 43.89 │
│ Peak Prompt Tokens │ 6710 │ 0 (No sub-calls) │ N/A │
│ Total Input Tokens │ 6710 │ 41722 │ +35012 │
│ Total Output Tokens │ 45 │ 3195 │ +3150 │
│ Total Tokens │ 6755 │ 44917 │ +38162 │
└─────────────────────┴─────────────────────┴──────────────────┴──────────┘
Context Analysis
Raw Ingested Context Size: ~24151 bytes
Approx. Context Tokens: ~6037
RLM Recursive Steps: 12
```
### Test 3 `google/gemini-2.5-flash-lite`
```bash
uv run python bench/bench_tokens.py \
--query "What are the core components of the RLM MCP server? Return a JSON list." \
--globs "rlm_mcp_server/*.py" \
--provider_preset openrouter \
--model google/gemini-2.5-flash-lite \
--dump-dir ./examples/benchmark_tests/gemini-2.5-flash-lite
```
```text
Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric ┃ Baseline (One-Shot) ┃ RLM (Recursive) ┃ Delta ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Answer Score │ 1.0 │ 1.0 │ +0.0 │
│ Cost ($) │ $0.0000 │ $0.0000 │ $+0.0000 │
│ Time (sec) │ 0.95 │ 18.13 │ 17.18 │
│ Peak Prompt Tokens │ 6712 │ 0 (No sub-calls) │ N/A │
│ Total Input Tokens │ 6712 │ 9871 │ +3159 │
│ Total Output Tokens │ 83 │ 917 │ +834 │
│ Total Tokens │ 6795 │ 10788 │ +3993 │
└─────────────────────┴─────────────────────┴──────────────────┴──────────┘
Context Analysis
Raw Ingested Context Size: ~24151 bytes
Approx. Context Tokens: ~6037
RLM Recursive Steps: 3
```
### Test 4 `openai/gpt-oss-120b`
```bash
uv run python bench/bench_tokens.py \
--query "What are the core components of the RLM MCP server? Return a JSON list." \
--globs "rlm_mcp_server/*.py" \
--provider_preset openrouter \
--model openai/gpt-oss-120b \
--dump-dir ./examples/benchmark_tests/gpt-oss-120b
```
```text
Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric ┃ Baseline (One-Shot) ┃ RLM (Recursive) ┃ Delta ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Answer Score │ 1.0 │ 1.0 │ +0.0 │
│ Cost ($) │ $0.0000 │ $0.0000 │ $+0.0000 │
│ Time (sec) │ 1.33 │ 40.06 │ 38.73 │
│ Peak Prompt Tokens │ 5857 │ 0 (No sub-calls) │ N/A │
│ Total Input Tokens │ 5857 │ 33801 │ +27944 │
│ Total Output Tokens │ 279 │ 2456 │ +2177 │
│ Total Tokens │ 6136 │ 36257 │ +30121 │
└─────────────────────┴─────────────────────┴──────────────────┴──────────┘
Context Analysis
Raw Ingested Context Size: ~24151 bytes
Approx. Context Tokens: ~6037
RLM Recursive Steps: 7
```
### Test 5 `openai/gpt-oss-20b`
```bash
uv run python bench/bench_tokens.py \
--query "What are the core components of the RLM MCP server? Return a JSON list." \
--globs "rlm_mcp_server/*.py" \
--provider_preset openrouter \
--model openai/gpt-oss-20b \
--dump-dir ./examples/benchmark_tests/gpt-oss-20b
```
```text
Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric ┃ Baseline (One-Shot) ┃ RLM (Recursive) ┃ Delta ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Answer Score │ 1.0 │ 0.0 │ -1.0 │
│ Cost ($) │ $0.0000 │ $0.0000 │ $+0.0000 │
│ Time (sec) │ 2.99 │ 320.76 │ 317.77 │
│ Peak Prompt Tokens │ 5857 │ 0 (No sub-calls) │ N/A │
│ Total Input Tokens │ 5857 │ 135520 │ +129663 │
│ Total Output Tokens │ 200 │ 19651 │ +19451 │
│ Total Tokens │ 6057 │ 155171 │ +149114 │
└─────────────────────┴─────────────────────┴──────────────────┴──────────┘
Context Analysis
Raw Ingested Context Size: ~24151 bytes
Approx. Context Tokens: ~6037
RLM Recursive Steps: 29
```
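This is the only failure in the suite: the 20B model never produced a scoring answer (Answer Score 0.0) despite 29 recursive steps and ~155k total tokens, suggesting smaller models can struggle to drive the recursive loop to completion.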
### Test 6 `openai/gpt-5-nano`
```bash
uv run python bench/bench_tokens.py \
--query "What are the core components of the RLM MCP server? Return a JSON list." \
--globs "rlm_mcp_server/*.py" \
--provider_preset openrouter \
--model openai/gpt-5-nano \
--dump-dir ./examples/benchmark_tests/gpt-5-nano
```
```text
Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric ┃ Baseline (One-Shot) ┃ RLM (Recursive) ┃ Delta ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Answer Score │ 1.0 │ 1.0 │ +0.0 │
│ Cost ($) │ $0.0000 │ $0.0000 │ $+0.0000 │
│ Time (sec) │ 9.04 │ 93.85 │ 84.81 │
│ Peak Prompt Tokens │ 5793 │ 0 (No sub-calls) │ N/A │
│ Total Input Tokens │ 5793 │ 11824 │ +6031 │
│ Total Output Tokens │ 823 │ 9532 │ +8709 │
│ Total Tokens │ 6616 │ 21356 │ +14740 │
└─────────────────────┴─────────────────────┴──────────────────┴──────────┘
Context Analysis
Raw Ingested Context Size: ~24151 bytes
Approx. Context Tokens: ~6037
RLM Recursive Steps: 7
```