Schema | ypollak2/llm-router

Server Configuration

Describes the environment variables required to run the server.

Name	Required	Description	Default
No arguments

Capabilities

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": false }
`prompts`	{ "listChanged": false }
`resources`	{ "subscribe": false, "listChanged": false }
`experimental`	{}

Tools

Functions exposed to the LLM to take actions

Name	Description
llm_classify	Classify a prompt's complexity and recommend which model to use. Returns a recommendation that considers complexity, daily token budget, quality preference, and minimum model floor, and includes a budget usage bar. Complexity drives model selection at all times: simple → haiku, moderate → sonnet, complex → opus. Budget pressure is a late safety net only: 0-85%: no downshift (complexity routing handles efficiency); 85-95%: downshift by 1 tier (opus→sonnet, sonnet→haiku); 95%+: downshift by 2 tiers and warn the user. Args: prompt: The task or question to classify. quality: Override quality mode — "best", "balanced", or "conserve". min_model: Override minimum model floor — "haiku", "sonnet", or "opus".
llm_track_usage	Report Claude Code model token usage for budget tracking. Call this after using an Agent with haiku/sonnet to track token consumption against the daily budget. This enables progressive model downshifting. Shows per-call savings vs opus and cumulative session savings. Args: model: The Claude model used — "haiku", "sonnet", or "opus". tokens_used: Approximate tokens consumed by the Agent call. complexity: The task complexity that was routed — "simple", "moderate", "complex".
llm_route	Smart router — classifies task complexity, then routes to the optimal external LLM. Uses a cheap classifier to assess complexity, then picks the right model tier: simple → budget models (Gemini Flash, GPT-4o-mini); moderate → balanced models (GPT-4o, Sonnet, Gemini Pro); complex → premium models (o3, Opus). For routing to Claude Code's own models (haiku/sonnet) without API keys, use llm_classify instead and follow its recommendation. Args: prompt: The task or question to route. task_type: Optional hint — "query", "research", "generate", "analyze", "code". Auto-detected if omitted. complexity_override: Skip classification — force "simple", "moderate", or "complex". system_prompt: Optional system instructions. temperature: Sampling temperature (0.0-2.0). max_tokens: Maximum output tokens. context: Optional conversation context to help the model understand the broader task.
llm_auto	Auto-routing wrapper with persistent savings tracking — works from any host. Equivalent to llm_route but additionally: Flushes pending hook-written savings records into SQLite before routing. Appends a compact savings envelope every 5 calls so you can see the cumulative value across all sessions and hosts without running llm_savings. Use llm_auto instead of llm_route when you are in a host that lacks a UserPromptSubmit hook (Codex CLI, Claude Desktop, GitHub Copilot) — the savings are tracked server-side, so they accumulate correctly regardless of which client triggered the call. Args: prompt: The task or question to route. task_type: Optional hint — "query", "research", "generate", "analyze", "code". profile_override: Force a routing profile — "budget", "balanced", or "premium". system_prompt: Optional system instructions. context: Optional conversation context.
llm_stream	Stream an LLM response for long-running tasks — shows output as it arrives. Uses the same routing logic as llm_route but streams chunks instead of waiting for the full response. Ideal for long-form generation, research summaries, or any task where seeing partial output early is valuable. Args: prompt: The task or question to stream. task_type: Task type hint — "query", "research", "generate", "analyze", "code". model: Optional model override (e.g. "openai/gpt-4o", "gemini/gemini-2.5-flash"). system_prompt: Optional system instructions. temperature: Sampling temperature (0.0-2.0). max_tokens: Maximum output tokens.
llm_select_agent	Classify a task prompt and return the recommended agent CLI + model for session-level routing. Use this BEFORE starting a Claude Code / Codex / Gemini CLI session to pick the right agent runtime for the task. This is session-level routing — it selects which agent to invoke, not which model to call mid-session. Decision tree (profile × complexity): budget + simple/moderate → codex + gpt-4o-mini; budget + complex → codex + gpt-4o (Codex handles most coding; escalate if needed); balanced + simple → codex + gpt-4o-mini; balanced + moderate → claude_code + sonnet; balanced + complex → claude_code + opus; premium + any → claude_code + opus. Returns JSON with: primary — agent binary name: "claude_code" | "codex" | "gemini_cli"; primary_model — model flag value (pass via -m or --model); fallback — fallback agent if primary unavailable; fallback_model — model for fallback; task_type — classified task type (code / analyze / generate / research / query); complexity — simple | moderate | complex; confidence — classifier confidence 0–1; reason — one-line classification rationale; env_check — dict of required env vars and whether they're set. Args: prompt: The task description to classify (same text you'd pass to the agent). profile: Routing profile — "budget", "balanced", or "premium" (default: "balanced").
llm_reroute	Override the last routing decision and record it for feedback learning. Logs the correction to the database so future routing decisions for this task type have lowered confidence. Use this when llm_route, llm_query, llm_code, or any other tool chose the wrong model for your task. Args: to_tool: Which tool to use instead (e.g. "llm_analyze", "llm_code"). reason: Optional explanation — stored for routing quality improvement. original_tool: The tool that made the wrong decision (auto-detected if omitted). original_model: The model that was selected (for logging purposes).
llm_query	Send a general query to the best available LLM. Routes by complexity: simple→Haiku/Flash, moderate→Sonnet/GPT-4o, complex→Opus/o3. Args: prompt: The question or prompt to send. complexity: Task complexity — "simple", "moderate", or "complex". Drives model selection: simple→cheap (Haiku/Flash), moderate→balanced (Sonnet/GPT-4o), complex→premium (Opus/o3). Auto-detected from prompt length when omitted. model: Explicit model override, bypasses complexity routing entirely. system_prompt: Optional system instructions. temperature: Sampling temperature (0.0-2.0). max_tokens: Maximum output tokens. context: Optional conversation context to help the model understand the broader task.
llm_research	Search-augmented research query — routes to Perplexity for web-grounded answers. Best for: fact-checking, current events, finding sources, market research. Args: prompt: The research question. system_prompt: Optional system instructions. max_tokens: Maximum output tokens. context: Optional conversation context to help the model understand the broader task.
llm_generate	Generate creative or long-form content — routes to the best generation model. Best for: writing, summarization, brainstorming, content creation. Args: prompt: What to generate. complexity: Task complexity — "simple", "moderate", or "complex". Drives model selection. Simple tasks (short summaries) use cheap models; complex tasks (long-form, nuanced writing) use premium models. system_prompt: Optional system instructions (tone, format, audience). temperature: Sampling temperature (higher = more creative). max_tokens: Maximum output tokens. context: Optional conversation context to help the model understand the broader task.
llm_analyze	Deep analysis task — routes to the strongest reasoning model. Best for: data analysis, code review, problem decomposition, debugging. Args: prompt: What to analyze. complexity: Task complexity — "simple", "moderate", or "complex". Analysis tasks default to at least moderate. Pass "complex" for multi-file reviews or architecture decisions that warrant Opus/o3. system_prompt: Optional system instructions. max_tokens: Maximum output tokens. context: Optional conversation context to help the model understand the broader task.
llm_code	Coding task — routes to the best coding model. Best for: code generation, refactoring suggestions, algorithm design. Args: prompt: The coding task or question. complexity: Task complexity — "simple", "moderate", or "complex". Drives model selection: simple questions use Haiku/Flash, actual implementation tasks use Sonnet/GPT-4o, large refactors or architecture work use Opus/o3. system_prompt: Optional system instructions (language, framework, style). max_tokens: Maximum output tokens. context: Optional conversation context to help the model understand the broader task.
llm_edit	Route code-edit reasoning to a cheap model and return exact edit instructions. Instead of Opus reasoning about what to change (expensive), a cheap model reads the files, figures out the edits, and returns JSON `{file, old_string, new_string}` pairs that Claude can apply mechanically via the Edit tool. How to use the result: After calling this tool, apply each edit instruction using the Edit tool with the exact old_string → new_string pairs provided. Best for: refactoring, bug fixes, adding small features to existing files. Args: task: Natural-language description of what to change (e.g. "Add type hints to all public functions in router.py"). files: List of file paths to read and include in the prompt. Relative paths are resolved from the current working directory. Files larger than 32 KB are truncated with a note. context: Optional conversation context to help the model understand the task.
llm_image	Generate an image — auto-routes to Gemini Imagen, DALL-E, Flux, or Stable Diffusion. Args: prompt: Description of the image to generate. model: Optional model override (e.g. "gemini/imagen-3", "openai/dall-e-3", "fal/flux-pro", "stability/stable-diffusion-3"). size: Image size (e.g. "1024x1024", "1792x1024"). quality: Image quality — "standard" or "hd" (DALL-E only).
llm_video	Generate a video — routes to Gemini Veo, Runway, Kling, or other video models. Args: prompt: Description of the video to generate. model: Optional model override (e.g. "gemini/veo-2", "runway/gen3a_turbo", "fal/kling-video"). duration: Video duration in seconds (default: 5).
llm_audio	Generate speech/audio — routes to ElevenLabs or OpenAI TTS. Args: text: Text to convert to speech. model: Optional model override (e.g. "openai/tts-1-hd", "elevenlabs/eleven_multilingual_v2"). voice: Voice selection (OpenAI: alloy/echo/fable/onyx/nova/shimmer. ElevenLabs: voice ID).
llm_orchestrate	Multi-step orchestration — automatically decomposes complex tasks across multiple LLMs. Chains research, analysis, generation, and coding steps together, routing each to the optimal model. Use templates for common patterns or let the AI decompose. Free tier: up to 2-step pipelines. Pro tier: unlimited steps + auto-decomposition. Args: task: Description of the complex task to accomplish. template: Optional pipeline template: "research_report", "competitive_analysis", "content_pipeline", "code_review_fix". Omit for auto-decomposition.
llm_pipeline_templates	List available pipeline templates for multi-step orchestration.
llm_save_session	Summarize and save the current session for cross-session context. Uses a cheap model to generate a compact summary of the session's exchanges, then persists it to SQLite. Future routed calls will include this summary as context, giving external models awareness of prior work. Call this before ending a session or when switching to a different task. Sessions with fewer than 3 exchanges are skipped.
llm_set_profile	Switch the active routing profile. Args: profile: One of "budget", "balanced", or "premium".
llm_usage	Unified usage dashboard — Claude subscription, Codex, external APIs, and savings. Shows a complete picture of all LLM usage across all providers in one view. Args: period: Time period — "today", "week", "month", or "all".
llm_savings	Show time-bucketed savings dashboard: today / this week / this month / all-time. Displays actual spend vs Sonnet baseline and the efficiency multiplier (Nx) for each period. Use this to understand the real dollar value routing provides. Returns: Formatted savings table with efficiency multiplier.
llm_cache_stats	Show prompt classification cache statistics — hit rate, entries, memory usage. The cache stores ClassificationResult objects keyed by SHA-256(prompt + quality_mode + min_model). Budget pressure is always applied fresh, so cached classifications stay valid.
llm_cache_clear	Clear the prompt classification cache.
llm_quality_report	Show routing quality metrics — classification accuracy, savings, model distribution. Analyzes routing decisions over the specified period to show how the classifier is performing, which models are being selected, downshift rates, and cost efficiency. Args: days: Number of days to include in the report (default 7).
llm_quality_guard	Show quality scores per model with degradation alerts (v6.2). Displays rolling average judge scores for all routed models over the past N days. Alerts if any model's score < 0.7 with sufficient samples (quality degradation). Args: days: Number of days of history to analyze (default 7). Returns: Formatted table with model scores, trend arrows, and alerts.
llm_health	Check the health status of all configured LLM providers.
llm_hook_health	Check the health status of all routing hooks. Shows: hook permission status (executable vs not), success/error counts, recent errors with timestamps, and health status (healthy/degraded/failing).
llm_providers	List all supported providers and which ones are configured.
llm_dashboard	Open the LLM Router web dashboard in the background. Starts a local HTTP server on localhost (on the port given by `port`) showing routing stats, cost trends, model distribution, and recent decisions. Refreshes every 30s. The dashboard reads from the same SQLite DB the router writes — no extra configuration needed. Args: port: TCP port for the dashboard server (default 7337). Returns: URL and instructions for opening the dashboard.
llm_team_report	Show a team savings report for the current user and project. Displays call counts, cost savings, free-tier usage, and top models, broken down for the auto-detected user (git email) and project (git remote). Args: period: `"today"`, `"week"`, `"month"`, or `"all"`.
llm_team_push	Push the team savings report to the configured notification channel. Sends a formatted message to the endpoint set by `LLM_ROUTER_TEAM_ENDPOINT`. Channel is auto-detected from the URL: hooks.slack.com → Slack Block Kit message; discord.com/api/webhooks → Discord Embed; api.telegram.org/bot* → Telegram MarkdownV2 message; anything else → generic JSON POST. Args: period: `"today"`, `"week"`, `"month"`, or `"all"`.
llm_policy	Show the active routing policy and recent policy audit events. Displays the merged policy from all three layers: org policy (~/.llm-router/org-policy.yaml), user policy (~/.llm-router/routing.yaml), and repo policy (.llm-router.yml). Also shows the last 10 policy enforcement events from the audit log.
llm_digest	Generate a savings digest and optionally send it to a webhook. Formats a savings summary for the given period. Also detects spend spikes and shows a "what if router was off?" simulation. Args: period: `"today"`, `"week"`, `"month"`, or `"all time"`. send: If True, POST the digest to LLM_ROUTER_WEBHOOK_URL.
llm_benchmark	Show routing accuracy benchmarks by task type. Accuracy is computed from llm_rate feedback (thumbs up/down). The more you rate responses with llm_rate, the more accurate this becomes. Also shows an optional community export status if LLM_ROUTER_COMMUNITY=true.
llm_model_eval	Evaluate and benchmark all available local and remote models. Runs a suite of benchmark tasks (reasoning, code) against each available model (Ollama, Codex, APIs) to determine quality, speed, and accuracy. Results are cached for 7 days and used to optimize routing priorities. Can be called manually to force a re-evaluation, or automatically runs once per week during session-end. Returns: Formatted evaluation results with quality scores and latency metrics.
llm_model_usage	Analyze which models are being selected in routing. Shows usage statistics for the last N hours: top models selected; task type distribution (code/query/analyze/etc.); classification methods used (heuristic/ollama/api/fallback); individual model success rates with quality feedback. Args: hours: Look back this many hours (default: 24). Returns: Formatted usage statistics and analysis.
llm_model_export	Export model tracking data for external analysis. Exports complete routing history to a file for analysis in spreadsheets or data tools (Excel, Python, R, etc.). Args: format: Export format (csv, json). Default: csv Returns: Path to exported file and record count.
llm_session_spend	Show real-time session cost breakdown. Reports spend accumulated since this session started, broken down by model and tool. Fires an anomaly warning if spend exceeds the configured threshold (default $0.50) in under 10 minutes. Returns a formatted summary with per-model costs and anomaly status.
llm_approve_route	Approve or reject a pending high-cost routing decision. Use this when llm_route (or any routing tool) blocked a call because the estimated cost exceeded LLM_ROUTER_ESCALATE_ABOVE. The pending call is stored server-side until you approve or cancel it. Args: approve: True to proceed with the call, False to cancel it. downgrade_to: Optional cheaper model to use instead of the blocked one (e.g. "gemini/gemini-2.5-flash" instead of "openai/o3").
llm_quota_status	Show quota balance across Claude, Gemini CLI, and Codex subscriptions. Monitors three subscription providers to help you understand which quota is being exhausted, and make routing decisions accordingly. The QUOTA_BALANCED profile uses this data to dynamically reorder the routing chain — keeping usage balanced across all three subscriptions. Returns: Formatted quota status with usage percentages and route recommendations.
llm_budget	Show real-time budget pressure for all configured providers (v5.0+). Reads live budget state from the Budget Oracle, which normalises provider quota into a single pressure value (0.0 = fully available, 1.0 = exhausted). Pressure sources by provider type: Local (Ollama, vLLM) — always 0.0 (free, no quota); Claude subscription — max(session_pct, weekly_pct, sonnet_pct) / 100; API-key providers — monthly spend / configured cap (0.0 if no cap). Returns: A formatted budget summary with pressure bars per provider.
llm_gain	Show token savings dashboard (RTK-style). Displays comprehensive token savings metrics across all routing decisions, showing actual costs vs. Opus baseline and efficiency multiplier. Features: total savings and efficiency multiplier; breakdown by model, complexity, and tool; daily trend analysis; cost comparisons. Args: period: Time period to analyze: "today", "week" (default), "month", or "all". Returns: Formatted savings dashboard.
llm_share_profile	Share your learned routing profile with the community. Exports ~/.llm-router/learned_routes.json and prepares it for upload to a shared community repository. Useful for publishing routing patterns you've learned that may benefit other llm-router users. Returns: Path to exported profile and upload instructions
llm_import_profile	Import a learned routing profile from community or URL. Imports a shared profile and merges it with your existing learned routes. Community profiles must have confidence >= 2 to be imported (strict validation). Args: url: URL to a profile JSON file (optional; defaults to community latest) Returns: Merge summary and new routes imported
llm_check_usage	Check real-time Claude subscription usage (session limits, weekly limits, extra spend). Shows cached data if available. If no data cached, returns the JS snippet to run via Playwright's browser_evaluate (one call, no page navigation needed). The budget pressure from this data feeds directly into model routing — higher usage = more aggressive downshifting to cheaper models.
llm_update_usage	Update cached Claude usage from the JSON API response. Call this with the result from browser_evaluate(FETCH_USAGE_JS). Accepts the full JSON object from the claude.ai internal API. The cached data is used by llm_classify for real budget pressure instead of token-based estimates. Args: data: JSON response from the claude.ai usage API (via browser_evaluate).
llm_refresh_claude_usage	Refresh Claude subscription usage via the OAuth API — no browser required. Reads the Claude Code OAuth token from the macOS Keychain, calls the Anthropic OAuth usage endpoint, and updates the local usage cache. Requires: Claude Code installed and authenticated on macOS.
llm_codex	Route a task to the local Codex desktop agent (OpenAI). Uses the Codex CLI to run tasks non-interactively. This uses the user's OpenAI subscription (not Claude quota) — ideal as a fallback when Claude limits are tight, or for tasks that benefit from OpenAI's models. Available models: gpt-5.4, o3, o4-mini, gpt-4o, gpt-4o-mini Args: prompt: The task or question to send to Codex. model: OpenAI model to use (default: gpt-5.4).
llm_gemini	Route a task to the local Gemini CLI agent (Google). Uses the Gemini CLI to run tasks non-interactively. This uses the user's Google One AI Pro subscription (not Claude quota) — ideal as a fallback when Claude limits are tight, or for tasks that benefit from Google's Gemini models. Available models: gemini-2.5-flash, gemini-2.0-flash, gemini-3-flash-preview Args: prompt: The task or question to send to Gemini. model: Google model to use (default: gemini-2.5-flash).
llm_setup	Set up and manage API providers, hooks, and routing enforcement. Actions: "status": Show which providers are configured and which are missing; "guide": Step-by-step guide to add recommended free/cheap providers; "discover": Scan for existing API keys in environment (safe, read-only); "add": Add an API key for a provider (writes to .env file securely); "test": Validate API keys with a minimal call (tests configured or specific provider); "provider": Show details about a specific provider; "install_hooks": Install auto-routing hooks globally (every Claude Code session); "uninstall_hooks": Remove auto-routing hooks. Args: action: What to do — "status", "guide", "discover", "add", "test", "provider", "install_hooks", or "uninstall_hooks". provider: Provider name (for "add", "test", and "provider" actions). api_key: API key value (for "add" action only). Key is validated before saving.
llm_rate	Rate the last (or a specific) routing decision as good or bad. Stores thumbs-up / thumbs-down feedback in the `routing_decisions` table. Over time this signal can be used to retrain the local classifier so routing choices improve based on your preferences. Args: good: True = routing was a good choice; False = bad choice. decision_id: Row ID to rate. Omit (or pass None) to rate the most recent routing decision. Returns: Confirmation string with the rated decision ID, or an error message.
llm_fs_find	Generate glob/grep commands to find files matching a natural-language description. Routes to Haiku/Ollama so the cheap model does pattern thinking. Claude executes the returned commands with Glob/Grep/Bash. Args: description: What you're looking for, e.g. "all Python files that import sqlite3" or "TypeScript files with TODO comments added in the last week". root: Optional root directory to search in. Defaults to current working directory.
llm_fs_rename	Generate shell commands for a file rename/reorganisation operation. Describe what you want to rename and the cheap model produces the mv/git mv commands. Use `dry_run=True` (default) to get echo-prefixed commands that are safe to inspect before running. Args: description: What to rename and how, e.g. "rename all _old.py files in src/ to remove the _old suffix" or "move all test*.py files from tests/unit/ into tests/". dry_run: When True, commands are prefixed with `echo` for safe review. Set to False to get directly executable commands.
llm_fs_edit_many	Generate bulk edit instructions across multiple files. Extends the `llm_edit` pattern to many files at once: the cheap model reads all target files and returns a JSON array of `{file, old_string, new_string}` edit instructions. Claude applies them mechanically. Use this for cross-file refactors, bulk renames within files, or updating repeated patterns across a module. Args: task: Natural-language description of what to change, e.g. "replace all `import sqlite3` with `import aiosqlite as sqlite3`" or "update the copyright year from 2024 to 2025 in all file headers". files: Explicit list of file paths to process. glob_pattern: Glob pattern to find files (e.g. "src/**/*.py"). Use either `files` or `glob_pattern`, not both. max_files: Cap on files processed in one call (default 20). Raise if you need more, but consider splitting into batches for large refactors.
llm_fs_analyze_context	Analyze workspace files to build a routing context summary. Scans key files (package.json, pyproject.toml, go.mod, Cargo.toml, README, open TODOs) and produces a compact semantic summary stored in ~/.llm-router/context_summary.json. Subsequent routing decisions inject this summary into the system prompt so cheap models have workspace context. Call this once at the start of a project session or after major refactors. The summary is automatically used by llm_route and llm_auto — no further action required. Args: path: Workspace root to analyze (default: current directory). max_files: Maximum files to read (default: 20).
agoragentic_task	Execute a task on the Agoragentic capability marketplace. Routes automatically to the best-matching trusted provider. Handles USDC settlement on Base L2 blockchain. Args: task: Task type (e.g., "code_review", "summarization") input_json: Task input as JSON string max_budget_usdc: Maximum spend limit (optional) Returns: Execution result as JSON string
agoragentic_browse	Browse available services on the Agoragentic marketplace. Shows trust-verified providers and their capabilities. Returns: JSON list of available capabilities
agoragentic_wallet	Check Agoragentic wallet balance and status. Returns: Wallet info including balance, chain, and currency
agoragentic_status	Get llm-router agent status on Agoragentic. Shows registration status, available seller slots, listings, etc. Returns: Agent status as JSON
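
The budget-pressure rules in the llm_classify description can be read as a small decision function. The following is a minimal sketch of that reading, not the server's actual implementation: the function name `recommend_model`, the treatment of 85% and 95% as inclusive lower bounds, and the choice to let `min_model` win over a downshift are all assumptions made for illustration.

```python
# Hypothetical sketch of the llm_classify routing rules described in the table.
# Names and boundary handling are assumptions, not taken from the llm-router source.

TIERS = ["haiku", "sonnet", "opus"]  # ordered cheapest to most capable
COMPLEXITY_TO_TIER = {"simple": "haiku", "moderate": "sonnet", "complex": "opus"}


def recommend_model(complexity: str, budget_used_pct: float, min_model: str = "haiku") -> str:
    """Pick a tier from complexity, then downshift only under late budget pressure."""
    idx = TIERS.index(COMPLEXITY_TO_TIER[complexity])

    if budget_used_pct >= 95:      # 95%+: downshift by 2 tiers
        idx -= 2
    elif budget_used_pct >= 85:    # 85-95%: downshift by 1 tier
        idx -= 1
    # below 85%: no downshift, complexity alone decides

    # Assumption: the minimum model floor is applied last and cannot be undercut.
    floor = TIERS.index(min_model)
    return TIERS[max(idx, floor)]
```

Under these assumptions, a complex task at 90% budget usage would be recommended sonnet instead of opus, and the same task at 97% would drop to haiku unless `min_model` sets a higher floor.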

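The URL-based channel detection described for llm_team_push could look roughly like the sketch below. It is an illustration only: the function name `detect_channel` and the returned labels are hypothetical, and the matching is based solely on the host and path patterns listed in the tool description.

```python
# Hypothetical sketch of channel auto-detection for a team endpoint URL.
# The function name and return labels are assumptions for illustration.
from urllib.parse import urlparse


def detect_channel(endpoint: str) -> str:
    """Map a webhook URL to a channel label using the patterns from the description."""
    parsed = urlparse(endpoint)
    host = (parsed.hostname or "").lower()
    path = parsed.path

    if host == "hooks.slack.com":
        return "slack"      # Slack Block Kit message
    if host == "discord.com" and path.startswith("/api/webhooks"):
        return "discord"    # Discord Embed
    if host == "api.telegram.org" and path.startswith("/bot"):
        return "telegram"   # Telegram MarkdownV2 message
    return "generic"        # anything else: generic JSON POST
```

Matching on the parsed hostname rather than a substring of the whole URL avoids misclassifying an unrelated endpoint that merely contains one of these strings in its path or query.
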
Prompts

Interactive templates invoked by user choice

Name	Description
No prompts

Resources

Contextual data attached and managed by the client

Name	Description
`router_status`	MCP resource returning a plain-text snapshot of the router's current state. Includes the active profile, subscription tier, configured provider counts (text and media), optional monthly budget, and per-provider circuit-breaker health status. Returns: A newline-delimited plain-text summary (not markdown).

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ypollak2/llm-router'
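
The same request can be issued from Python with only the standard library. This is a minimal sketch: the helper names `build_request` and `fetch_server` are hypothetical, and it assumes the endpoint returns a JSON body, which the page does not state explicitly.

```python
# Hypothetical helpers mirroring the curl command above.
# Assumption: the endpoint responds with JSON.
import json
from urllib.request import Request, urlopen

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_request(owner: str, repo: str) -> Request:
    """Build a GET request for one server entry in the MCP directory."""
    return Request(
        f"{API_BASE}/{owner}/{repo}",
        method="GET",
        headers={"Accept": "application/json"},
    )


def fetch_server(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the response body as JSON."""
    with urlopen(build_request(owner, repo), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_server("ypollak2", "llm-router")` performs the same GET as the curl command; keeping `build_request` separate makes the URL construction checkable without network access.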

If you have feedback or need assistance with the MCP directory API, please join our Discord server.