optimize_ai_spend
Analyze your AI/LLM spend and get a ranked, dollar-quantified plan to cut costs across OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, and Vertex.
Instructions
Ranked, dollar-quantified plan to cut your AI/LLM bill, across OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, and Vertex.
This answers the OpenRouter question (what is the cheapest way to get the same output?) as analysis rather than as a proxy. It decomposes spend into its real drivers (model choice vs. token size vs. request volume) and returns the levers ranked by monthly dollars saved:
- Model routing: move lower-complexity calls to a cheaper sibling model (priced from real input/output ratios, not a guessed percentage)
- Prompt caching: raise your Anthropic cache hit rate so repeated input bills at ~10% of price
- Output reduction: trim verbose responses (output is the pricier side)
- Error reduction: stop paying for failed requests
- Model consolidation: collapse model sprawl into clear tiers
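As a rough illustration of how the first two levers could be priced, the sketch below estimates routing savings from actual input and output token counts and caching savings from a change in cache hit rate. It is not the tool's implementation: `ModelUsage`, `routing_savings`, `caching_savings`, and every price and token count are hypothetical, and the caching estimate assumes cached input bills at about 10% of the normal input price while ignoring any cache-write surcharge.

```python
from dataclasses import dataclass


@dataclass
class ModelUsage:
    """Monthly usage for one model. All figures here are illustrative."""
    input_tokens: int
    output_tokens: int
    input_price_per_mtok: float   # dollars per million input tokens
    output_price_per_mtok: float  # dollars per million output tokens

    def cost(self) -> float:
        """Total dollars at full (uncached) prices."""
        return (self.input_tokens * self.input_price_per_mtok
                + self.output_tokens * self.output_price_per_mtok) / 1_000_000


def routing_savings(usage: ModelUsage, cheap_in: float, cheap_out: float,
                    routable_share: float) -> float:
    """Dollars saved by moving `routable_share` of traffic to a cheaper model.

    Uses the real input/output token mix rather than a flat percentage.
    """
    rerouted = ModelUsage(usage.input_tokens, usage.output_tokens,
                          cheap_in, cheap_out)
    return (usage.cost() - rerouted.cost()) * routable_share


def caching_savings(usage: ModelUsage, current_hit_rate: float,
                    target_hit_rate: float,
                    cached_price_ratio: float = 0.10) -> float:
    """Dollars saved on input by raising the cache hit rate.

    Assumption: a cached input token bills at `cached_price_ratio` of the
    normal input price; cache-write surcharges are ignored.
    """
    full_input_cost = usage.input_tokens * usage.input_price_per_mtok / 1_000_000
    extra_cached_share = target_hit_rate - current_hit_rate
    return full_input_cost * extra_cached_share * (1 - cached_price_ratio)


# Hypothetical workload: 100M input and 20M output tokens per month.
usage = ModelUsage(100_000_000, 20_000_000, 10.0, 30.0)
route = routing_savings(usage, cheap_in=1.0, cheap_out=3.0, routable_share=0.5)
cache = caching_savings(usage, current_hit_rate=0.2, target_hit_rate=0.6)
```

Because output tokens are priced higher than input tokens in this example, a workload with a different input/output mix would see a different routing saving even at the same routable share, which is why a guessed percentage can mislead.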
Only levers with a grounded basis carry a savings number; governance levers are listed without inflating the headline. Output-trim savings are skipped for any model that already has a routing recommendation, so nothing is counted twice. nable never sits in your request path; it reads, ranks, and can open the PR.
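The ranking and double-counting rules above can be sketched as follows. This is a minimal illustration under assumed names (`Lever`, `rank_levers`, and the `kind` strings are hypothetical, not the tool's schema): output-reduction levers are dropped for any model that already has a routing recommendation, quantified levers are sorted by monthly dollars saved, and governance levers are listed last without contributing to the headline total.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Lever:
    kind: str                         # e.g. "model_routing", "output_reduction"
    model: Optional[str]              # model the lever applies to, if any
    monthly_savings: Optional[float]  # None for governance levers


def rank_levers(levers: List[Lever]) -> Tuple[List[Lever], float]:
    """Return (ranked levers, headline monthly savings)."""
    # Models that already have a routing recommendation.
    routed = {l.model for l in levers if l.kind == "model_routing"}

    # Skip output-trim savings for those models so nothing is counted twice.
    kept = [l for l in levers
            if not (l.kind == "output_reduction" and l.model in routed)]

    quantified = sorted((l for l in kept if l.monthly_savings is not None),
                        key=lambda l: l.monthly_savings, reverse=True)
    governance = [l for l in kept if l.monthly_savings is None]

    headline = sum(l.monthly_savings for l in quantified)
    return quantified + governance, headline


# Hypothetical levers and dollar amounts for illustration only.
levers = [
    Lever("output_reduction", "model-a", 200.0),   # dropped: model-a is routed
    Lever("model_routing", "model-a", 720.0),
    Lever("prompt_caching", "model-b", 360.0),
    Lever("output_reduction", "model-c", 50.0),
    Lever("model_consolidation", None, None),      # governance, no number
]
ranked, headline = rank_levers(levers)
```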
Args: days: Lookback window in days (default 30). Savings are normalized to a 30-day month.
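One plausible reading of the 30-day normalization is a simple linear scaling of savings observed in the lookback window. The helper below is a sketch under that assumption; `normalize_to_month` is a hypothetical name, not part of the tool.

```python
def normalize_to_month(savings_in_window: float, days: int = 30) -> float:
    """Scale savings observed over `days` to a 30-day month (linear assumption)."""
    if days <= 0:
        raise ValueError("days must be positive")
    return savings_in_window * 30 / days


# $140 of savings found in a 7-day window scales to $600 per 30-day month.
weekly_as_monthly = normalize_to_month(140.0, days=7)
```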
Examples:
- "How do I cut our AI bill?"
- "Where is the waste in our LLM spend?"
- "What's the cheapest way to run the same workloads?"
- "Optimize our token and model costs."
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| days | No | Lookback window in days. Savings are normalized to a 30-day month. | 30 |