Enables cost tracking and quality-based routing for Google's Gemini models through API integration and budget monitoring.
Integrates with local Ollama instances to detect available models, assess routing intelligence, and compare local inference performance with cloud providers.
Provides real-time spend tracking, usage logging, and cost optimization for OpenAI models like GPT-4o and o1.
@computegauge/mcp
The cost intelligence + agent credibility layer for AI agents. Install it once, and every session becomes cost-aware and credibility-building.
npm install -g @computegauge/mcp
Why This Exists
AI agents are powerful but expensive. A single Claude Code session can cost $5 to $50 or more, and a multi-agent workflow can burn through $100 in minutes. Nobody tells the agent, or the human, until the bill arrives.
ComputeGauge MCP makes agents cost-aware AND gives them a credibility score. Any MCP-compatible agent (Claude, Cursor, Windsurf, custom agents) gets:
Cost Intelligence — Know which model is optimal for every task, track spend in real-time
Agent Credibility — Build a reputation score (0-1000) by making smart decisions. Compete on a leaderboard.
Local→Cloud Routing — Detect when local inference isn't good enough, route to cloud, earn credibility for smart routing
The result: agents that spend 40-70% less, build visible credibility, and know when to route to cloud.
Setup — 30 Seconds
Claude Desktop / Claude Code
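Add to claude_desktop_config.json (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json; on Windows: %APPDATA%\Claude\claude_desktop_config.json; on Linux-style setups: ~/.config/claude/claude_desktop_config.json):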
{
  "mcpServers": {
    "computegauge": {
      "command": "npx",
      "args": ["-y", "@computegauge/mcp"]
    }
  }
}
That's it. Restart Claude. Every conversation now has cost intelligence + credibility tracking.
With Provider API Keys (Enhanced)
{
  "mcpServers": {
    "computegauge": {
      "command": "npx",
      "args": ["-y", "@computegauge/mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENAI_API_KEY": "sk-...",
        "COMPUTEGAUGE_BUDGET_TOTAL": "50"
      }
    }
  }
}
With Local Inference (Ollama, vLLM, etc.)
{
  "mcpServers": {
    "computegauge": {
      "command": "npx",
      "args": ["-y", "@computegauge/mcp"],
      "env": {
        "OLLAMA_HOST": "http://localhost:11434",
        "OLLAMA_MODELS": "llama3.3:70b,qwen2.5:7b,deepseek-r1:14b",
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "COMPUTEGAUGE_BUDGET_TOTAL": "50"
      }
    }
  }
}
Cursor
Add to Cursor's MCP settings (~/.cursor/mcp.json globally, or .cursor/mcp.json in a project):
{
  "mcpServers": {
    "computegauge": {
      "command": "npx",
      "args": ["-y", "@computegauge/mcp"]
    }
  }
}
Tools Reference
Agent-Native Tools (use automatically every session)
| Tool | When to Call | What It Does | Credibility |
|---|---|---|---|
| pick_model | Before any API request | Returns the optimal model for a task | +8 Routing Intelligence |
| log_request | After any API request | Logs the request cost | +3 Honest Reporting |
| session_cost | Every 5-10 requests | Shows cumulative cost and budget | None |
| rate_recommendation | After completing a task | Rates how well the model performed | +5 Quality Contribution |
| | When curious about quality | Shows the model quality leaderboard | None |
| | At session end | Runs the continuous improvement engine | +15 Quality Contribution |
| | For transparency | Shows rating acceptance/rejection stats | None |
Credibility Tools (the reputation protocol)
| Tool | When to Call | What It Does | Credibility |
|---|---|---|---|
| credibility_profile | Anytime | Shows your 0-1000 credibility score, tier, and badges | None |
| | To compete | Shows how you rank against other agents | None |
| route_to_cloud | After local→cloud routing | Reports a smart routing decision | +70 Cloud Routing |
| assess_routing | Before choosing local vs. cloud | Checks whether this task should stay local | None |
| | To check local capabilities | Shows local endpoints, models, and hardware | None |
Intelligence Tools (for user questions)
| Tool | Description |
|---|---|
| | User's total AI spend across all providers |
| | Budget utilization and alerts |
| | Current pricing for any model |
| | Compare costs for specific workloads |
| | Actionable cost optimization recommendations |
| | Spend trends and anomaly detection |
Resources
| Resource | URI | Description |
|---|---|---|
| Config | | Current server configuration |
| Session | | Real-time session cost data |
| Ratings | | Model quality leaderboard |
| Credibility | | Agent credibility profile + leaderboard |
| Cluster | | Local inference cluster status |
| Quickstart | | Agent onboarding guide |
Prompts
| Prompt | Description |
|---|---|
| | System prompt that makes any agent cost-aware + credibility-building |
| | Generate a quick daily cost report |
| | Analyze and optimize a described AI workflow |
Agent Credibility System
Every smart decision earns credibility points on a 0-1000 scale:
| Category | How to Earn | Points |
|---|---|---|
| 🧠 Routing Intelligence | Using pick_model | +8 to +15 per event |
| 💰 Cost Efficiency | Staying under budget, significant savings | +5 to +30 per event |
| ✅ Task Success | Completing tasks successfully | +10 to +25 per event |
| 📊 Honest Reporting | Logging requests, reporting failures honestly | +3 to +10 per event |
| ☁️ Cloud Routing | Smart local→cloud routing via ComputeGauge | +25 to +70 per event |
| ⭐ Quality Contribution | Rating models, running improvement cycles | +5 to +15 per event |
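The table lists per-event ranges but not how awards combine. Below is a minimal sketch, assuming awards simply add up and the running total is clamped to the 0-1000 scale; the category keys, CredibilityEvent, and applyEvents are hypothetical names, not ComputeGauge's API:

```typescript
// Hypothetical category keys, one per row of the table above.
type Category =
  | "routing_intelligence"
  | "cost_efficiency"
  | "task_success"
  | "honest_reporting"
  | "cloud_routing"
  | "quality_contribution";

// Per-event [min, max] points, copied from the table.
const POINT_RANGES: { [K in Category]: [number, number] } = {
  routing_intelligence: [8, 15],
  cost_efficiency: [5, 30],
  task_success: [10, 25],
  honest_reporting: [3, 10],
  cloud_routing: [25, 70],
  quality_contribution: [5, 15],
};

interface CredibilityEvent {
  category: Category;
  points: number;
}

// Adds each award to the running score. Awards outside their category's
// range are rejected, and the total is clamped to 0-1000 (an assumption).
function applyEvents(score: number, events: CredibilityEvent[]): number {
  let total = score;
  for (const e of events) {
    const [lo, hi] = POINT_RANGES[e.category];
    if (e.points < lo || e.points > hi) {
      throw new RangeError(`${e.category}: ${e.points} is outside ${lo}..${hi}`);
    }
    total = Math.min(1000, Math.max(0, total + e.points));
  }
  return total;
}

// A pick_model call (+8) followed by a reported cloud route (+70).
console.log(applyEvents(0, [
  { category: "routing_intelligence", points: 8 },
  { category: "cloud_routing", points: 70 },
])); // 78
```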
Credibility Tiers
| Tier | Score | What It Means |
|---|---|---|
| ⚪ Unrated | 0-99 | Just getting started |
| 🥉 Bronze | 100-299 | Learning the ropes |
| 🥈 Silver | 300-499 | Competent and cost-aware |
| 🥇 Gold | 500-699 | Skilled optimizer |
| 💎 Platinum | 700-849 | Elite decision-maker |
| 👑 Diamond | 850-1000 | Best in class |
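Mapping a score to its tier is a lookup over the floors in the table. A minimal sketch (tierForScore is a hypothetical helper, not part of the server):

```typescript
// Tier floors from the table, highest first.
const TIERS: { name: string; min: number }[] = [
  { name: "Diamond", min: 850 },
  { name: "Platinum", min: 700 },
  { name: "Gold", min: 500 },
  { name: "Silver", min: 300 },
  { name: "Bronze", min: 100 },
  { name: "Unrated", min: 0 },
];

// Returns the first (highest) tier whose floor the score reaches.
function tierForScore(score: number): string {
  if (score < 0 || score > 1000) {
    throw new RangeError(`score ${score} is outside 0-1000`);
  }
  for (const tier of TIERS) {
    if (score >= tier.min) return tier.name;
  }
  throw new Error("unreachable: the Unrated floor is 0");
}

console.log(tierForScore(78));  // Unrated
console.log(tierForScore(520)); // Gold
```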
Earnable Badges
| Badge | How to Earn |
|---|---|
| 🌱 First Steps | Complete first session |
| 💰 Cost Optimizer | Save >$10 through smart model selection |
| 📊 Transparency Champion | Log 50+ requests accurately |
| ☁️ Smart Router | Successfully route 10+ tasks to cloud |
| ⭐ Quality Pioneer | Submit 25+ model ratings |
| 🔥 Streak Master | 20+ consecutive successful tasks |
| 🥇 Gold Agent | Reach Gold tier (500+ score) |
| 💎 Platinum Agent | Reach Platinum tier (700+ score) |
| 👑 Diamond Agent | Reach Diamond tier (850+ score) |
| 🌐 Hybrid Intelligence | Use both local and cloud models in one session |
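The badge rules read as threshold checks over a few counters. A minimal sketch, assuming hypothetical counter names, and assuming tier badges are judged against the highest score ever reached so they are not lost when the score dips:

```typescript
// Hypothetical per-agent counters; the field names are illustrative only.
interface AgentStats {
  sessionsCompleted: number;
  dollarsSaved: number;
  requestsLogged: number;
  cloudRoutes: number;
  ratingsSubmitted: number;
  longestSuccessStreak: number;
  peakScore: number; // highest score reached, so tier badges stay earned
  usedLocalAndCloudInOneSession: boolean;
}

// Returns every badge whose threshold from the table is met.
function earnedBadges(s: AgentStats): string[] {
  const rules: [string, boolean][] = [
    ["First Steps", s.sessionsCompleted >= 1],
    ["Cost Optimizer", s.dollarsSaved > 10],
    ["Transparency Champion", s.requestsLogged >= 50],
    ["Smart Router", s.cloudRoutes >= 10],
    ["Quality Pioneer", s.ratingsSubmitted >= 25],
    ["Streak Master", s.longestSuccessStreak >= 20],
    ["Gold Agent", s.peakScore >= 500],
    ["Platinum Agent", s.peakScore >= 700],
    ["Diamond Agent", s.peakScore >= 850],
    ["Hybrid Intelligence", s.usedLocalAndCloudInOneSession],
  ];
  return rules.filter(([, ok]) => ok).map(([name]) => name);
}
```

Note that "Save >$10" is a strict inequality, so exactly $10 saved does not yet earn Cost Optimizer.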
Local Cluster Integration
ComputeGauge auto-detects local inference endpoints:
| Platform | Environment Variable | Default |
|---|---|---|
| Ollama | OLLAMA_HOST | http://localhost:11434 |
| vLLM | | None |
| llama.cpp | | None |
| TGI | | None |
| LocalAI | | None |
| Custom | | None |
Set OLLAMA_MODELS="llama3.3:70b,qwen2.5:7b" (comma-separated) to declare available models.
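Parsing these two variables might look like the sketch below. It assumes ComputeGauge falls back to Ollama's standard endpoint (http://localhost:11434) when OLLAMA_HOST is unset and ignores blank entries in OLLAMA_MODELS; readOllamaConfig is a hypothetical helper that takes an env-like object rather than reading process.env, so it stays testable:

```typescript
// Ollama's standard local endpoint; assumed fallback when OLLAMA_HOST is unset.
const DEFAULT_OLLAMA_HOST = "http://localhost:11434";

interface OllamaConfig {
  host: string;
  models: string[];
}

// Reads OLLAMA_HOST and OLLAMA_MODELS from an env-like map. Model names are
// trimmed and blank entries dropped, so "a, b,," yields ["a", "b"].
// Trailing slashes are stripped from the host.
function readOllamaConfig(env: { [key: string]: string | undefined }): OllamaConfig {
  const host = (env["OLLAMA_HOST"] || DEFAULT_OLLAMA_HOST).replace(/\/+$/, "");
  const models = (env["OLLAMA_MODELS"] || "")
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  return { host, models };
}

const cfg = readOllamaConfig({ OLLAMA_MODELS: "llama3.3:70b, qwen2.5:7b" });
console.log(cfg.host, cfg.models);
```

Ollama itself lists installed models at GET /api/tags on the same host, which is one way a detector could fill the list when OLLAMA_MODELS is unset; whether ComputeGauge does this is not stated here.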
The Local→Cloud Routing Flow
1. Agent calls assess_routing("code_generation", quality="good")
2. ComputeGauge checks: local llama3.3:70b quality for code_generation = 80/100
3. "Good" quality threshold = 78 → Local model is sufficient!
4. Agent uses local model → saves money → earns credibility for honest assessment
OR:
1. Agent calls assess_routing("complex_reasoning", quality="excellent")
2. ComputeGauge checks: local llama3.3:70b quality for complex_reasoning = 78/100
3. "Excellent" quality threshold = 88 → Quality gap of 10 points → Route to cloud!
4. Agent calls pick_model → gets Claude Sonnet 4 → executes → calls route_to_cloud
5. Agent earns +70 credibility points for smart routing decision
How pick_model Works
The decision engine scores every model across three dimensions:
Quality: per-task-type scores for 14 task types
Cost: real pricing from 8 providers and 20+ models, calculated per call (log-scale normalization)
Speed: relative inference speed scores
| Priority | Quality | Cost | Speed |
|---|---|---|---|
| cheapest | 20% | 70% | 10% |
| balanced | 45% | 35% | 20% |
| | 70% | 10% | 20% |
| | 25% | 15% | 60% |
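A minimal sketch of this weighted scoring, assuming each dimension sits on a 0-100 scale and that "log-scale normalization" means interpolating log(cost) between the cheapest and most expensive candidate. Candidate, costScore, and pickBest are hypothetical names, and pairing the first two weight rows with "cheapest" and "balanced" is an inference from the weights:

```typescript
interface Weights {
  quality: number;
  cost: number;
  speed: number;
}

// First two rows of the table; the priority labels are inferred, not documented.
const PRIORITY_WEIGHTS: { [priority: string]: Weights } = {
  cheapest: { quality: 0.2, cost: 0.7, speed: 0.1 },
  balanced: { quality: 0.45, cost: 0.35, speed: 0.2 },
};

interface Candidate {
  model: string;
  quality: number; // 0-100 score for this task type
  costUsd: number; // estimated cost of this call; must be > 0 for the log
  speed: number;   // 0-100 relative speed score
}

// Log-scale normalization: cheapest candidate -> 100, most expensive -> 0.
function costScore(cost: number, minCost: number, maxCost: number): number {
  if (maxCost === minCost) return 100;
  const t = (Math.log(cost) - Math.log(minCost)) / (Math.log(maxCost) - Math.log(minCost));
  return 100 * (1 - t);
}

// Returns the candidate with the highest weighted score.
function pickBest(candidates: Candidate[], w: Weights): Candidate {
  if (candidates.length === 0) throw new Error("no candidates");
  const costs = candidates.map((c) => c.costUsd);
  const minCost = Math.min(...costs);
  const maxCost = Math.max(...costs);
  let best = candidates[0];
  let bestScore = -Infinity;
  for (const c of candidates) {
    const score =
      w.quality * c.quality +
      w.cost * costScore(c.costUsd, minCost, maxCost) +
      w.speed * c.speed;
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}
```

With a frontier model (quality 90, $0.10 per call, speed 40) and a small model (quality 60, $0.001, speed 80), the cheapest weights score them 22 and 90, while the 70/10/20 row scores them 71 and 68 and flips the choice. The log scale keeps a 100x price gap from dominating: between those two bounds, a $0.01 call would score exactly 50 on cost.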
Model Coverage
| Provider | Models | Tier Range |
|---|---|---|
| Anthropic | Claude Opus 4, Sonnet 4, Sonnet 3.5, Haiku 3.5 | Frontier → Budget |
| OpenAI | o1, GPT-4o, o3-mini, GPT-4o-mini | Frontier → Budget |
| Google | Gemini 2.0 Pro, 1.5 Pro, 2.0 Flash | Premium → Budget |
| DeepSeek | Reasoner, Chat | Value → Budget |
| Groq | Llama 3.3 70B, Llama 3.1 8B | Value → Budget |
| Together | Llama 3.3 70B Turbo, Qwen 2.5 72B | Value |
| Mistral | Large, Small | Premium → Budget |
Local Models Supported
| Model | Quality (general) | Best For |
|---|---|---|
| llama3.3:70b | 79/100 | General tasks, code |
| qwen2.5:72b | 81/100 | Code, math, translation |
| deepseek-r1:70b | 80/100 | Reasoning, math, code |
| deepseek-r1:14b | 68/100 | Budget reasoning |
| phi3:14b | 60/100 | Simple tasks |
| llama3.1:8b | 58/100 | Classification, simple QA |
| mistral:7b | 58/100 | Simple tasks |
Environment Variables
| Variable | Required | Description |
|---|---|---|
| | No | URL of ComputeGauge dashboard |
| | No | API key for dashboard access |
| COMPUTEGAUGE_BUDGET_TOTAL | No | Session budget limit in USD |
| | No | Per-provider monthly budget |
| | No | Per-provider monthly budget |
| ANTHROPIC_API_KEY | No | Enables Anthropic provider detection |
| OPENAI_API_KEY | No | Enables OpenAI provider detection |
| | No | Enables Google provider detection |
| OLLAMA_HOST | No | Ollama inference endpoint |
| OLLAMA_MODELS | No | Comma-separated local model names |
| | No | vLLM inference endpoint |
| | No | GPU name for hardware detection |
| | No | VRAM in GB |
| | No | Amortized hardware cost per hour |
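Provider detection from these variables reduces to checking which keys are present. A minimal sketch that uses only the variable names appearing in the setup examples (the Google, dashboard, vLLM, and hardware variable names are not given here, so they are left out); it assumes COMPUTEGAUGE_BUDGET_TOTAL holds the session budget in USD, and readProviderSetup is a hypothetical helper:

```typescript
type Env = { [key: string]: string | undefined };

interface ProviderSetup {
  providers: string[];             // cloud providers whose API key is set
  sessionBudgetUsd: number | null; // null when no budget is configured
}

function readProviderSetup(env: Env): ProviderSetup {
  // A non-empty key is treated as "provider available" (assumption).
  const providers: string[] = [];
  if (env["ANTHROPIC_API_KEY"]) providers.push("anthropic");
  if (env["OPENAI_API_KEY"]) providers.push("openai");

  // Env values are strings, so the budget must be parsed and validated.
  const raw = env["COMPUTEGAUGE_BUDGET_TOTAL"];
  let sessionBudgetUsd: number | null = null;
  if (raw !== undefined && raw.trim() !== "") {
    const n = Number(raw);
    if (!isFinite(n) || n <= 0) {
      throw new Error(`COMPUTEGAUGE_BUDGET_TOTAL must be a positive number, got "${raw}"`);
    }
    sessionBudgetUsd = n;
  }
  return { providers, sessionBudgetUsd };
}
```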
For Agent Developers
If you're building AI agents (via Claude Agent SDK, LangChain, CrewAI, AutoGen, etc.), ComputeGauge MCP is the easiest way to add cost awareness AND agent credibility:
Zero integration effort: just add the MCP server to your agent's config
No code changes: the agent discovers 18 tools via MCP automatically
Immediate value: pick_model returns recommendations on the first call, and credibility tracking starts automatically
Session tracking built-in: full cost visibility per agent run
Credibility system: your agent earns a visible reputation score that users can see
Local cluster support: auto-detect and leverage on-prem inference
Budget guardrails: warnings when approaching limits
Pattern: Cost-Aware + Credibility-Building Agent Loop
1. Agent receives task
2. Agent calls assess_routing(task_type) → local or cloud?
3. Agent calls pick_model(task_type, priority="balanced")
4. Agent uses recommended model for the task
5. Agent calls log_request(provider, model, tokens)
6. Agent calls rate_recommendation(model, rating, success)
7. If cloud-routed: agent calls route_to_cloud(task_type, reason, model)
8. Every 5 requests, agent calls session_cost()
9. If session cost > 80% of budget, switch to priority="cheapest"
10. At session end: check credibility_profile()
This pattern reduces costs by 40-70% while building a credibility score that makes users trust the agent more.
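The decision points in steps 2, 8, and 9 can be sketched as pure functions, leaving the actual MCP tool calls to your client. This assumes the comparison from the routing walkthrough (stay local when the local score is at least the threshold, with "good" = 78 and "excellent" = 88); assessRoute, shouldCheckCost, choosePriority, and planToolCalls are hypothetical names:

```typescript
type QualityTarget = "good" | "excellent";

// Thresholds quoted in the routing walkthrough; other levels are not documented here.
const QUALITY_THRESHOLDS: { [Q in QualityTarget]: number } = { good: 78, excellent: 88 };

// Step 2: stay local when the local model's task score reaches the threshold.
function assessRoute(localScore: number, target: QualityTarget): "local" | "cloud" {
  return localScore >= QUALITY_THRESHOLDS[target] ? "local" : "cloud";
}

// Step 8: check session_cost every `every` requests (5 in the pattern).
function shouldCheckCost(requestCount: number, every = 5): boolean {
  return requestCount > 0 && requestCount % every === 0;
}

// Step 9: switch to the cheapest priority once spend passes 80% of budget.
function choosePriority(sessionCostUsd: number, budgetUsd: number): "balanced" | "cheapest" {
  return sessionCostUsd > 0.8 * budgetUsd ? "cheapest" : "balanced";
}

// Tool names in the order steps 2-8 call them for one task.
function planToolCalls(requestCount: number, route: "local" | "cloud"): string[] {
  const calls = ["assess_routing", "pick_model", "log_request", "rate_recommendation"];
  if (route === "cloud") calls.push("route_to_cloud");
  if (shouldCheckCost(requestCount)) calls.push("session_cost");
  return calls;
}

// The walkthrough's second case: 78 < 88, so the task goes to the cloud.
const route = assessRoute(78, "excellent");
console.log(route, planToolCalls(5, route).join(" -> "), choosePriority(42, 50));
```

Keeping these checks pure makes them easy to unit-test; the agent only has to translate planToolCalls' output into real tool invocations.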
License
Apache-2.0 — Free to use, modify, and distribute.