Budget Governor
Provides budget and rate management for OpenAI's LLM API, allowing agents to check clearance and reconcile costs.
Gvnr
Spend caps, rate coordination, idempotency, and a human-in-the-loop gate for AI agents — enforced before the call, not after the invoice. One MCP endpoint, settled via x402 (USDC on Base). No proxy, no self-hosting, no infrastructure to deploy.
Listed on the Official MCP Registry as dev.gvnr/gvnr.
The problem
Agents cost 10–12× more than estimated in production. System prompts, retry loops, and tool calls multiply fast — a runaway agent can generate a $47,000 bill in 11 days. The usual fix, self-hosting a gateway like LiteLLM, means running infrastructure most developers won't set up.
Gvnr is the hosted alternative: an external authority your agent checks before it spends. Your agent asks "am I clear to make this call?" and acts on the answer.
How it works
Gvnr is not an LLM proxy — your tokens never pass through it, and you pay your model provider directly. Gvnr governs the decision to spend, in two independent meters:
1. The governance-operation quota — what you buy from Gvnr.
Your account holds a balance of governance operations (operations_remaining). One budget_clear burns one op. You top this up with USDC; it's decoupled from your LLM spend. get_balance reports it.
2. The spend envelope — a USD cap you set, per agent.
set_envelope(agent_id, limit_usd, window) gives an agent a daily or per-session USD ceiling. Gvnr tracks estimated spend against it and denies once it's exceeded — this is the runaway guardrail. No dollars move; it's an accounting limit you control.
A budget_clear is approved only when both hold: the account has ops left in the quota, and the agent is under its envelope. The loop:
Your agent calls
budget_clear(agent_id, model, estimated_tokens)before each LLM request.Gvnr checks the op quota and the agent's envelope, and returns
{ approved: true, ... }or{ approved: false, reason }.If denied, your agent skips (or escalates via
request_approval).After the LLM responds, call
reconcile(...)with the real token counts so the envelope tracks actual cost, not the estimate.
Quick start
1. Provision an account
curl -X POST https://gvnr.dev/v1/account
# { "api_key": "bg_...", "account_id": "...", "operations_remaining": 25 }New accounts include 25 free trial ops — enough to run the full loop (set an envelope, budget_clear, reconcile) before you fund anything. Your api_key is the credential for every call; account_id is just an internal reference for support. Account creation is credential-only by design (agent-native) — email is optional, set separately via POST /v1/account/notification-email, and used only for human approval notices.
2. Top up your governance-op quota
Pay-as-you-go: 1,000 ops per $1, any amount (try the whole rail for $1), USDC on Base. Open the pay page, name your amount, and pass your API key:
https://gvnr.dev/pay?usd=1&api_key=bg_YOUR_KEYSend USDC to the address shown and paste your tx hash — ops are credited proportional to the amount received, after on-chain verification. Programmatic clients can submit the hash directly:
curl -X POST \
-H "Authorization: Bearer bg_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"tx_hash":"0x..."}' \
https://gvnr.dev/v1/account/topup-verify
# { "operations_remaining": 1000, "credited_ops": 1000, "credited_usd": 1 }Or settle in one round-trip with an x402 client (Base MCP, AgentKit, x402-fetch) — name your own amount:
POST https://gvnr.dev/v1/account/topup?usd=5 # → 402 challenge → pay → 5,000 ops credited3. Set a spend envelope for your agent
curl -X PUT \
-H "Authorization: Bearer bg_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"agent_id":"my-agent","limit_usd":5,"window":"daily"}' \
https://gvnr.dev/v1/budget/envelope
# { "success": true, "agent_id": "my-agent", "limit_usd": 5, "window": "daily" }4. (optional) Set a rate envelope per (agent, provider, model)
curl -X PUT \
-H "Authorization: Bearer bg_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"agent_id":"my-agent","provider":"anthropic","model":"claude-sonnet-4-6","requests_per_minute":30}' \
https://gvnr.dev/v1/rate/envelope5. Before each LLM request: budget_clear, then rate_check
curl -X POST \
-H "Authorization: Bearer bg_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"agent_id":"my-agent","model":"claude-sonnet-4-6","estimated_tokens":2000}' \
https://gvnr.dev/v1/budget/clear
# { "approved": true, "remaining_usd": 4.994, "operations_remaining": 18999 }
curl -X POST \
-H "Authorization: Bearer bg_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"agent_id":"my-agent","provider":"anthropic","model":"claude-sonnet-4-6"}' \
https://gvnr.dev/v1/rate/check
# { "allowed": true, "requests_remaining_this_minute": 29 }6. (optional) Dedupe retries with idempotency_check
curl -X POST \
-H "Authorization: Bearer bg_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"key":"job-abc-123","ttl_seconds":3600}' \
https://gvnr.dev/v1/idempotency/check
# First call: { "is_first_call": true, "ttl_remaining_seconds": 3600 }
# Replay: { "is_first_call": false, "ttl_remaining_seconds": 3598 }7. After the LLM responds, reconcile against actual usage
curl -X POST \
-H "Authorization: Bearer bg_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"agent_id":"my-agent","actual_input_tokens":1800,"actual_output_tokens":2400}' \
https://gvnr.dev/v1/budget/reconcile
# { "ok": true, "drift_usd": 0.003, "remaining_usd": 4.991, "operations_remaining": 18999 }reconcile adjusts the spend envelope by the drift between your estimate and actual cost (the op quota is untouched). You don't pass the model again — reconcile reuses the one from your prior budget_clear. Anthropic, OpenAI, and Gemini all return usage fields with real token counts — pass those in to keep the envelope honest.
Human-in-the-loop approvals
When an agent hits a denial or a sensitive action, pause for a human instead of failing:
# 1. Open a request — returns an approval_id and a mobile-friendly approval_url
curl -X POST -H "Authorization: Bearer bg_YOUR_KEY" -H "Content-Type: application/json" \
-d '{"agent_id":"my-agent","action_summary":"Spend $40 on a research run","ttl_seconds":3600}' \
https://gvnr.dev/v1/approval/request
# { "approval_id": "...", "approval_url": "https://gvnr.dev/approve/...", "expires_at": ... }
# 2. The human opens approval_url and taps approve / deny (emailed if notification-email is set).
# 3. Your agent polls until the decision lands:
curl -H "Authorization: Bearer bg_YOUR_KEY" \
https://gvnr.dev/v1/approval/check/APPROVAL_ID
# { "decision": "pending" } → "approved" | "denied" | "timeout"The agent proceeds on approved, skips on denied, and handles timeout (no decision before expires_at) however it likes.
MCP setup
Add to Claude Desktop or any MCP-compatible client:
https://gvnr.dev/mcp?api_key=bg_YOUR_KEYClaude Code
claude mcp add gvnr --transport http \
"https://gvnr.dev/mcp?api_key=bg_YOUR_KEY"MCP tools
Tool | Description |
| Check clearance against the op quota + spend envelope; burns one op |
| Create or update an agent's USD spend cap |
| Remaining governance-operation quota ( |
| Apply estimate-vs-actual drift to the spend envelope after the LLM responds |
| Allocate a per-(agent, provider, model) rate share |
| Approve or deny against the rate envelope; returns |
| Dedupe retries on a caller-supplied key; returns |
| Open a human-in-the-loop approval request; returns an |
| Poll an approval: |
REST API
All endpoints except POST /v1/account require Authorization: Bearer bg_YOUR_KEY.
TypeScript users: generate full types from the live OpenAPI spec — npx openapi-typescript@latest https://gvnr.dev/openapi.json -o types/gvnr.d.ts. See TYPESCRIPT.md for the integration pattern (typed fetch, x402 topups, discriminated response unions).
Account & top-up
Method | Path | Description |
|
| Provision account — returns |
|
| Remaining governance-op quota ( |
|
| Public — preset details, USDC address, raw amount |
|
| Submit tx hash → verify on-chain → credit (proportional to amount received) |
|
| x402-gated pay-as-you-go top-up — name your amount (min $1, max $100) |
|
| x402-gated preset top-up (machine clients) |
Budget
Method | Path | Description |
|
| Clearance call — approve or deny |
|
| Apply estimate-vs-actual drift to the envelope |
|
| Create or update agent spend cap |
|
| Read envelope state |
|
| Delete an agent's envelope |
Rate · Idempotency · Approval
Method | Path | Description |
|
| Set a per-(agent, provider, model) RPM share |
|
| Runtime rate check — |
|
| Dedupe on a caller-supplied key — |
|
| Open a human approval request — returns |
|
| Poll decision: |
Clearance response
{ "approved": true, "remaining_usd": 4.994, "operations_remaining": 18999 }{ "approved": false, "remaining_usd": 0, "reason": "envelope_exceeded" }Denial reasons: no_credits (op quota exhausted) · no_envelope (agent has no envelope) · envelope_exceeded
Status codes
A denial is not an HTTP error — it's a 200 with a flag, so check the body, not the status:
Situation | HTTP | Body |
|
|
|
Top-up requires payment |
| x402 challenge (in the |
Missing / invalid API key |
|
|
Per-IP account-creation throttle |
|
|
Bad request body |
|
|
Billing — governance ops, not your tokens
Gvnr charges for governance operations, not LLM usage. Your model tokens are billed by your provider; Gvnr never sees them.
Top-ups are pay-as-you-go at 1,000 ops/$1 in USDC on Base mainnet — name any amount on /pay and ops are credited proportionally after on-chain verification. No minimum, no subscription; the amounts below are just one-tap presets:
Amount | Governance ops | Link |
$1 (trial) | 1,000 |
|
$19 | 19,000 |
|
$39 | 39,000 |
|
$79 | 79,000 |
|
Works with Base MCP, AgentKit, and any x402 client. The 402 challenge follows x402 v2 — payment requirements (network, USDC asset, amount, payTo) are returned in the payment-required response header (the body is empty), so an x402 client settles it automatically; you only hand-parse it if you're rolling your own.
Envelope windows
daily— resets at UTC midnight each daysession— never resets (use for one-shot tasks; caller-managed)
Supported models
Model pricing is a static lookup on the hot path — no external calls. The estimate deducted from the envelope is rate(model) × estimated_tokens ÷ 1,000,000 (output rate for chat models, input rate for embeddings), reconciled to actual afterward. These are the per-million-token rates (USD):
Model | Input | Output |
| $5 | $25 |
| $3 | $15 |
| $1 | $5 |
| $2.50 | $10 |
| $0.15 | $0.60 |
| $10 | $30 |
| $0.02 / $0.13 | input-only |
| $0.15 / $0.20 | input-only |
Embedding / input-only models are billed on input tokens — pass input tokens as estimated_tokens to budget_clear. Unknown models fall back to a conservative $15 / $75 default.
Network
Runs on Base mainnet (X402_NETWORK=eip155:8453), settling real USDC. There's no minimum to try it: top up as little as $1 (1,000 governance ops) to exercise the full live rail end-to-end — no testnet needed.
License
MIT — see LICENSE.
The canonical hosted service is https://gvnr.dev. Self-hosted instances are unaffiliated.
Acknowledgments
gvnr was designed and built by mightbesaad in close partnership with Claude Code (Anthropic) — chiefly Claude Opus 4.8. From the substrate architecture and the billing model to the code in this repository, it was a genuine collaboration. Thank you.
This server cannot be installed
Maintenance
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/mightbesaad/gvnr'
If you have feedback or need assistance with the MCP directory API, please join our Discord server