pod_check_runaway
Identifies locally tracked GPU pods that have exceeded their declared lifetime or whose estimated spend has passed 80% of PRIME_MAX_TOTAL_USD, so forgotten pods are caught at session start before they run up charges.
Instructions
Return locally-tracked pods that have run past max_lifetime_hours OR whose accumulated cost is approaching PRIME_MAX_TOTAL_USD.
Call this at the start of long-running sessions to catch forgotten pods.
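As a rough worked example of when a pod would first be reported: the handler in the Implementation Reference estimates spend as elapsed hours times the hourly rate, and flags a pod once that estimate passes 80% of PRIME_MAX_TOTAL_USD or once elapsed time passes max_lifetime_hours. The sketch below computes the earlier of those two points. The function name `hours_until_flagged`, its `warn_fraction` parameter, and the sample numbers are illustrative assumptions, not part of the server.

```python
def hours_until_flagged(
    hourly_usd: float,
    max_lifetime_hours: float,
    cap_total_usd: float,
    warn_fraction: float = 0.8,
) -> float:
    """Return the elapsed hours after which a pod would start being reported.

    Hypothetical helper: mirrors the two conditions in pod_check_runaway,
    where spend is estimated as elapsed_hours * hourly_usd.
    """
    if hourly_usd <= 0:
        # A free pod never trips the cost condition; only the lifetime applies.
        return float(max_lifetime_hours)
    hours_to_cost_warning = (cap_total_usd * warn_fraction) / hourly_usd
    return min(float(max_lifetime_hours), hours_to_cost_warning)


# Assumed sample numbers: a $2.50/h pod, 48 h declared lifetime, $100 cap.
# The cost condition is reached first, at 80 / 2.50 = 32 hours.
print(hours_until_flagged(2.50, 48, 100.0))
```

Because both comparisons in the handler are strict, a pod is reported only once it is past this point, not exactly at it.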
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | List of RunawayPod dicts, one per flagged pod | |
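Each entry in `result` is a dict produced from the RunawayPod model shown in the Implementation Reference. A minimal sketch of how a caller might turn that list into a readable summary follows. `summarize_runaways` and the sample entry are hypothetical; only the field names come from the model.

```python
from typing import Any


def summarize_runaways(result: list[dict[str, Any]]) -> list[str]:
    """Format one line per runaway pod, most expensive first."""
    ordered = sorted(result, key=lambda r: r["estimated_spend_usd"], reverse=True)
    lines = []
    for r in ordered:
        label = r.get("name") or r["pod_id"]  # name is optional in the model
        lines.append(
            f"{label}: ${r['estimated_spend_usd']:.2f} over "
            f"{r['elapsed_hours']:.1f}h ({r['reason']})"
        )
    return lines


# Illustrative entry only; the values are made up for this example.
sample = [{
    "pod_id": "pod-123",
    "name": None,
    "hourly_usd": 2.0,
    "started_at_unix": 0.0,
    "elapsed_hours": 30.0,
    "estimated_spend_usd": 60.0,
    "max_lifetime_hours": 24,
    "reason": "running for 30.00h, declared max_lifetime_hours=24",
}]
for line in summarize_runaways(sample):
    print(line)
```

An empty list means no tracked pod tripped either condition.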
Implementation Reference
- The handler for the pod_check_runaway tool. Iterates locally-tracked pods, checks if they exceed max_lifetime_hours or 80% of PRIME_MAX_TOTAL_USD, and returns a list of RunawayPod dicts.

```python
@mcp.tool
async def pod_check_runaway() -> list[dict[str, Any]]:
    """Return locally-tracked pods that have run past max_lifetime_hours OR
    whose accumulated cost is approaching PRIME_MAX_TOTAL_USD.

    Call this at the start of long-running sessions to catch forgotten pods.
    """
    runaways: list[RunawayPod] = []
    cap_total = max_total_usd()
    for tp in list_tracked():
        elapsed_hours = (time.time() - tp.started_at_unix) / 3600.0
        spend = elapsed_hours * tp.hourly_usd
        reasons = []
        if elapsed_hours > tp.max_lifetime_hours:
            reasons.append(
                f"running for {elapsed_hours:.2f}h, declared max_lifetime_hours="
                f"{tp.max_lifetime_hours}"
            )
        if spend > cap_total * 0.8:
            reasons.append(
                f"estimated spend ${spend:.2f} is >80% of PRIME_MAX_TOTAL_USD=${cap_total:.2f}"
            )
        if not reasons:
            continue
        runaways.append(
            RunawayPod(
                pod_id=tp.pod_id,
                name=tp.name,
                hourly_usd=tp.hourly_usd,
                started_at_unix=tp.started_at_unix,
                elapsed_hours=elapsed_hours,
                estimated_spend_usd=spend,
                max_lifetime_hours=tp.max_lifetime_hours,
                reason="; ".join(reasons),
            )
        )
    return [r.model_dump() for r in runaways]
```

- Pydantic model defining the output schema for each runaway pod returned by pod_check_runaway.

```python
class RunawayPod(BaseModel):
    """A pod we tracked locally that has either run past its declared
    max_lifetime or burned more than 80% of PRIME_MAX_TOTAL_USD."""

    pod_id: str
    name: str | None = None
    hourly_usd: float
    started_at_unix: float
    elapsed_hours: float
    estimated_spend_usd: float
    max_lifetime_hours: int
    reason: str  # e.g. "exceeded max_lifetime_hours" or "approaching total cap"
    suggestion: str = (
        "Consider terminating with pod_terminate(pod_id, confirm=True) "
        "if you no longer need this pod."
    )
```

- src/prime_intellect_mcp/server.py:503-503 (registration): The @mcp.tool decorator registers pod_check_runaway as an MCP tool on the FastMCP server instance.

```python
async def pod_check_runaway() -> list[dict[str, Any]]:
```

- Helper function that reads the PRIME_MAX_TOTAL_USD environment variable, used by pod_check_runaway to compute the 80% spending threshold.

```python
def max_total_usd() -> float:
    raw = os.getenv("PRIME_MAX_TOTAL_USD")
    if raw is None:
        return DEFAULT_MAX_TOTAL_USD
    try:
        return float(raw)
    except ValueError:
        return DEFAULT_MAX_TOTAL_USD
```

- Helper function that returns all locally-tracked pods from state.json, iterated by pod_check_runaway to find runaway pods.

```python
def list_tracked() -> list[TrackedPod]:
    with _lock:
        data = _read()
    return [TrackedPod(**v) for v in data.values()]
```
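
To experiment with the check without a running server, the same logic can be expressed as a pure function. This is a simplified sketch, not the project's code: `Pod` stands in for TrackedPod, `find_runaways` and its `now` and `cap_total` parameters are assumptions introduced so the result is deterministic, and the reason strings are shortened.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """Hypothetical stand-in for TrackedPod with only the fields the check reads."""
    pod_id: str
    hourly_usd: float
    started_at_unix: float
    max_lifetime_hours: int


def find_runaways(pods: list[Pod], now: float, cap_total: float) -> dict[str, list[str]]:
    """Map pod_id -> reasons, using the same two conditions as the handler."""
    flagged: dict[str, list[str]] = {}
    for pod in pods:
        elapsed_hours = (now - pod.started_at_unix) / 3600.0
        spend = elapsed_hours * pod.hourly_usd  # estimate, not a billed amount
        reasons = []
        if elapsed_hours > pod.max_lifetime_hours:
            reasons.append("past max_lifetime_hours")
        if spend > cap_total * 0.8:
            reasons.append("spend above 80% of cap")
        if reasons:
            flagged[pod.pod_id] = reasons
    return flagged


# All three pods started at t=0 and are inspected 3 hours later (assumed values).
pods = [
    Pod("a", hourly_usd=1.0, started_at_unix=0.0, max_lifetime_hours=2),   # too old
    Pod("b", hourly_usd=30.0, started_at_unix=0.0, max_lifetime_hours=8),  # $90 spent
    Pod("c", hourly_usd=1.0, started_at_unix=0.0, max_lifetime_hours=8),   # within limits
]
print(find_runaways(pods, now=3 * 3600.0, cap_total=100.0))
```

Note that, as in the handler, each pod's estimated spend is compared against the cap on its own; spend is not summed across pods.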