Claude Team MCP Server

unified-worker-state.md•6.95 KiB

# Unified Worker State: list_workers recovery + worker_events API

Status: Proposed
Date: 2026-01-31
Issue: cic-bbd

## Context

Today we have two sources of worker state:

- `SessionRegistry` (in-memory, cleared on restart) drives `list_workers`.
- `events.jsonl` (persistent) stores snapshots + transitions from `WorkerPoller` in `src/claude_team/poller.py`, with helpers in `src/claude_team/events.py`.

After restart, `list_workers` returns empty even though workers may still exist. External consumers resort to parsing `events.jsonl` because MCP exposes no event API.

## Goals

- `list_workers` should return a useful view after restart by recovering from the latest persisted events.
- Expose event log data via an MCP tool (`worker_events`) with a stable response schema for consumers.
- Keep changes additive and avoid breaking existing client expectations.

## Non-Goals

- Perfect real-time accuracy after restart (terminal liveness still requires backend adoption).
- Changing polling cadence or event log format.
- Backfilling old historical events beyond what exists in `events.jsonl`.

## Part 1: list_workers recovery API surface

### Proposed recovery entry point

Add a registry-level recovery API that merges the event log into the registry state without overwriting live sessions.

Suggested API shape (names illustrative):

- `SessionRegistry.recover_from_events(snapshot: dict | None, events: list[WorkerEvent]) -> RecoveryReport`
  - **Input:**
    - `snapshot`: output of `get_latest_snapshot()` (may be `None`).
    - `events`: `read_events_since(snapshot_ts)` (may be empty).
  - **Behavior:**
    - If a session already exists in the registry, do not override it.
    - If a session is only in the event log, create a lightweight recovered entry.
    - If a session is closed by events, mark it closed in recovered state.
  - **Output:**
    - `RecoveryReport` with counts (added, updated, ignored) and timestamp used.
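
The merge behavior listed above can be sketched as follows. This is a minimal illustration, not the proposed implementation: `SketchRegistry`, the `session_id` key inside snapshot workers, the `EVENT_STATE` mapping (in particular treating `worker_started` as `active`), and the exact meaning of the `added` / `updated` / `ignored` counters are all assumptions made for the example. Only the event and snapshot envelope fields (`ts`, `type`, `worker_id`, `data`, `workers`) come from this document.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkerEvent:
    # Field names follow the event shape shown in this document.
    ts: str
    type: str
    worker_id: str
    data: dict = field(default_factory=dict)


@dataclass
class RecoveryReport:
    added: int = 0      # recovered entries created
    updated: int = 0    # recovered entries whose state an event changed
    ignored: int = 0    # records skipped because a live session exists
    timestamp: Optional[str] = None  # snapshot timestamp used, if any


# Assumed mapping from event type to recovered state.
EVENT_STATE = {
    "worker_started": "active",
    "worker_idle": "idle",
    "worker_active": "active",
    "worker_closed": "closed",
}


class SketchRegistry:
    """Hypothetical stand-in for SessionRegistry, for illustration only."""

    def __init__(self):
        self.live = {}       # session_id -> live session object
        self.recovered = {}  # session_id -> recovered entry (plain dict)

    def recover_from_events(self, snapshot, events):
        report = RecoveryReport(timestamp=snapshot["ts"] if snapshot else None)
        workers = snapshot["data"]["workers"] if snapshot else []

        # Seed recovered entries from the snapshot, never touching live sessions.
        for worker in workers:
            sid = worker["session_id"]  # assumed key name
            if sid in self.live:
                report.ignored += 1
                continue
            self.recovered[sid] = {
                **worker,
                "source": "event_log",
                "event_state": worker.get("state"),
                "last_event_ts": snapshot["ts"],
            }
            report.added += 1

        # Apply transitions recorded after the snapshot.
        for event in events:
            if event.type not in EVENT_STATE:
                continue  # e.g. "snapshot" events carry no transition
            if event.worker_id in self.live:
                report.ignored += 1
                continue
            entry = self.recovered.get(event.worker_id)
            if entry is None:
                entry = {"session_id": event.worker_id, "source": "event_log"}
                self.recovered[event.worker_id] = entry
                report.added += 1
            else:
                report.updated += 1
            entry["event_state"] = EVENT_STATE[event.type]
            entry["last_event_ts"] = event.ts
        return report
```

Keeping recovered entries in a separate mapping from live sessions is one way to guarantee the "do not override" rule by construction; a real implementation might instead store both in one structure and tag each entry with its source.
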

### Recovered session representation

Recovered entries should be distinguishable and safe for read-only usage. Proposed interface (implementation can vary):

- A new lightweight `RecoveredSession` object that implements:
  - `session_id`, `name`, `project_path`, `terminal_id`, `agent_type` (from snapshot)
  - `status` mapped from event state (see mapping below)
  - `last_activity` / `created_at` from snapshot when available
  - `to_dict()` for MCP output
  - `is_idle()` returns `None` or uses snapshot state only (never touches JSONL)
- `SessionRegistry.list_all()` returns a merged list of:
  - live `ManagedSession` objects, plus
  - recovered entries not present in the registry

### State mapping

Event log snapshots record:

- `state`: `"idle"` or `"active"` (from `detect_worker_idle`)
- `status`: `"spawning" | "ready" | "busy"` (from `ManagedSession.to_dict()`)

Recommended mapping rules:

- Prefer snapshot `state` for consistency across restarts.
- Map `state` -> `SessionStatus` for output:
  - `idle` -> `ready`
  - `active` -> `busy`
  - `closed` -> (new virtual state or keep `busy` + `state="closed"`)

To preserve backwards compatibility, keep the existing `status` field but add new fields so clients can detect recovery state explicitly:

- `source`: `"registry" | "event_log"`
- `event_state`: `"idle" | "active" | "closed"` (when recovered)
- `recovered_at`: ISO timestamp when recovery occurred
- `last_event_ts`: ISO timestamp of the last applied event

### Recovery timing

Two compatible entry points:

1. **Eager (startup):** in server boot, call recovery once and seed the registry.
2. **Lazy (first list):** in `list_workers`, if registry is empty, perform recovery then return merged output.

Recommendation: **eager** recovery at startup for predictable behavior, plus a lazy fallback in `list_workers` for safety if startup recovery fails.

### Tradeoffs (list_workers recovery)

- **Pros:** `list_workers` no longer empty after restart; preserves metadata and session IDs for monitoring tools.
- **Cons:** recovered entries may be stale; terminal handles are missing, so control actions (send/close) still require adoption.
- **Risk mitigation:** mark `source=event_log` and include `last_event_ts` to communicate staleness to clients.

## Part 2: worker_events MCP tool API surface

### Proposed tool signature

Tool name: `worker_events`

Parameters:

- `since` (string | null): ISO 8601 timestamp; returns events at or after this time. If omitted, returns most recent events (bounded by `limit`).
- `limit` (int, default 1000): maximum number of events returned.
- `include_snapshot` (bool, default false): if true, include the latest snapshot event (even if it predates `since`) in the response.
- `include_summary` (bool, default false): include summary aggregates.
- `stale_threshold_minutes` (int, default 10): used only when `include_summary=true` to classify “stuck” workers.

### Proposed response shape

```
{
  "events": [
    {"ts": "...", "type": "snapshot|worker_started|worker_idle|worker_active|worker_closed", "worker_id": "...", "data": { ... }}
  ],
  "count": 123,
  "summary": {
    "started": ["id1", "id2"],
    "closed": ["id3"],
    "idle": ["id4"],
    "active": ["id5"],
    "stuck": ["id6"],
    "last_event_ts": "..."
  },
  "snapshot": {
    "ts": "...",
    "data": {"count": 2, "workers": [ ... ]}
  }
}
```

### Summary semantics

- **started/closed/idle/active** lists come from the returned event window.
- **stuck** is derived from the latest known state (snapshot + events) where:
  - worker is `active`, and
  - last activity is older than `stale_threshold_minutes`.
- **last_event_ts** is the newest event timestamp in the response.

This aligns with the intent of the former `poll_worker_changes` output while exposing the raw events for richer client-side handling.

### Tradeoffs (worker_events)

- **Pros:** simple API around existing persistence; consumers can poll with a timestamp cursor instead of parsing JSONL.
- **Cons:** no stable event IDs; clients should track the last timestamp and may receive duplicates if multiple events share the same timestamp.
- **Mitigation:** include `last_event_ts` and recommend clients request `since=last_event_ts` and de-duplicate by `(ts, type, worker_id)`.

## Open Questions

- Do we want a new explicit `SessionStatus.CLOSED` for recovered entries, or is `status` plus `event_state="closed"` sufficient?
- Should recovery include an opt-in `include_closed` flag to hide sessions that have closed since the last snapshot?
- Should `worker_events` support an optional `project_filter` (parity with `list_workers`)?

## Recommendation

Implement recovery as an additive merge from `events.get_latest_snapshot()` plus `events.read_events_since(snapshot_ts)`, surfaced via a registry recovery helper and a new `RecoveredSession` type. Add explicit `source` and `event_state` fields in `list_workers` output to communicate provenance and staleness. Expose a new `worker_events` MCP tool with a minimal `since/limit` API and an optional summary section for consumers that want quick status deltas.
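
For consumers, the cursor-and-de-duplicate pattern recommended under Tradeoffs (worker_events) might look like the sketch below. `poll_once` and its `fetch` argument are hypothetical: `fetch` stands in for whatever client call invokes the `worker_events` tool and returns the proposed response shape. The sketch also assumes timestamps are ISO 8601 strings that `datetime.fromisoformat` can parse.

```python
from datetime import datetime


def poll_once(fetch, cursor, seen):
    """Fetch events at or after `cursor` and return only unseen ones.

    fetch:  hypothetical callable wrapping the worker_events tool,
            called as fetch(since=cursor) and returning the response dict.
    cursor: ISO 8601 timestamp string, or None on the first poll.
    seen:   set of (ts, type, worker_id) keys already processed.
    """
    response = fetch(since=cursor)
    events = response["events"]

    fresh = []
    for event in events:
        # No stable event IDs exist, so this tuple is the de-duplication key.
        key = (event["ts"], event["type"], event["worker_id"])
        if key in seen:
            continue
        seen.add(key)
        fresh.append(event)

    # `since` is inclusive, so events at the cursor timestamp come back again
    # on the next poll; the `seen` set is what filters them out.
    if events:
        newest = max(events, key=lambda e: datetime.fromisoformat(e["ts"]))
        cursor = newest["ts"]
    return fresh, cursor
```

A long-running client would call `poll_once` in a loop, carrying `cursor` and `seen` between calls. Since only events at the cursor timestamp can repeat, `seen` can be pruned to keys whose `ts` equals the current cursor to keep memory bounded.
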
