# SF Permits MCP Server

## Quick Orientation

This is a Python/FastMCP MCP server providing San Francisco building permit data, entity network analysis, AI-powered permit guidance, and AI vision plan analysis via 30 tools + a Flask web UI.

**Start here to understand the project:**
1. `README.md` — tools, architecture, setup, project phases
2. `docs/ARCHITECTURE.md` — data flow, DuckDB schema, knowledge base tiers, decision tools, web UI, database architecture
3. `docs/DECISIONS.md` — why we built from scratch, DuckDB over SQLite, entity resolution strategy, SQL-first graph model
4. `CHANGELOG.md` — what was built in each phase (reverse chronological)

**Knowledge base documentation (for understanding the curated permitting data):**
5. `data/knowledge/SOURCES.md` — complete inventory of all 47 tier1 JSON files, tier2 raw text, tier3 admin bulletins, tier4 code corpus
6. `data/knowledge/GAPS.md` — known gaps, resolved gaps, Amy interview questions
7. `data/knowledge/INGESTION_LOG.md` — chronological log of all 8 ingestion sessions

**External state:**
- Chief brain state: `projects/sf-permits-mcp/STATUS.md` (via chief MCP server)

## Project Structure

```
src/                          # MCP server code (62 files, ~24K lines)
  server.py                   # FastMCP entry point, registers 30 tools
  soda_client.py              # Async SODA API client (httpx)
  formatters.py               # Response formatting for Claude
  db.py                       # DuckDB + PostgreSQL dual-mode connections, pool mgmt
  knowledge.py                # KnowledgeBase singleton, semantic index
  ingest.py                   # SODA -> DuckDB pipeline
  entities.py                 # 5-step entity resolution cascade
  graph.py                    # Co-occurrence graph (SQL self-join)
  validate.py                 # Anomaly detection queries
  report_links.py             # External links for property reports
  severity.py                 # Permit severity scoring v2
  station_velocity_v2.py      # Station-sum timeline model
  signals/                    # Health signal aggregation
  tools/                      # 30 tool implementations (33 files)
  vision/                     # AI vision modules (Claude Vision API)
    client.py                 # Anthropic Vision API wrapper
    pdf_to_images.py          # PDF-to-base64 image conversion
    prompts.py                # EPR check prompts for architectural drawings
    epr_checks.py             # Vision-based EPR compliance checker
web/                          # Flask + HTMX web UI (44 files, ~25K lines)
  app.py                      # Flask app factory, middleware, startup (1,061 lines)
  routes_public.py            # Public search, landing, demo (1,783 lines)
  routes_search.py            # Authenticated search + tools (1,452 lines)
  routes_cron.py              # Cron endpoints, nightly jobs (1,414 lines)
  routes_admin.py             # Admin dashboard, feedback, ops (996 lines)
  routes_auth.py              # Magic-link auth, account mgmt (744 lines)
  routes_property.py          # Property reports, plan analysis (570 lines)
  routes_api.py               # JSON API endpoints (557 lines)
  routes_misc.py              # Health, static pages, misc (511 lines)
  auth.py                     # Auth helpers, user management
  brief.py                    # Morning brief data assembly
  report.py                   # Property report generation
  helpers.py                  # run_async, md_to_html, shared utils
  activity.py                 # Feedback, bounty points, admin users
  email_brief.py              # Morning brief email delivery
  email_triage.py             # Nightly triage report email delivery
  regulatory_watch.py         # Regulatory watch CRUD + query helpers
  cost_tracking.py            # API cost tracking + rate limiting
  pipeline_health.py          # Permit pipeline monitoring
  intelligence.py             # Activity intelligence
  templates/                  # 77 Jinja2 templates
  static/                     # CSS, JS, PWA manifest, icons
data/knowledge/               # 4-tier knowledge base (gitignored tier4)
  tier1/                      # 47 structured JSON files — loaded at startup
  tier2/                      # Raw text info sheets
  tier3/                      # Administrative bulletins
  tier4/                      # Full code corpus (Planning Code 12.6MB + BICC 3.6MB)
scripts/                      # CLI tools (29 files)
tests/                        # 3,455 tests (127 files, ~46K lines)
datasets/                     # SODA dataset catalog (22 datasets, 13.3M records)
docs/                         # Architecture, decisions, contact data analysis
```

> **Blueprint refactor complete (Sprint 69).** Routes extracted from monolithic `app.py` (~8K lines) into 8 Blueprint files. `app.py` is now 1,061 lines — just app factory, middleware, and startup.

## Key Numbers

- **30 tools**: 8 SODA API (Phase 1), 3 Entity/Network (Phase 2), 5 Knowledge (Phase 2.75), 2 Facilitation (Phase 3.5), 2 Vision (Phase 4), 1 Addenda (Phase 5), 2 Severity/Health (Phase 6), 6 Project Intelligence (Phase 7), 1 Similar Projects
- **22 SODA datasets**, 13.3M records cataloged
- **DuckDB**: 1.8M contacts -> 1M entities -> 576K relationship edges
- **PostgreSQL (prod)**: 5.6M rows, 2.05 GB on Railway, 59 tables
- **Knowledge base**: 47 tier1 JSON files, 86 semantic concepts, ~817 aliases
- **RAG**: 1,035 chunks, hybrid retrieval (pgvector)
- **Voice calibration**: 15 scenarios, 7 audiences, 8 situations
- **Routes**: 153 (across 8 Blueprint files + app.py)
- **Tests**: 3,455 collected, 3,428 passing, 20 skipped
- **Scenarios**: 73 approved in scenario-design-guide.md
- **Live**: https://sfpermits-ai-production.up.railway.app

## Current State

Phases 1-7 substantially complete. Blueprint route refactor complete (Sprint 69) — routes extracted from monolithic `app.py` into 8 Blueprint files (`routes_public.py`, `routes_search.py`, `routes_cron.py`, `routes_admin.py`, `routes_auth.py`, `routes_property.py`, `routes_api.py`, `routes_misc.py`); `app.py` reduced from ~8K to 1,061 lines. Sprint 69 delivered: redesigned landing page, search intelligence with anonymous demo path, /methodology + /about-data + /demo content pages, portfolio/PWA support. Sprint 69 Hotfix: address search resilience — graceful degradation on query timeouts. Sprint 68-A: Scenario governance — 102 scenarios reviewed, 73 in design guide.

## Railway Production Infrastructure

**Live URL**: https://sfpermits-ai-production.up.railway.app
**Project**: sfpermits-ai (Railway)

### Services

| Service | Role | Branch | URL | Status |
|---|---|---|---|---|
| **sfpermits-ai** | Flask web app (production) | `prod` | sfpermits-ai-production.up.railway.app | Active |
| **sfpermits-ai-staging** | Flask web app (staging) | `main` | sfpermits-ai-staging-production.up.railway.app | Active |
| **sfpermits-mcp-api** | MCP server over Streamable HTTP (`Dockerfile.mcp`) | `main` | sfpermits-mcp-api-production.up.railway.app | Active |
| **sf-permits-mcp** | (verify purpose) | | | Active |
| **fantastic-mindfulness** | (verify purpose) | | | Active |
| **pgvector-db** | PostgreSQL + pgvector — user data, RAG embeddings, permit changes | — | internal only | Active, primary DB |
| **pgVector-Railway** | pgvector instance (appears unused, has empty volume) | — | — | Active |

### Other Railway Projects (same account, Pro plan)

| Project | Service | What it is |
|---|---|---|
| **fortunate-cooperation** | `chief-mcp-server` | Chief brain state MCP server — manages tasks, goals, notes, specs via git-backed state |
| **optimistic-mindfulness** | `worker` | Telegram bot |

### MCP Server (sfpermits-mcp-api)

**MCP URL**: `https://sfpermits-mcp-api-production.up.railway.app/mcp`
**Health**: `https://sfpermits-mcp-api-production.up.railway.app/health`

Separate Railway service that exposes the same 30 MCP tools over Streamable HTTP for claude.ai integration. Uses `Dockerfile.mcp` and `src/mcp_http.py`. Requires the same env vars as the main Flask app (`DATABASE_URL`, `ANTHROPIC_API_KEY`, etc.).

**Connect from claude.ai**: Settings → Integrations → Add MCP server → paste the MCP URL above.

**Important**: Uses `mcp[cli]>=1.26.0` (Anthropic's official package), NOT the standalone `fastmcp` package. The standalone `fastmcp>=2.0.0` produces incompatible protocol responses that claude.ai cannot parse.
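
One way to guard the dependency rule above is a small check over the project's requirement lines. The following is a minimal sketch, not something that exists in this repo: the helper names are hypothetical, and it assumes dependencies are listed one per line (as in a requirements file or the `dependencies` array of `pyproject.toml`). It only looks at package names and does not verify the `[cli]` extra or the version bound.

```python
import re
from typing import List, Optional

# Hypothetical helper, not part of this repo: matches the distribution name
# at the start of a requirement line, tolerating a leading quote character.
_NAME_RE = re.compile(r"^[\"']?([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_name(line: str) -> Optional[str]:
    """Return the normalized package name on a requirement line, or None."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = _NAME_RE.match(stripped)
    if match is None:
        return None
    # Normalize like package indexes do: lowercase, runs of -_. become "-".
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def check_mcp_dependencies(lines: List[str]) -> List[str]:
    """List problems with the MCP dependency choice described above."""
    names = {requirement_name(line) for line in lines}
    problems = []
    if "fastmcp" in names:
        problems.append("standalone 'fastmcp' is listed; use 'mcp[cli]' instead")
    if "mcp" not in names:
        problems.append("'mcp[cli]' is not listed")
    return problems
```

An empty list means the lines name `mcp` and do not name the standalone `fastmcp` package.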

### Database (pgvector-db)

The app's `DATABASE_URL` points to `pgvector-db.railway.internal:5432` — **only reachable from within Railway's network**, not from local machines.

**Tables on pgvector-db:**
- User tables: `users`, `auth_tokens`, `watch_items`, `feedback`, `activity_log`, `points_ledger`
- Permit tracking: `permit_changes`, `cron_log`, `regulatory_watch`
- Vision: `plan_analysis_sessions`, `plan_analysis_images`, `plan_analysis_jobs`
- RAG: `knowledge_chunks` (pgvector embeddings, ~1,012 chunks)
- Bulk data: `contacts` (1.8M), `entities` (1M), `relationships` (576K), `permits` (1.1M), `inspections` (671K), `timeline_stats` (382K)

### Two-Branch Model (Sprint 54+)

| Branch | Purpose | Railway trigger |
|--------|---------|----------------|
| `main` | Staging — all builds land here first | Auto-deploys staging (if configured) |
| `prod` | Production — promoted from main after QA | Auto-deploys `sfpermits-ai` production service |

**Promotion ceremony** (after staging QA passes):

```bash
git checkout prod && git merge main && git push origin prod
```

**NEVER** push directly to `prod` — always merge from `main` after verification.

### Deploying to Production

GitHub auto-deploy is connected:
- Pushes to `main` → deploy **sfpermits-ai-staging** (staging)
- Pushes to `prod` → deploy **sfpermits-ai** (production)

```bash
# Verify staging:
curl -s https://sfpermits-ai-staging-production.up.railway.app/health | python3 -m json.tool

# Verify production:
curl -s https://sfpermits-ai-production.up.railway.app/health | python3 -m json.tool
```

If auto-deploy ever stops working, fallback: `cd /Users/timbrenneman/AIprojects/sf-permits-mcp && railway service link sfpermits-ai && railway up`

**DO NOT** use `railway redeploy --yes` — it restarts the old image without rebuilding from new code.
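
The staging and production health checks above can also be scripted. This is a minimal sketch under two assumptions: the `/health` endpoints return JSON (implied by piping them through `python3 -m json.tool`), and nothing is assumed about the fields inside that JSON. The function names are hypothetical and not part of this repo.

```python
import json
from urllib.request import urlopen

# URLs copied from the verification commands above.
HEALTH_URLS = {
    "staging": "https://sfpermits-ai-staging-production.up.railway.app/health",
    "production": "https://sfpermits-ai-production.up.railway.app/health",
}


def fetch_json(url, timeout=10.0):
    """GET a URL and parse the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_health(environments=("staging", "production"), fetch=fetch_json):
    """Return {env: {"ok": bool, ...}} for each requested environment.

    `fetch` is injectable so the logic can be exercised without network access.
    """
    results = {}
    for env in environments:
        try:
            results[env] = {"ok": True, "body": fetch(HEALTH_URLS[env])}
        except Exception as exc:  # network errors, HTTP errors, invalid JSON
            results[env] = {"ok": False, "error": str(exc)}
    return results
```

Calling `check_health()` with no arguments performs real HTTP requests; passing a stub as `fetch` keeps it offline.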

### Interacting with Railway

```bash
# CLI basics (must be in project root: /Users/timbrenneman/AIprojects/sf-permits-mcp)
railway status                        # Current project/service/env
railway service link <service-name>   # Switch active service context
railway variable list                 # Show env vars for linked service
railway logs -n 100                   # Recent logs for linked service
railway deployment list               # Recent deployments

# You CANNOT connect to pgvector-db from local — it's internal-only.
# To check prod DB state, use the /health endpoint:
curl -s https://sfpermits-ai-production.up.railway.app/health | python3 -m json.tool

# Trigger backup:
curl -s -X POST -H "Authorization: Bearer $CRON_SECRET" \
  https://sfpermits-ai-production.up.railway.app/cron/backup

# Key env vars are on the sfpermits-ai service:
#   DATABASE_URL, CRON_SECRET, ADMIN_EMAIL, INVITE_CODES,
#   SMTP_HOST/PORT/FROM/USER/PASS, ANTHROPIC_API_KEY, OPENAI_API_KEY
```

### Backups

See `docs/BACKUPS.md` for full strategy. Key points:
- Admin auto-seed: empty `users` table + `ADMIN_EMAIL` env var → admin account created on startup
- `POST /cron/backup` — pg_dump of user-data tables (CRON_SECRET auth)
- `python -m scripts.db_backup` — local CLI for backup/restore
- Railway native backups: enable Daily + Weekly in dashboard → pgvector-db → Settings → Backups

### What's recoverable vs. what needs backups

| Data | Source of truth | Recovery |
|---|---|---|
| Permits, contacts, entities, relationships | SODA API → DuckDB | `python -m src.ingest && python -m src.entities && python -m src.graph` |
| Knowledge base (tier1-3) | git (`data/knowledge/`) | Already in repo |
| Knowledge base (tier4) | Local files (gitignored, >1MB each) | Manual — keep local copies |
| RAG embeddings | pgvector-db `knowledge_chunks` | Re-run `POST /cron/rag-ingest` |
| Users, watches, feedback | pgvector-db | **Needs backups** — no external source |

## Development

**Always activate the virtual environment first:**

```bash
source .venv/bin/activate
pip install -e ".[dev]"
python -m src.server    # MCP server
python -m web.app       # Web UI
pytest tests/ -v        # Tests
```

> **Note:** The system Python (Homebrew 3.14) does NOT have project dependencies installed.
> Always run `source .venv/bin/activate` before any `python`, `pytest`, or `pip` command.

Database regeneration (from SODA API):

```bash
python -m src.ingest && python -m src.entities && python -m src.graph && python -m src.validate all
```

## Branch & Merge Workflow

Development uses ephemeral Claude Code worktree branches (auto-created under `.claude/worktrees/`).

### IMPORTANT: Worktree branch close-out (CHECKCHAT requirement)

Worktrees live on their own branches (e.g. `claude/sharp-germain`) separate from `main`. **At CHECKCHAT close, always:**

1. Run `git status` from **inside the worktree directory** — not just from the main repo root
2. Commit any modified files on the worktree branch
3. From the main repo root, merge the worktree branch into `main`: `git merge claude/<name>`
4. Push `main`

Skipping this leaves uncommitted changes showing in the CC UI ("Commit changes" badge) even after a session that looks otherwise clean.

### Who can push to main directly

- **Tim (repo owner):** Merge to `main` locally and push directly — no PRs needed.
- **All other contributors (Steven, etc.):** Must open a PR and get Tim's review before merging. See `.github/PULL_REQUEST_TEMPLATE.md` for required QA evidence.

### PR requirements for contributors

- Branch from latest `main`, keep PRs small and focused (one feature or fix per PR)
- Fill out the PR template completely — "show your work" with screenshots, test output, manual QA steps
- Update `CHANGELOG.md` in every PR
- All `pytest` tests must pass before requesting review

### Onboarding

New developers: see `docs/ONBOARDING.md` for local setup, architecture overview, and coding conventions.

## Deployment Rules

**IMPORTANT: Do NOT run `railway up` or any Railway CLI deployment commands.**

Deployment is handled automatically by Railway via the GitHub integration whenever code is pushed or merged to the `main` branch. Running `railway up` manually will cancel the GitHub-triggered deployment and cause conflicts.

### What to do instead

- Push/commit code to `main` (or merge a PR into `main`)
- Railway will automatically detect the change and deploy
- Verify with: `curl -s https://sfpermits-ai-production.up.railway.app/health | python3 -m json.tool`

### Forbidden commands

- `railway up` — conflicts with GitHub auto-deploy
- `railway deploy` — same issue
- `railway redeploy --yes` — only restarts old image, doesn't rebuild from new code

## QA + Scenario Capture Protocol

This protocol applies at the close of EVERY feature session. Do not skip it.

### Step 1: Generate Cowork QA Script

Write a QA script to `qa-drop/[feature-name]-qa.md`. Rules:
- Name the file after the feature (e.g., `qa-drop/routing-progress-qa.md`)
- **Cross-repo sessions:** If the feature lives in a different repo (e.g., dforge), write the QA script to `qa-drop/` in that repo instead. Same rules apply.
- Script must be self-contained — no setup, no credentials, no prior context needed
- Structure as numbered steps Cowork can execute sequentially
- Each step has an explicit PASS/FAIL criterion
- Cover happy path, empty state, and at least one edge case
- NO route-specific assertions unless absolutely necessary
- NO color/style assertions
- If the feature touches templates/CSS, include DESIGN TOKEN COMPLIANCE section (see Design System section)
- Output format: compact checklist, not prose

### Step 2: Append Suggested Scenarios

Append to `scenarios-pending-review.md` in **this repo's root** (create if missing). Always here, regardless of which repo the feature lives in. Use exactly this format for each scenario:

## SUGGESTED SCENARIO: [short descriptive name]
**Source:** [feature or file that prompted this]
**User:** expediter | homeowner | architect | admin
**Starting state:** [what's true before the action]
**Goal:** [what the user is trying to accomplish]
**Expected outcome:** [success criteria — no routes, no UI specifics, no colors]
**Edge cases seen in code:** [boundary conditions you noticed — optional]
**CC confidence:** high | medium | low
**Status:** PENDING REVIEW

Guidance:
- High confidence = core behavior that would break Amy's workflow if missing
- Medium confidence = important but might be implementation detail
- Low confidence = noticed in code but unsure if intentional product behavior
- Aim for 2-5 scenarios per feature session
- Never modify `scenario-design-guide.md` directly — that file is reviewed externally

### Step 3: Notify

After writing both files, output a single summary line:

`QA READY: qa-drop/[filename] | [N] scenarios appended to scenarios-pending-review.md`

---

## Design System

This project has a 3-file design system authority chain. **All agents modifying templates or CSS must follow it.**

### Authority Chain

1. `docs/DESIGN_CANON.md` — immutable identity ("why"). Do not modify.
2. `docs/DESIGN_TOKENS.md` — agent-facing bible ("what"). 26 components with copy-paste HTML/CSS.
3. `docs/DESIGN_PRINCIPALS.md` — audiences and constraints ("who/where").
4. `docs/DESIGN_COMPONENT_LOG.md` — running inventory of new components created during sprints.

### Agent Rules (mandatory for any template/CSS work)

Before modifying any file in `web/templates/` or `web/static/`:

1. **Read** `docs/DESIGN_TOKENS.md` — use ONLY components and tokens from this file
2. **No invented colors** — all hex values must come from the token palette (Section 1)
3. **No invented fonts** — `--mono` for data, `--sans` for prose (Section 2, role assignment table)
4. **No invented components** — use token components. If you need one that doesn't exist, append it to `docs/DESIGN_COMPONENT_LOG.md` with the HTML/CSS you created
5. **After building**, verify against the Agent Checklist (DESIGN_TOKENS.md §14)

### Template Migration (existing pages)

When an agent touches an existing template for feature work, it should reconcile that template with the design system as part of the task. This means: replace ad-hoc inline colors with token variables, swap non-standard components for token components, ensure font role split is correct. Do not do a separate migration pass — migrate on touch.

**Do NOT copy CSS patterns from existing templates** — always reference `docs/DESIGN_TOKENS.md`. Existing templates may predate the design system and contain non-token colors, fonts, and components that should not be propagated.

### Token Lint (`scripts/design_lint.py`)

Automated token compliance check. Runs in <5 seconds, no browser needed.

```bash
python scripts/design_lint.py --changed --quiet                  # only git-changed templates
python scripts/design_lint.py --files web/templates/brief.html   # specific file
python scripts/design_lint.py                                    # all templates (baseline)
```

**Agents run `--changed` mode during the BUILD step.** Results logged to `qa-results/design-lint-{agent}.md`.
Non-blocking — agents keep building regardless of score.

### Prod Promotion Gate

Lint scores determine prod promotion. **Nothing blocks staging — only prod has a gate.**

| Score | Staging | Prod | Post-Prod |
|---|---|---|---|
| **5/5 — clean** | Auto | Auto-promote | Nothing |
| **4/5 — minor** | Auto | Auto-promote | Hotfix after prod push |
| **3/5 — notable** | Auto | Auto-promote | Mandatory hotfix after prod push |
| **2/5 — significant** | Auto | **HOLD** | Tim reviews, hotfix before promote |
| **1/5 — broken** | Auto | **HOLD** | Tim reviews, hotfix before promote |

**The threshold:** can a user see this and lose trust? Wrong font weight (4/5) — no. Completely off-brand page with random colors (2/5) — yes.

**Hotfix session flow (after quad sprint completes):**
1. Orchestrator consolidates lint results from all agents
2. If score ≤ 3: open a design review session
3. CC presents violations + screenshots of affected pages
4. Tim: fix / accept / defer per violation
5. Fixes committed as hotfix, promoted to prod

### QA Script — Token Compliance Checks

Every QA script that covers UI changes must include this section:

```
## DESIGN TOKEN COMPLIANCE
- [ ] Run: python scripts/design_lint.py --changed --quiet
- [ ] Score: [N]/5
- [ ] No inline colors outside DESIGN_TOKENS.md palette
- [ ] Font families: --mono for data, --sans for prose (spot check 3 elements)
- [ ] Components use token classes (glass-card, obs-table, ghost-cta, etc.)
- [ ] Status dots use --dot-* not --signal-* colors
- [ ] Interactive text uses --text-secondary or higher (not --text-tertiary)
- [ ] New components logged in DESIGN_COMPONENT_LOG.md
```

### Pre-Sprint Design Brief (for sprints with UI work)

Before any sprint that builds new pages or significant UI changes, run a design brief (10-15 min):

1. List all pages/components being built or modified
2. Check DESIGN_TOKENS.md for coverage — does every needed component exist?
3. Flag gaps: components that don't exist yet need to be specced BEFORE agents start building
4. Get Tim's sign-off on any new patterns or design decisions not covered by CANON/TOKENS
5. Add approved new patterns to DESIGN_TOKENS.md before launching agents

This prevents the cycle of: agent invents → review after → rework.

### CHECKCHAT Integration

**VERIFY step** — add: "Design token compliance: no ad-hoc colors/fonts/components introduced"

**CAPTURE step** — add: "DESIGN_COMPONENT_LOG.md updated if new components were created"

**Visual QA Checklist** — add: "Token compliance: [PASS/FAIL] — [N] ad-hoc styles found"

### Governance Cadence

- **Every sprint with UI work:** Token compliance checks in QA script + CHECKCHAT
- **Every 5 sprints:** Review DESIGN_COMPONENT_LOG.md. Promote good patterns to TOKENS, deprecate ad-hoc ones. Run token usage audit.
- **On request:** Full design review against CANON constraints

---

## 11. Chief Hub Protocol

This project reports into Chief — a git-backed brain-state system that gives the planning layer (Claude.ai) visibility into what's happening across all projects. Chief is the coordination hub; this CLAUDE.md is the project's local instructions.

### Chief Project Path

This project's artifacts live at:

```
chief-brain-state/projects/sf-permits-mcp/
├── STATUS.md                      ← project state (synced at session close + nightly)
├── CLAUDE.md.current              ← latest working CLAUDE.md (nightly sync)
├── scenarios-pending-review.md    ← CC-suggested scenarios awaiting planning review
├── qa-results/                    ← QA scripts and results from RELAY sessions
└── specs/                         ← specs that affect planning decisions
```

**Project slug in Chief:** `sf-permits-mcp` → maps to `projects/sf-permits-mcp/` in chief-brain-state

### What Gets Pushed to Chief

**At session close (CHECKCHAT step 4 — SHIP):**
- STATUS.md updates (via `chief_write_file`)
- New scenarios from `scenarios-pending-review.md` (via `chief_write_file`)
- QA scripts from `qa-drop/` (via `chief_write_file` to `projects/sf-permits-mcp/qa-results/`)
- Task/goal updates (via `chief_add_task`, `chief_complete_task`, `chief_add_goal`)
- Session notes (via `chief_add_note`)

**Nightly (automated):**
- `CLAUDE.md` → pushed as `CLAUDE.md.current`
- `scenarios-pending-review.md` (if changed)
- Any new files in `qa-drop/` not yet in Chief
- `git diff --stat` since last nightly sync → pushed as `nightly-diff.md`
- STATUS.md / CHANGELOG.md if changed

### What Chief Knows About This Project

The planning layer (Claude.ai with Chief MCP) can read:
- Current project state without needing repo access
- Pending scenarios that need review and approval
- QA results and coverage gaps
- What changed since last planning session (via nightly diff)
- Whether RELAY/CHECKCHAT protocols are being followed

### Required Project Artifacts

| Artifact | Purpose | Status |
|----------|---------|--------|
| `qa-drop/` | Directory for RELAY QA script output | ✅ exists |
| `qa-results/` | Directory for completed/reviewed QA scripts | ✅ exists |
| `scenarios-pending-review.md` | CC appends suggested scenarios here | ✅ exists |
| `STATUS.md` | Project state — read by Chief nightly | ✅ via Chief |

---

## 12. Session Protocols

This project participates in Tim's standard session protocols. These are defined in `~/.claude/CLAUDE.md` (the global file) and activated per-project by the markers below.

### Protocol Markers

**RELAY** — QA loop. After building, CC runs QA scripts using **Playwright headless Chromium** for any step involving page navigation or UI rendering. Do NOT substitute pytest or curl for browser verification — launch a real browser, navigate pages, take screenshots to `qa-results/screenshots/`. CLI-only steps (imports, DB queries, pytest) can use Python/bash directly. Loops until all tests PASS or are marked BLOCKED. New QA scripts go to `qa-drop/`.

**CHECKCHAT** — Session close protocol for **solo sessions** (single agent, design QA, hotfix). Six steps: VERIFY, DOCUMENT, CAPTURE, SHIP, PREP NEXT, BLOCKED ITEMS REPORT.

**CHECKQUAD** — Session close protocol for **quad sprint terminals** (T1-T4). Lighter than CHECKCHAT, artifact-first. **Step 0: ESCAPE CWD** — `cd` to main repo root BEFORE the MERGE step. Then: MERGE, ARTIFACT (session report), CAPTURE (per-terminal files), HYGIENE CHECK, SIGNAL DONE. Delegates documentation/shipping to T0. See dforge `swarm-coordination` template for full protocol.

**CHECKQUAD-T0** — Session close protocol for **quad sprint orchestrator** (T0). Heavier than CHECKCHAT, consolidates all terminals. **T0 NEVER enters a worktree** — it operates from main repo root throughout. Eight steps: COLLECT, VERIFY, VISUAL QA, CONSOLIDATE, DOCUMENT (includes Chief task drain), HARVEST (dforge lessons), SHIP+PROMOTE, CLEAN. **CLEAN is always the final step** — runs after all sessions are closed and prod is promoted.

**Black Box Session Protocol (2 stages):**

**Stage 1 — termCC (Terminal Claude Code):** READ → BUILD → TEST → SCENARIOS → QA (termRelay) → session close. Solo sessions use CHECKCHAT. Quad sprint terminals use CHECKQUAD. T0 orchestrator uses CHECKQUAD-T0.

**Stage 2 — DeskCC (Desktop Claude Code):** DeskRelay visual checks → CHECKCHAT (lightweight).

QA is not optional. Scenarios are not optional. Session close protocol is not optional.

> See `~/.claude/CLAUDE.md` for the full protocol definitions. This section activates them.
> See `BLACKBOX_PROTOCOL.md` for the full Black Box session structure and DeskRelay prompt generation rules.
> See `DEPLOYMENT_MANIFEST.yaml` for all URLs, topology, and deployment commands.

## Black Box Protocol: active
## Deployment Manifest: DEPLOYMENT_MANIFEST.yaml
## RELAY: active
## CHECKCHAT: active
## CHECKQUAD: active (see dforge swarm-coordination template)

---

## Swarm Orchestration Rules

This project uses multi-agent swarm builds. Two execution models depending on sprint size:

### Standard Sprint (4 agents, 1 terminal)

For focused sprints with 4 or fewer agents. One CC terminal (Opus) spawns agents via Task tool.

```
CC0 (Opus orchestrator)
├── Pre-flight: git pull, verify prod state
├── Spawn Agent A ──► Task(subagent_type="general-purpose", model="sonnet", isolation="worktree")
├── Spawn Agent B ──► Task(subagent_type="general-purpose", model="sonnet", isolation="worktree")
├── Spawn Agent C ──► Task(subagent_type="general-purpose", model="sonnet", isolation="worktree")
├── Spawn Agent D ──► Task(subagent_type="general-purpose", model="sonnet", isolation="worktree")
│   (all 4 run in parallel)
├── Collect results from all agents
├── Merge worktree branches in dependency order
├── Single test run (Fast Merge Protocol)
├── Push to main
└── Report summary table
```

**Key parameters for each Task call:**
- `subagent_type: "general-purpose"` — full tool access including Bash, Read, Write, Edit, Grep, Glob
- `model: "sonnet"` — build agents use Sonnet for execution speed
- `isolation: "worktree"` — each agent gets an isolated git worktree copy of the repo
- `prompt:` — self-contained build instructions (agent rules, read list, tasks, test/QA/scenarios/ship)

**Agent prompts must include this preamble:**

```
You are ALREADY in a git worktree. Do NOT use EnterWorktree. Do NOT run git checkout main.
Your working directory is your isolated worktree copy of the repo.
If you modify any file in web/templates/ or web/static/, read docs/DESIGN_TOKENS.md first.
Use ONLY token components and CSS custom properties. Log new components to docs/DESIGN_COMPONENT_LOG.md.
```

**Agents commit to their worktree branch. The orchestrator merges all branches to main after collecting results.** Agents must NEVER merge to main themselves.

### Quad Sprint (16 agents, 4 terminals) — THE STANDARD FOR LARGE SPRINTS

For sprints with 8+ tasks spanning multiple domains. 4 CC terminals × 4 agents each. Tim operates T0 (orchestrator terminal) which launches and coordinates T1-T4.

```
T0 (Tim — orchestrator, no agents, NEVER in a worktree)
├── Pre-flight: verify clean state, tests, prod health, baselines
│   └── Stale worktree check: git worktree list → prune → delete merged branches
├── Launch T1-T4 by pasting prompts into 4 CC terminals
├── Monitor: all agents FOREGROUND, watch for failures
├── After all terminals finish:
│   ├── Verify each terminal pushed to main
│   ├── File ownership audit (diff against ownership matrix)
│   ├── Full test suite (one run, not per-terminal)
│   ├── Prod gate: python scripts/prod_gate.py --quiet
│   └── Promote: git checkout prod && git merge main && git push origin prod
├── Report to Chief
└── CLEAN (always last):
    ├── git worktree list → report count
    ├── git worktree prune
    ├── git branch --merged main | grep worktree → delete
    ├── git branch --no-merged main | grep worktree → report (do NOT delete)
    └── Report: "CLEAN: [N] worktrees pruned, [M] branches deleted, [K] unmerged reported"

T1 (CC Terminal 1 — e.g., infrastructure) — spawns 4 agents, merges internally
T2 (CC Terminal 2 — e.g., public templates) — spawns 4 agents, merges internally
T3 (CC Terminal 3 — e.g., auth templates) — spawns 4 agents, merges internally
T4 (CC Terminal 4 — e.g., tests + docs) — spawns 4 agents, merges internally
```

**Merge order:** Terminals merge to main in dependency order (infrastructure → foundation → consumers → tests). Each terminal handles its own 4-agent internal merge. T0 verifies cross-terminal file ownership after all pushes.

**Cross-terminal interface contracts:** When one terminal's output is consumed by another, document the interface in BOTH terminal prompts. Use merge order as the dependency resolver.

**Sprint prompt files:**

```
sprint-prompts/qsN-t0-orchestrator.md   ← Tim reads this
sprint-prompts/qsN-t1-[theme].md        ← Paste into CC Terminal 1
sprint-prompts/qsN-t2-[theme].md        ← Paste into CC Terminal 2
sprint-prompts/qsN-t3-[theme].md        ← Paste into CC Terminal 3
sprint-prompts/qsN-t4-[theme].md        ← Paste into CC Terminal 4
```

> Full quad sprint template with T0 orchestrator role, merge ceremony, prod gate integration, failure recovery, and report template: see dforge `swarm-coordination.md` template.

### Swarm Sprint Prompt Structure

Standard sprints use TWO prompt types, quad sprints use FIVE (T0 + T1-T4):

1. **`sprint-prompts/qsN-swarm.md`** (standard) or **`qsN-t0-orchestrator.md`** (quad) — The master orchestrator prompt.
2. **`sprint-prompts/qsN-X-*.md`** (standard) or **`qsN-tN-*.md`** (quad) — Per-terminal/agent prompts.

The swarm prompt contains the full agent instructions inline (not file references) so each Task call is self-contained.

### QA Protocol Naming

- **termRelay** — Automated QA via headless Playwright in Terminal CC. Runs persona-based browser checks, captures screenshots, reports PASS/FAIL. No human needed.
- **DeskRelay** — Visual QA escalation via Desktop CC. Only triggered when termRelay finds checks requiring human visual judgment. Typically ≤10 checks per sprint.
- **CHECKCHAT** — Session completion summary written by each build agent. Includes a "Visual QA Checklist" section listing items for human spot-check.
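
Returning to the quad sprint merge order described earlier in this section (infrastructure → foundation → consumers → tests): one way to derive that order from cross-terminal interface contracts is a topological sort. This is a minimal sketch, assuming each terminal declares which other terminals' output it consumes. The `depends_on` mapping and the function name are hypothetical and not part of this repo.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def merge_order(depends_on):
    """Return terminals in an order where every dependency merges first.

    `depends_on` maps a terminal name to the set of terminals whose output
    it consumes. A dependency cycle raises graphlib.CycleError.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Illustrative contracts only: T1 infrastructure, T2 foundation,
# T3 consumers, T4 tests.
example_contracts = {
    "T1": set(),
    "T2": {"T1"},
    "T3": {"T2"},
    "T4": {"T3"},
}
```

With `example_contracts`, `merge_order` returns T1, T2, T3, T4. A cycle in the contracts surfaces as an exception before any merge starts, which is the point of declaring the interfaces in both terminal prompts.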

### Domain Parallel Patterns

Spawn parallel subagents when work spans independent file domains. File ownership tables go in both the swarm prompt and per-agent prompts.

**Critical rule:** Parallel agents ONLY work when they touch different files. The orchestrator validates file ownership after completion.

### Pre-Flight: Codebase Audit

Before writing sprint prompts, audit the actual code — not stale specs. Verify each assigned task creates something that DOES NOT ALREADY EXIST. Stale specs produce empty sprints.

### Shared File Protocol

When multiple agents must touch the same file, prefer function-level interface contracts over section-comment protocols. Specify: which agent owns which function, who adds vs modifies, who merges first.

### Sequential Dependencies

Merge order follows the dependency graph: infrastructure first, features second, UX/tests last.

### Fast Merge Protocol (QS4+)

**Merge all agents at once. Run the full test suite ONCE at the end.** Do NOT run between each merge — agents already ran the suite on their branches, and with clean file ownership the intermediate runs add ~7 min each with near-zero diagnostic value. If tests fail after all merges, bisect by reverting the last merge and re-testing (still faster than sequential). Only fall back to sequential test runs when file ownership is violated (2+ agents modify the same production file).

**Sprint sizing:** 8-10 tasks per agent, estimate 15-30 min per agent (not 3-5 tasks and 2-3 hours).

### Model Routing

- Orchestrator: Opus (strategic reasoning, conflict resolution)
- Build agents: Sonnet (execution, code generation, testing)
- Routing is handled by `model: "sonnet"` parameter in Task calls. No env var needed.

### Session Bootstrap (fallback for manual per-agent prompts)

Only needed in the per-agent `qsN-X-*.md` files (not the swarm prompt). Handles paste-into-CC-terminal use:

```
## SETUP — Session Bootstrap
1. cd /Users/timbrenneman/AIprojects/sf-permits-mcp   # escape any old worktree
2. git checkout main && git pull origin main          # get latest code
3. EnterWorktree with name `sprint-NN-agent`          # create fresh worktree
```

### Black Box Protocol (v1.3)

**Stage 1 — termCC (Terminal Claude Code):** For swarm sprints, the orchestrator (CC0) spawns all build agents in parallel via Task tool. Each agent independently follows: READ → SAFETY TAG → BUILD → TEST → SCENARIOS → QA (termRelay) → VISUAL REVIEW → session close (CHECKQUAD for quad sprint terminals, CHECKCHAT for solo sessions). After all agents complete, the orchestrator runs CHECKQUAD-T0: COLLECT → VERIFY → VISUAL QA → CONSOLIDATE → DOCUMENT → HARVEST → SHIP+PROMOTE → CLEAN.

**Visual Review (Phase 6.5):** After Playwright screenshots, run automated visual scoring. Use `scripts/visual_qa.py` (preferred) or send screenshots to Claude Vision. Score each page 1-5. ≥3.0 = PASS. ≤2.0 = escalate to DeskRelay. This is standard, not optional. CHECKCHAT output includes visual scores and a Visual QA Checklist section for any pages scoring ≤2.0.

**Stage 2 — DeskCC (Desktop Claude Code):** DeskRelay visual checks → CHECKCHAT

Both stages always end with CHECKCHAT. Stage 2 CHECKCHAT is lightweight (commit QA results, note follow-ups, no code changes expected).

---

## 13. Enforcement Hooks

Five hooks in `.claude/hooks/` enforce Black Box Protocol compliance. They are configured in `.claude/settings.json`.
**Do NOT disable or modify hooks without Tim's explicit approval.**

### Hook Summary

| Hook | Event | Purpose | Exit Code |
|------|-------|---------|-----------|
| `stop-checkchat.sh` | Stop | Blocks CHECKCHAT/CHECKQUAD without screenshots, QA results, and scenarios | 2 = block |
| `plan-accountability.sh` | (called by stop hook) | Audits descoped/blocked items for evidence | 1 = fail |
| `block-playwright.sh` | PreToolUse:Bash | Forces Playwright execution into QA subagents | 2 = block |
| `detect-descope.sh` | PostToolUse:Write | Warns on descoping language in QA/CHECKCHAT files | 0 (warning only) |
| `test-hygiene-hook.sh` | PostToolUse:Write | Warns on anti-patterns in test files (env leaks, dual imports) | 0 (warning only) |

### How They Work

**CHECKCHAT Pre-flight Gate (stop-checkchat.sh):** When the agent writes `## CHECKCHAT` (H2 header), the Stop hook checks for:

1. PNG screenshots in `qa-results/screenshots/` (verified with `file` magic bytes)
2. A results file matching `qa-results/*-results.md` with PASS/FAIL lines
3. Changes to `scenarios-pending-review.md` (via `git diff`)
4. Plan accountability (no undocumented descopes or unsubstantiated BLOCKED items)

Missing evidence → exit 2 (blocks the stop). Agent gets one retry (`stop_hook_active` bypass).

**Build/Verify Separation (block-playwright.sh):** Detects Playwright execution commands (`chromium.launch`, `page.goto`, `page.screenshot`, etc.) in Bash calls and blocks them in the main agent. QA subagents are allowed through via `CLAUDE_SUBAGENT=true` or nested worktree CWD detection. `pytest`, `pip install`, and other safe commands are explicitly allowed.

**Descope Warning (detect-descope.sh):** Soft warning when writing files to `qa-results/` or CHECKCHAT content containing descoping language. Warns via stderr but does not block.
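To make the first two pre-flight checks concrete, here is a minimal Python sketch of the evidence test. It is an analogue for illustration, not the contents of `stop-checkchat.sh` (which is a shell script using `file`); the function names and messages are hypothetical, and the `git diff` and plan-accountability checks are left out.

```python
import re
from pathlib import Path

# The fixed 8-byte signature that begins every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Assumption: a "PASS/FAIL line" is any line containing one of those words.
VERDICT = re.compile(r"\b(PASS|FAIL)\b")


def is_real_png(path: Path) -> bool:
    """Check the magic bytes rather than trusting the .png extension."""
    with path.open("rb") as fh:
        return fh.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


def missing_evidence(root: Path) -> list[str]:
    """Return one message per missing piece of QA evidence under root."""
    problems = []
    qa_dir = root / "qa-results"
    shots_dir = qa_dir / "screenshots"

    shots = shots_dir.glob("*.png") if shots_dir.is_dir() else []
    if not any(is_real_png(p) for p in shots):
        problems.append("no valid PNG screenshots in qa-results/screenshots/")

    results = qa_dir.glob("*-results.md") if qa_dir.is_dir() else []
    if not any(VERDICT.search(p.read_text(encoding="utf-8")) for p in results):
        problems.append("no qa-results/*-results.md file with PASS/FAIL lines")

    return problems
```

A hook built on this would exit 2 when the returned list is non-empty and write the messages to stderr so the agent sees what is missing.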
### Claude Code Hooks API Reference

- **Exit 0:** Action proceeds
- **Exit 2:** Action blocked — reason written to stderr is shown to the agent
- **Any other exit code:** Action proceeds, stderr logged but not shown
- All hooks receive JSON on stdin. Key fields: `last_assistant_message` (Stop), `tool_input` (PreToolUse/PostToolUse)
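The exit-code contract above can be illustrated with a small Python sketch of a PreToolUse:Bash hook in the spirit of `block-playwright.sh`. The real hook is a shell script; the function names, the message text, and the assumption that the Bash command arrives as `tool_input.command` are illustrative, and the nested-worktree detection is omitted.

```python
import json
from typing import Mapping, TextIO

# The three patterns the document names; the real hook checks more ("etc.").
PLAYWRIGHT_MARKERS = ("chromium.launch", "page.goto", "page.screenshot")


def decide(payload: dict, is_subagent: bool = False) -> tuple[int, str]:
    """Return (exit_code, stderr_message) for one hook invocation."""
    command = (payload.get("tool_input") or {}).get("command", "")
    if not is_subagent and any(marker in command for marker in PLAYWRIGHT_MARKERS):
        # Exit 2 blocks the action; the message is shown to the agent.
        return 2, "Playwright must run inside a QA subagent."
    # Exit 0 lets the action proceed.
    return 0, ""


def run(stdin: TextIO, stderr: TextIO, environ: Mapping[str, str]) -> int:
    """Read the hook JSON from stdin, report any block reason, return the exit code."""
    payload = json.load(stdin)
    code, message = decide(payload, environ.get("CLAUDE_SUBAGENT") == "true")
    if message:
        print(message, file=stderr)
    return code

# As a script: sys.exit(run(sys.stdin, sys.stderr, os.environ))
```

Keeping the decision in a pure function (`decide`) separate from the stdin/stderr plumbing makes the hook logic testable without spawning a process.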
