SF Permits MCP Server

ARCHITECTURE.md•18.1 KiB

# Architecture ## System Overview ``` Claude (claude.ai / Claude Code) | | MCP tool calls v SF Permits MCP Server (FastMCP) — 21 tools | |--- Phase 1 tools (8) -------> data.sfgov.org SODA API (live HTTP) | | | v | JSON response -> formatters.py -> Claude | |--- Phase 2 tools (3) -------> PostgreSQL / DuckDB (entities, relationships) | | | v | SQL query results -> Claude | |--- Phase 2.75 tools (5) ----> Knowledge Base (data/knowledge/tier1/*.json) | + PostgreSQL/DuckDB (historical statistics) | | | v | Decision tree walk -> predictions -> Claude | |--- Phase 3.5 tools (2) ----> PostgreSQL + Knowledge Base | | | v | Consultant recommendations, permit lookup -> Claude | |--- Phase 4 tools (2) ------> Claude Vision API (Anthropic) | | | v | Plan analysis + EPR compliance -> Claude | |--- Phase 5 tools (1) ------> PostgreSQL / DuckDB (addenda routing, 3.9M rows) | v Plan review routing search -> Claude Flask + HTMX Web UI (Railway) — https://sfpermits-ai-production.up.railway.app | |--- PostgreSQL (pgvector-db) -- users, auth, RAG, vision sessions, permit tracking |--- SODA API (proxied) -------- live permit queries for web users |--- Claude Vision API --------- async plan analysis jobs ``` Phase 1 tools (`search_permits`, `get_permit_details`, `permit_stats`, `search_businesses`, `property_lookup`, `search_complaints`, `search_violations`, `search_inspections`) query data.sfgov.org live via `SODAClient`. Phase 2 tools (`search_entity`, `entity_network`, `network_anomalies`) query PostgreSQL (production) or DuckDB (local) containing 1.8M+ resolved contact records, entity relationships, and anomaly detection results. Phase 2.75 tools (`predict_permits`, `estimate_timeline`, `estimate_fees`, `required_documents`, `revision_risk`) walk a 7-step permit decision tree backed by structured knowledge files (fee tables, routing matrix, OTC criteria, fire/planning code) plus historical statistics. Phase 3.5 tools (`recommend_consultants`, `permit_lookup`) combine knowledge base consultant data with permit lookups. `permit_lookup` uses exact street name matching (with space-variant support), historical lot discovery across condo conversions, and parcel-level merge to return comprehensive results for multi-unit buildings. Phase 4 tools (`analyze_plans`, `validate_plans`) use the Claude Vision API to analyze architectural drawings and check EPR compliance. Phase 5 tool (`search_addenda`) searches 3.9M+ addenda routing records in the local database for plan review status by permit, station, reviewer, department, or date range. The `permit_lookup` tool also includes plan review routing data from the addenda table. --- ## Data Flow ``` SODA API (data.sfgov.org) | | python -m src.ingest (paginated fetch, 10K/page) v DuckDB (data/sf_permits.duckdb) | | python -m src.entities (5-step cascade) v Entity Resolution (contacts -> entities table) | | python -m src.graph (self-join + aggregation) v Co-occurrence Graph (relationships table) | | python -m src.validate (queries + anomaly detection) v Validation Results / MCP Tool Responses ``` ### Ingestion (`src/ingest.py`) Fetches 6 datasets from SODA API into DuckDB: | Dataset | Endpoint | Target Table | Records | |---|---|---|---| | Building Permits Contacts | `3pee-9qhc` | `contacts` | ~1.0M | | Electrical Permits Contacts | `fdm7-jqqf` | `contacts` | ~340K | | Plumbing Permits Contacts | `k6kv-9kix` | `contacts` | ~503K | | Building Permits | `i98e-djp9` | `permits` | ~1.28M | | Building Inspections | `vckc-dh2h` | `inspections` | ~671K | | Building Permit Addenda + Routing | `87xy-gk8d` | `addenda` | ~3.9M | Contact records are normalized into a unified schema during ingestion. Key normalizations: - `license1` (building) mapped to `license_number` - `sf_business_license_number` mapped to `sf_business_license` - `first_name` + `last_name` concatenated into `name` - `company_name` (electrical) and `firm_name` (plumbing) mapped to both `name` and `firm_name` - Roles normalized via `ROLE_MAP` (11 building roles -> 8 canonical types) - `estimated_cost` cast from TEXT to DOUBLE Uses existing `SODAClient` from `src/soda_client.py` for all API calls. Pagination at 10K records per page via `$offset` + `$limit`. ### Entity Resolution (`src/entities.py`) Resolves 1.8M+ contact records into deduplicated entities using a priority cascade: ``` Step 1: pts_agent_id (building only, high confidence) | v unresolved contacts pass through Step 2: license_number (all sources, medium confidence) | merges into step-1 entities when license matches v Step 3: sf_business_license (all sources, medium confidence) | merges into existing entities when biz license matches v Step 4: fuzzy name match (remaining with name, low confidence) | blocking: first 3 chars of UPPER(name) | similarity: token-set Jaccard >= 0.75 v Step 5: singletons (remaining without match, low confidence) ``` Steps 2 and 3 check for existing entities created by prior steps. If a `license_number` already belongs to a step-1 entity, unresolved contacts with that license are merged into the existing entity rather than creating a duplicate. Each entity stores: `canonical_name`, `canonical_firm`, `entity_type`, identifier keys, `resolution_method`, `resolution_confidence`, `contact_count`, `permit_count`, `source_datasets`. ### Co-occurrence Graph (`src/graph.py`) Builds relationship edges between entities that appear on the same permit. **Edge computation** is a single SQL INSERT...SELECT: ```sql FROM contacts a JOIN contacts b ON a.permit_number = b.permit_number AND a.entity_id < b.entity_id -- canonical ordering, no dupes LEFT JOIN permits p ON a.permit_number = p.permit_number WHERE a.entity_id IS NOT NULL AND b.entity_id IS NOT NULL GROUP BY a.entity_id, b.entity_id ``` Edge attributes: - `shared_permits` — count of co-occurrences (edge weight) - `permit_numbers` — first 20 shared permit numbers - `permit_types` — distinct permit types - `date_range_start`, `date_range_end` — temporal span - `total_estimated_cost` — sum across shared permits - `neighborhoods` — distinct neighborhoods **Query operations:** - `get_neighbors(entity_id)` — 1-hop neighbors with edge attributes - `get_network(entity_id, hops)` — N-hop ego network via iterative frontier expansion ### Validation & Anomaly Detection (`src/validate.py`) Queries the graph and DuckDB tables for analysis: | Function | Purpose | |---|---| | `search_entity(name)` | LIKE search on canonical_name/firm, returns entity + top 5 co-occurring | | `entity_network(entity_id, hops)` | N-hop network traversal, returns nodes + edges | | `inspector_contractor_links(name)` | Inspector -> permits -> contacts -> entities trace | | `find_clusters(min_size, min_weight)` | Connected-component BFS on filtered subgraph | | `anomaly_scan(min_permits)` | 4 anomaly categories (see below) | | `run_ground_truth()` | Searches for known bad actors (Santos, Kong, Curran) | **Anomaly categories:** 1. **High permit volume** — entities with permit_count > 3x the median for their entity_type 2. **Inspector concentration** — entities where 50%+ of inspected permits use the same inspector 3. **Geographic concentration** — entities with 80%+ of permits in one neighborhood 4. **Fast approvals** — permits with filed-to-issued < 7 days and estimated_cost > $100K --- ## DuckDB Schema ``` contacts id INTEGER PK source TEXT -- 'building', 'electrical', 'plumbing' permit_number TEXT role TEXT -- normalized canonical type name TEXT -- normalized full name first_name TEXT last_name TEXT firm_name TEXT pts_agent_id TEXT license_number TEXT sf_business_license TEXT phone TEXT address TEXT city TEXT state TEXT zipcode TEXT is_applicant TEXT from_date TEXT entity_id INTEGER --> entities.entity_id data_as_of TEXT entities entity_id INTEGER PK canonical_name TEXT canonical_firm TEXT entity_type TEXT pts_agent_id TEXT license_number TEXT sf_business_license TEXT resolution_method TEXT -- 'pts_agent_id', 'license_number', 'sf_business_license', 'fuzzy_name', 'singleton' resolution_confidence TEXT -- 'high', 'medium', 'low' contact_count INTEGER permit_count INTEGER source_datasets TEXT -- comma-separated: 'building,electrical,plumbing' relationships entity_id_a INTEGER --> entities.entity_id entity_id_b INTEGER --> entities.entity_id shared_permits INTEGER permit_numbers TEXT permit_types TEXT date_range_start TEXT date_range_end TEXT total_estimated_cost DOUBLE neighborhoods TEXT PK (entity_id_a, entity_id_b) permits permit_number TEXT PK permit_type TEXT permit_type_definition TEXT status TEXT status_date TEXT description TEXT filed_date TEXT issued_date TEXT approved_date TEXT completed_date TEXT estimated_cost DOUBLE revised_cost DOUBLE existing_use TEXT proposed_use TEXT existing_units INTEGER proposed_units INTEGER street_number TEXT street_name TEXT street_suffix TEXT zipcode TEXT neighborhood TEXT supervisor_district TEXT block TEXT lot TEXT adu TEXT data_as_of TEXT inspections id INTEGER PK reference_number TEXT reference_number_type TEXT inspector TEXT scheduled_date TEXT result TEXT inspection_description TEXT block TEXT lot TEXT street_number TEXT street_name TEXT street_suffix TEXT neighborhood TEXT supervisor_district TEXT zipcode TEXT data_as_of TEXT ingest_log dataset_id TEXT PK dataset_name TEXT last_fetched TEXT records_fetched INTEGER last_record_count INTEGER ``` --- ## Key Modules ``` src/ ├── server.py # FastMCP entry point, registers all 20 tools ├── soda_client.py # Async SODA API client (httpx), used by Phase 1 tools + ingestion ├── formatters.py # Response formatting for Claude consumption ├── db.py # DuckDB + PostgreSQL dual-mode connections ├── knowledge.py # KnowledgeBase singleton, semantic index ├── ingest.py # SODA -> DuckDB/Postgres pipeline (paginated fetch, normalization) ├── entities.py # 5-step entity resolution cascade ├── graph.py # Co-occurrence graph (self-join + N-hop traversal) ├── validate.py # Validation queries + anomaly detection ├── report_links.py # External links for property reports ├── vision/ # AI Vision modules │ ├── client.py # Anthropic Vision API wrapper │ ├── pdf_to_images.py # PDF-to-base64 image conversion │ ├── prompts.py # EPR check prompts │ └── epr_checks.py # Vision-based EPR compliance checker └── tools/ ├── search_permits.py # Phase 1: SODA API ├── get_permit_details.py # Phase 1: SODA API ├── permit_stats.py # Phase 1: SODA API ├── search_businesses.py # Phase 1: SODA API ├── property_lookup.py # Phase 1: SODA API ├── search_complaints.py # Phase 1: SODA API ├── search_violations.py # Phase 1: SODA API ├── search_inspections.py # Phase 1: SODA API ├── search_entity.py # Phase 2: Entity search ├── entity_network.py # Phase 2: Network traversal ├── network_anomalies.py # Phase 2: Anomaly detection ├── knowledge_base.py # Phase 2.75: Shared loader for all tier1 JSON ├── predict_permits.py # Phase 2.75: Decision tree walk + predictions ├── estimate_timeline.py # Phase 2.75: Timeline percentiles ├── estimate_fees.py # Phase 2.75: Fee table calculation + statistics ├── required_documents.py # Phase 2.75: Document checklist assembly ├── revision_risk.py # Phase 2.75: Revision probability from cost data ├── recommend_consultants.py # Phase 3.5: Land use consultant recommendations ├── permit_lookup.py # Phase 3.5: Quick permit lookup (exact match, historical lots, parcel merge) ├── analyze_plans.py # Phase 4: AI vision plan analysis └── validate_plans.py # Phase 4: EPR compliance checking ``` ### Knowledge Base Dependencies (Phase 2.75) | Tool | Knowledge Files | DuckDB | |------|----------------|--------| | `predict_permits` | decision-tree-draft.json, semantic-index.json, otc-criteria.json, G-20-routing.json, ada-accessibility-requirements.json, title24-energy-compliance.json, dph-food-facility-requirements.json | No | | `estimate_timeline` | — | Yes (timeline_stats materialized view) | | `estimate_fees` | fee-tables.json, ada-accessibility-requirements.json | Yes (statistical comparison) | | `required_documents` | completeness-checklist.json, epr-requirements.json, title24-energy-compliance.json, dph-food-facility-requirements.json | No | | `revision_risk` | title24-energy-compliance.json, ada-accessibility-requirements.json, dph-food-facility-requirements.json | Yes (revised_cost analysis) | All 15 knowledge files loaded once via `KnowledgeBase` singleton (`@lru_cache`). --- ## Configuration | Variable | Purpose | Default | |---|---|---| | `SODA_APP_TOKEN` | SODA API app token for higher rate limits | None (anonymous) | | `SF_PERMITS_DB` | Path to DuckDB database file | `data/sf_permits.duckdb` | | `DATABASE_URL` | PostgreSQL connection string (production) | None (uses DuckDB) | | `ADMIN_EMAIL` | Auto-seed admin user on empty DB | None | | `CRON_SECRET` | Bearer token for cron endpoints | Required in prod | | `ANTHROPIC_API_KEY` | Claude Vision API for plan analysis | Required for Vision tools | | `OPENAI_API_KEY` | OpenAI embeddings for RAG | Required for RAG | The DuckDB file lives in `data/` (gitignored). Regenerate by running the full pipeline: ```bash python -m src.ingest # Fetch data from SODA API python -m src.entities # Resolve entities python -m src.graph # Build co-occurrence graph python -m src.validate all # Run validation + anomaly scan ``` --- ## PostgreSQL Production Database In production (Railway), the app uses PostgreSQL with pgvector instead of DuckDB. `src/db.py` detects `DATABASE_URL` and switches backends automatically. **pgvector-db tables:** | Category | Tables | |----------|--------| | User data | `users`, `auth_tokens`, `watch_items`, `feedback`, `activity_log`, `points_ledger` | | Permit data | `contacts`, `entities`, `relationships`, `permits`, `inspections`, `timeline_stats`, `ingest_log`, `permit_changes` | | Vision | `plan_analysis_sessions`, `plan_analysis_images`, `plan_analysis_jobs` | | RAG | `knowledge_chunks` (pgvector 1536-dim embeddings) | | System | `cron_log`, `regulatory_watch` | The database is internal-only (`pgvector-db.railway.internal:5432`). Use the `/health` endpoint to check table state remotely. --- ## Flask Web UI The web application (`web/app.py`) provides a browser-based interface deployed on Railway: - **Authentication**: Magic-link passwordless login (`web/auth.py`) - **Morning briefs**: Daily permit activity summaries (`web/brief.py`, `web/email_brief.py`) - **Triage reports**: Nightly email digests (`web/email_triage.py`) - **Plan analysis**: Async AI vision analysis of uploaded PDFs (`/analyze`) - **Regulatory watch**: Admin CRUD for regulation monitoring (`web/regulatory_watch.py`) - **Feedback system**: User feedback with bounty points (`web/activity.py`) - **Admin auto-seed**: Creates admin user from `ADMIN_EMAIL` on empty database Tech stack: Flask, HTMX, Jinja2 templates, deployed via Railway GitHub auto-deploy. --- ## AI Vision Infrastructure The Vision subsystem (`src/vision/`) analyzes architectural drawings using the Anthropic Claude Vision API: ``` PDF upload → pdf_to_images.py (base64 conversion) → client.py (Claude Vision API) → epr_checks.py (EPR compliance prompts) → results stored in plan_analysis_* tables ``` - **Async processing**: Full analysis runs as background jobs to avoid HTTP timeouts - **EPR checks**: Energy, plumbing, structural compliance verification against code requirements - **Annotations**: 10-color system with localStorage persistence, ARIA accessibility --- ## Backup Infrastructure Three-layer strategy (see `docs/BACKUPS.md` for full details): 1. **Railway native backups** — Daily + Weekly via dashboard (pgvector-db → Settings → Backups) 2. **pg_dump cron** — `POST /cron/backup` endpoint with CRON_SECRET auth 3. **Admin auto-seed** — Empty `users` table + `ADMIN_EMAIL` env var → admin account on startup User-generated data (users, watches, feedback) requires backups. Permit/entity/knowledge data is regenerable from SODA API and git.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/tbrennem-source/sf-permits-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

ARCHITECTURE.md•18.1 KiB