# CLAUDE.md — Census MCP Server v3
## Project Overview
AI-powered statistical consultant for U.S. Census data via Model Context Protocol (MCP).
Pure Python. No R dependency. Pragmatics-first architecture.
**Core insight:** Census data has a pragmatics problem, not a search problem. Knowing
WHICH data to use and HOW to interpret it matters more than finding it.
## Current State
**Current Phase:** 4B — Systematic Evaluation (V2 Redo)
- V1 results archived (confounded tool access — see ADR-011)
- Stage 1 V2 (response generation): ✅ Complete (39 queries × 3 conditions)
- Stage 2 V2 (judge scoring): ✅ Complete — all 3 pairwise comparisons, 2106 records
- Aggregate analysis: ✅ Complete — Prag d=1.440 vs Ctrl, d=0.922 vs RAG (certified)
- Stratum analysis: ✅ Complete — no overfitting; d=2.347 (normal stratum), d=1.135 (edge stratum) (SA-001–022)
- Cost analysis: ✅ Complete — pragmatics 2.2× more cost-effective than RAG (COST-001–013)
- Stage 3 (fidelity verification): ✅ Complete (Prag 91.2%, Control 78.3%, RAG 74.6%)
- Stage 4 (expert validation): ⏳ Pending
- Paper outline: ✅ Up to date — `paper/outline.md`, numbers in `paper/numbers_registry.md`
- Lab notebook: `talks/fcsm_2026/` (dated entries with run details, QC, decisions)
- v1/v2 archived to `/Users/brock/Documents/GitHub/archive-opencensusmcp/v2`
## FCSM Talk Lab Notebook
`talks/fcsm_2026/notes.md` is a **chronological lab notebook**. Add dated entries with lessons learned, insights, and observations. Never edit old entries — append corrections as new entries. Reference files in the same directory store polished context (e.g., `reference_*.md`).
## FCSM 2026 Deliverables
**Master checklist:** `talks/fcsm_2026/fcsm_master_checklist.md`
**Talk script:** `talks/fcsm_2026/pragmatics_talking_script_v3.md`
**TEVV crosswalk (canonical, publication-ready):** `/Users/brock/Documents/GitHub/central_library/crosswalks/fcsm_nist/FCSM_NIST_Crosswalk_Article.md`
**TEVV crosswalk (earlier drafts, superseded):** `reports/tevv/pure_crosswalk_part1.md` + `part2.md`
**TEVV methodology:** `reports/tevv/TEVV_methodology_document.md`
**CQS rubric:** `docs/verification/cqs_rubric_specification.md`
**Deadline:** ~March 5, 2026 (slide deck). See checklist for daily targets.
## Repo Structure
Canonical structure is defined in `docs/requirements/srs.md` section 2 (that is law).
Quick reference:
```
docs/requirements/ # ConOps, SRS
docs/design/ # Pragmatics vocabulary, reference card, extraction pipeline spec
docs/architecture/ # System architecture
docs/decisions/ # ADRs
docs/verification/ # Evaluation results
docs/lessons_learned/ # Project narrative from v1/v2
src/census_mcp/ # Runtime package (api/, geography/, pragmatics/, tools/)
staging/ # Pack content source of truth (JSON, version controlled)
packs/ # Compiled SQLite packs (build artifacts, gitignored)
knowledge-base/ # Source material (source-docs/ gitignored)
scripts/ # Build, compile, extraction scripts
tests/ # unit/, integration/, evaluation/
src/eval/ # CQS evaluation pipeline (harness, judges, fidelity)
results/ # Evaluation outputs (gitignored)
docs/test/ # Human evaluator scoring materials
talks/fcsm_2026/ # FCSM conference talk materials
handoffs/ # Thread handoff docs (gitignored)
cc_tasks/ # Claude Code task files (gitignored)
tmp/ # Scratch space (gitignored)
```
## Key Conventions
- **Never edit files without explicit permission.** Output to artifacts or chat.
- **TEVV every task.** Test-Evaluate-Verify-Validate before moving on. (NIST AI RMF 2023)
- **Prompt = how to think, Packs = what to know.** Never duplicate domain knowledge in both.
- **Adding knowledge?** See `docs/design/pragmatics_authoring_guide.md`
- **CC tasks go in `cc_tasks/`** with date prefix: `YYYY-MM-DD_description.md`
- **Thread handoffs go in `handoffs/`** with date prefix
- **Scratch work goes in `tmp/`**
- All three directories are gitignored.
## Pragmatics Content Quality Rules
**What is a pragmatic?** A context item encoding expert statistical judgment about
fitness-for-use — what a senior statistician would tell a colleague before they use
data. Pragmatics are NOT rules, constraints, lookup tables, or LLM instructions.
They are structured expert knowledge with latitude (Morris 1938 semiotics).
**Canonical schema:** `src/census_mcp/pragmatics/models.py` (Pydantic). All content
MUST conform. Key fields: `context_id`, `domain`, `category`, `latitude`, `context_text`,
`triggers` (NOT `tags`), `thread_edges`, `provenance` (required: sources list with document/section/page, confidence level, optional synthesis_note and limitations).
**Content principles (summary) — full details in `docs/design/pragmatics_authoring_guide.md`:**
- **Principles, not instances** — encode the judgment, not the data. The LLM knows FIPS codes; it doesn't know when the nesting assumption breaks.
- **No lookup tables, no LLM instructions** — factual context only ("MOE exceeding estimate indicates unreliability"), not directives ("always warn the user").
- **1-3 sentences, 3-6 triggers, latitude justified** — `none`=hard constraint, `narrow`=strong guidance, `wide`=context-dependent, `full`=background FYI.
- **Provenance from documentation only** — read source first, cite page/section. Never reverse-engineer citations or use LLM training data as source.
- **Thread edges for retrieval depth** — connect items a user might need together; don't over-connect.
**Staging directory:** `staging/{domain}/{category}.json` — one file per category.
Manifest in `staging/{domain}/manifest.json`. Compile with `python scripts/compile_all.py`.
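To make the schema concrete, here is a hypothetical staging item. This is a sketch only: `src/census_mcp/pragmatics/models.py` is canonical, the nested shapes of `thread_edges` and `provenance` shown here are assumptions, and the citation values are placeholders, not real page references (real items must cite actual source documentation per the provenance rule above).

```json
{
  "context_id": "acs_moe_exceeds_estimate",
  "domain": "acs",
  "category": "reliability",
  "latitude": "narrow",
  "context_text": "An MOE exceeding the estimate indicates the estimate is unreliable; small-population geographies are especially prone to this.",
  "triggers": ["margin of error", "MOE", "reliability", "small population"],
  "thread_edges": ["acs_1yr_vs_5yr_tradeoff"],
  "provenance": {
    "sources": [
      {"document": "PLACEHOLDER — cite real handbook", "section": "PLACEHOLDER", "page": 0}
    ],
    "confidence": "high",
    "synthesis_note": "Illustrative example only; not compiled content."
  }
}
```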
## Eval Pipeline Commands
All scripts runnable as modules from repo root:
```bash
python -m src.eval.aggregate_analysis # Stage 2 CQS aggregate + effect sizes
python -m src.eval.stratum_analysis # Normal vs edge stratum breakdown
python -m src.eval.overhead_analysis # Token/resource overhead per condition
python -m src.eval.cost_analysis # Dollar cost per query per condition
python -m src.eval.fidelity_aggregate # Stage 3 fidelity scores
python -m src.eval.fidelity_qc # Stage 3 QC checks (VR-097–100)
python -m src.eval.verify_registry_counts # Verify numbers_registry.md claims
python scripts/compile_all.py # Compile staging/ → packs/ SQLite
```
## Implementation Schedule
**See:** `docs/architecture/implementation_schedule.md` for detailed task breakdown.
**Current Phase:** 4B — Systematic Evaluation
## Vocabulary
All terms defined in `docs/design/pragmatics_vocabulary.md` (normative). Key terms:
- **Pragmatics** — fitness-for-use expert judgment layer (Morris 1938)
- **Pack** — domain-specific shippable bundle (compiles to SQLite)
- **Thread** — connected graph path through context nodes
- **Context** — expert knowledge content (not rules, not constraints)
- **Latitude** — freedom to bend: none / narrow / wide / full
- **NEVER use:** crystal, constraint, rule, guardrail, directive, ontology, weight, severity
## Neo4j Pragmatics Database (Authoring Environment)
- **Database name:** `pragmatics` — prefix ALL Cypher queries with `USE pragmatics`
- **Contains:** Context nodes (36 ACS), Pack nodes (1), thread edges (14 RELATES_TO, 17 BELONGS_TO)
- **This is the authoring/research environment per ADR-001**
- **Pipeline:** Neo4j → export script → staging JSON → compile_pack.py → SQLite packs
- **Arnold/training graph is in the default database — DO NOT mix them**
- **Round-trip scripts:** `scripts/neo4j_to_staging.py` (export) and `scripts/staging_to_neo4j.py` (import). Require NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD env vars.
- **LLM extraction scripts:** `scripts/extract/` is empty — future home of PDF chunking + LLM extraction (MinerU, agent swarms). Not yet implemented.
- **Schema:** All staging files use canonical Pydantic format (triggers, thread_edges, structured source). Old flat format purged 2026-02-08.
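A quick sanity check against the authoring database might look like the following. Node labels and relationship types come from the counts above; property names such as `name` are assumptions — verify against the graph before relying on them.

```cypher
// Always target the pragmatics database explicitly —
// the default database holds the unrelated Arnold/training graph.
USE pragmatics
MATCH (c:Context)-[:BELONGS_TO]->(p:Pack)
RETURN p.name AS pack, count(c) AS contexts
ORDER BY contexts DESC;
```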
## Neo4j MCP Configuration (Claude Desktop)
- **neo4j-pragmatics** — points to `pragmatics` database (authoring environment for Context/Pack nodes)
- **neo4j-quarry** — points to `quarry` database (raw KG extraction target)
- Both accessible directly from Claude Desktop MCP tools. No Python scripts needed to query either database.
- Previous single-database limitation resolved by running two separate MCP server instances.
## Neo4j Raw Knowledge Graph (Quarry)
- **Database name:** `quarry` (separate from `pragmatics` database)
- **Schema:** `docs/design/raw_kg_schema.md` v3.1 — 4-layer harvest architecture
- **Architecture:** Extract facts (Layer 1) → pattern-match against standards (Layer 2) → curate (Layer 3) → export to pragmatics DB
- **Key insight:** Fitness implications are DERIVED by Cypher queries, not extracted from documents
- **Tool:** Custom extraction pipeline in `scripts/quarry/` (ADR-008, ADR-009). Replaces llm-graph-builder.
- **llm-graph-builder:** Installed at `~/Documents/GitHub/llm-graph-builder` for reference only. See ADR-008 for rationale.
- **Large quarry operations:** Use Claude Code to conserve context window in Claude Desktop.
- **NEXT:** Build `scripts/quarry/` toolkit (Phase 5B). Section-aware chunking, direct structured extraction, entity resolution.
## Key Architecture Docs for Pragmatics
- `docs/decisions/ADR-001-neo4j-authoring-sqlite-runtime.md` — Authoring vs runtime separation
- `docs/architecture/knowledge_pack_management.md` — Full pipeline architecture
- `docs/design/extraction_pipeline.md` — Source docs → LLM extraction → staging
- `docs/design/pragmatics_authoring_guide.md` — How to add content
- `docs/design/pragmatics_vocabulary.md` — Normative terminology
- `docs/design/pragmatics_data_flow.md` — End-to-end data flow explainer
- `docs/design/theoretical_foundations.md` — ReAct, OODA, Cynefin, Morris semiotic triad
- `src/census_mcp/pragmatics/models.py` — Pydantic models (canonical schema)
- `docs/design/raw_kg_schema.md` — Raw KG schema v3.1 (13 node types, 16 relationships, 4-layer harvest architecture)
- `docs/design/kg_schema_design_narrative.md` — Design process narrative (multi-model adversarial review)
- `docs/design/reviews/README.md` — External review audit trail
- `docs/decisions/ADR-007-kg-first-authoring.md` — KG-first authoring workflow
- `docs/decisions/ADR-008-custom-extraction-pipeline.md` — Why llm-graph-builder was replaced
- `docs/decisions/ADR-009-quarry-toolkit-shippable.md` — Quarry toolkit ships as project component
- `docs/design/quarry_extraction_pipeline.md` — Quarry pipeline design (Docling + direct LLM extraction)
## Technical Context
- **Census API:** Direct Python HTTP calls to `api.census.gov`
- **Pragmatic context:** Authored in Neo4j (`USE pragmatics`), exported to JSON in `staging/`, compiled to SQLite packs in `packs/`
- **Evaluation:** Four-stage CQS pipeline: (1) response generation, (2) multi-vendor pairwise judge scoring on D1-D5 with 6-pass counterbalancing, (3) automated fidelity verification, (4) expert validation
- **Eval config:** `src/eval/judge_config.yaml` (all parameters, SRS C-006)
- **No vector DB, no RAG over metadata** — structured context with latitude, not embeddings
- **No ontology layer** — the LLM's weights are the semantic layer; we supply pragmatics only
## Key Lessons from v1/v2
- **Geography resolver is critical** — FIPS resolution was the one thing that actually worked and mattered. Prioritize.
- **RAG over variable metadata fails** — Census domain is too semantically homogeneous for embeddings to differentiate. Semantic smearing.
- **Don't rebuild the semantic layer** — COOS/enrichment/ontology work was duplicating what the LLM already knows.
- **Batch API calls are essential** — real analysis needs multi-variable, multi-geography retrieval, not single lookups.
- **The MCP is a component** — design tools as composable, stateless units for agentic workflows.
## Archive Reference
`/Users/brock/Documents/GitHub/archive-opencensusmcp/v2` — Previous implementation (v1/v2). Contains embedding indexes, 1GB+ enriched variable metadata, ontology attempts, and 47+ diagnostic scripts. Useful as archaeology, not as code. Key lesson: RAG over semantically homogeneous Census metadata causes semantic smearing — this is why we moved to pragmatics.
## What NOT to Do
- Don't add R, tidycensus, or Docker infrastructure (that's v1/v2)
- Don't build ontology, COOS, or semantic enrichment layers — the LLM handles semantics
- Don't use RAG over variable metadata — semantic smearing kills it
- Don't create files outside the repo without asking
- Don't use web search for Census data — use Census API or project knowledge base
- Don't use the legacy tool name `get_acs_data` — the tool is `get_census_data` (legacy name accepted but deprecated)
- Don't use the term "crystal" anywhere — it's purged
- Don't use "hallucination" — the correct term is **confabulation** (pattern-completion from training distribution, not perception of nonexistent stimuli)
- Don't build throwaway MVPs — build the real thing correctly from the start
- Don't add external databases (Neo4j, Postgres, etc.) — SQLite only per SRS C-002
- Don't add dependencies without justification — minimal footprint, prove we need it first