# Census MCP Server — Implementation Schedule
*Created: 2026-02-08*
*Last Updated: 2026-02-13*
---
## Phase 0: Foundation
### Track A: Census API Client ✅ COMPLETE
**Location:** `src/census_mcp/api/census_client.py`
| Task | Description | Status |
|------|-------------|--------|
| A.1 | HTTP client skeleton (httpx) | ✅ |
| A.2 | ACS 5-year endpoint implementation | ✅ |
| A.3 | ACS 1-year endpoint implementation | ✅ |
| A.4 | API key handling | ✅ |
| A.5 | Rate limiting / retry logic | ✅ |
| A.6 | Error response parsing | ✅ |
| A.7 | Unit tests (13 passing) | ✅ |
| A.8 | Integration test (live API) | ✅ |
**TEVV:** Complete. 14/14 tests passing.
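A.5's rate limiting / retry logic can be sketched as a small backoff helper. The names below (`with_retries`, the injected `sleep`) are illustrative, not the shipped `census_client.py` code:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(
    fetch: Callable[[], T],
    retryable: Callable[[Exception], bool],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> T:
    """Retry fetch() with exponential backoff while errors stay retryable."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception as exc:
            # Give up on the last attempt or on non-retryable errors.
            if attempt == max_attempts - 1 or not retryable(exc):
                raise
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the unit tests fast: a fake sleep records the delay schedule instead of waiting.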
### ~~Track B: Geography Resolver~~ — DELETED
**Decision:** LLM handles geographic resolution. Edge cases (Virginia independent cities, NYC boroughs) go into pragmatics packs as context items. No separate resolver code.
**Rationale:** 90/10 rule. The LLM resolves 95%+ of geographies correctly, and the Census API validates FIPS codes (wrong codes return errors). Edge cases are domain expertise, which makes them pragmatics, not code.
---
## Phase 1: Pack Pipeline
### Track C: Pack Schema & Compiler ✅ COMPLETE
**Location:** `src/census_mcp/pragmatics/`, `scripts/`
| Task | Description | Status |
|------|-------------|--------|
| C.1 | SQLite schema DDL (from vocabulary doc) | ✅ |
| C.2 | JSON staging format validation (Pydantic) | ✅ |
| C.3 | compile_pack.py: JSON → SQLite | ✅ |
| C.4 | Pack inheritance resolution | ✅ |
| C.5 | compile_all.py: batch compilation | ✅ |
| C.6 | pack.py: load .db at runtime | ✅ |
| C.7 | Integration test: round-trip JSON→DB→query | ✅ |
**TEVV:** Complete. 15/15 tests passing.
**Deliverables:**
- `scripts/compile_pack.py` - Single pack compiler
- `scripts/compile_all.py` - Batch compiler
- `src/census_mcp/pragmatics/schema.py` - SQLite DDL
- `src/census_mcp/pragmatics/models.py` - Pydantic validation
- `src/census_mcp/pragmatics/pack.py` - Runtime PackLoader
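The C.3 compile step (staging JSON → SQLite pack) reduces to a validate-and-insert loop. A minimal sketch under an assumed one-table toy schema; the real DDL lives in `schema.py` and the real validation in the Pydantic models:

```python
import json
import sqlite3

# Toy single-table schema for illustration only; the shipped DDL in
# src/census_mcp/pragmatics/schema.py also covers threads, sources, etc.
DDL = """
CREATE TABLE contexts (
    id TEXT PRIMARY KEY,
    category TEXT NOT NULL,
    latitude TEXT NOT NULL CHECK (latitude IN ('none','narrow','wide','full')),
    guidance TEXT NOT NULL
)
"""

def compile_pack(staging_json: str, db_path: str) -> int:
    """Compile a staging JSON document into a queryable SQLite pack."""
    items = json.loads(staging_json)["contexts"]
    conn = sqlite3.connect(db_path)
    conn.execute(DDL)
    # Named-style placeholders let rows stay as dicts from the JSON.
    conn.executemany(
        "INSERT INTO contexts VALUES (:id, :category, :latitude, :guidance)",
        items,
    )
    conn.commit()
    conn.close()
    return len(items)
```

The CHECK constraint mirrors the latitude vocabulary (`none`/`narrow`/`wide`/`full`) so invalid items fail at compile time, not at runtime.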
### Track D: Seed Content ✅ COMPLETE (Initial)
**Location:** `staging/`, `packs/`
| Task | Description | Status |
|------|-------------|--------|
| D.1 | General statistics rules | ✅ (3 items) |
| D.2 | Census-wide rules | ✅ (3 items) |
| D.3 | ACS-specific rules | ✅ (17 items) |
| D.4 | Thread edges between related rules | ✅ (6 threads) |
| D.5 | Validate against schema | ✅ |
| D.6 | ACS General Handbook deep extraction (7 non-obvious findings), authored with provenance (ACS-GEN-001) | ✅ |
**ACS Pack Details (35 contexts):**
- Population thresholds: 5 (65K rule, 20K supplemental, 5-year coverage, tract/BG controls, independent cities)
- MOE/reliability: 4 (SE formula, CV threshold, precision comparison, covariance bias)
- Comparison rules: 5 (no 1yr/5yr mixing, overlapping periods, significance testing, 4/5 smoothing, CI anti-pattern)
- Period estimates: 1 (labeling guidance)
- Dollar values: 1 (inflation adjustment, CPI-U-RS limitation)
- Geography: 5 (block groups, PUMAs, congressional districts, boundary dates, independent cities)
- Breaks/discontinuities: 3 (2009-2010 population controls)
- Suppression: 1 (data availability)
- Disclosure: 3 (swapping, perturbation, thresholds)
- Equivalence: 2 (geographic, temporal)
- Group quarters: 2 (inclusion, imputation)
- Nonresponse: 2 (allocation rates, hot-deck)
- Residence rules: 1 (current vs usual)
- Sampling: 1 (non-uniform rates)
- Independent cities: 1 (county-equivalent)
**Latitude Distribution:**
- `none`: 5 (hard constraints)
- `narrow`: 7 (strong guidance)
- `wide`: 4 (context-dependent)
- `full`: 1 (background info)
**Source:** ACS-GEN-001 (Understanding and Using ACS Data handbook, 2020)
**D.6 Complete:** 7 non-obvious findings from ACS handbook authored with page-level provenance (2026-02-11). New categories: population_controls, residence_rules, sampling. Strengthened: comparison (CMP-002, CMP-003), dollar_values (DOL-001), margin_of_error (MOE-004).
---
## Phase 2: Pragmatics Retriever ✅ COMPLETE (Revised)
Depends on: Phase 1 (pack loading) ✅
**Architecture Revision per ADR-003/004:** LLM caller handles routing and interpretation. MCP provides structured data retrieval only.
| Task | Location | Description | Status |
|------|----------|-------------|--------|
| ~~E.1~~ | ~~`router.py`~~ | ~~Query classification~~ | ❌ Deleted (LLM does routing) |
| ~~E.2~~ | ~~`router.py`~~ | ~~Trigger extraction~~ | ❌ Deleted (LLM does extraction) |
| E.3 | `retriever.py` | Context lookup by topics (tag match) | ✅ |
| E.4 | `retriever.py` | Thread traversal for related contexts | ✅ |
| E.5 | `retriever.py` | Parameter-based trigger mapping | ✅ |
| ~~E.6~~ | ~~`compiler.py`~~ | ~~Natural language formatting~~ | ❌ Deleted (LLM does formatting) |
| ~~E.7~~ | ~~`compiler.py`~~ | ~~Citation formatting~~ | ❌ Deleted (return raw citations) |
| E.8 | Unit tests | Retriever logic tested | ✅ (9/9) |
**Deliverables:**
- `src/census_mcp/pragmatics/retriever.py` - PragmaticsRetriever with two methods:
- `get_guidance_by_topics(topics, domain)` - Tag-based lookup
- `get_guidance_by_parameters(product, geo_level, variables, year)` - Auto-bundling for data responses
- Unit tests: 9/9 passing
**TEVV:** Complete. Returns structured guidance dict with `{guidance, related, sources}` fields.
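A minimal sketch of the `get_guidance_by_topics` shape (E.3 tag match plus E.4 one-hop thread traversal). Table and column names here are simplified stand-ins for the pack schema:

```python
import sqlite3

def get_guidance_by_topics(conn: sqlite3.Connection, topics: list[str]) -> dict:
    """Tag-match lookup plus one-hop thread traversal.

    Returns the structured shape the MCP tool exposes:
    {"guidance": [...], "related": [...], "sources": [...]}.
    """
    marks = ",".join("?" * len(topics))
    hits = conn.execute(
        f"SELECT id, guidance, source FROM contexts WHERE trigger IN ({marks})",
        topics,
    ).fetchall()
    hit_ids = [h[0] for h in hits]
    related = []
    if hit_ids:
        marks = ",".join("?" * len(hit_ids))
        # Thread edges surface related contexts the caller did not ask for.
        related = [r[0] for r in conn.execute(
            f"SELECT DISTINCT to_id FROM threads WHERE from_id IN ({marks})",
            hit_ids,
        ).fetchall() if r[0] not in hit_ids]
    return {
        "guidance": [{"id": i, "text": g} for i, g, _ in hits],
        "related": related,
        "sources": sorted({s for _, _, s in hits}),
    }
```

Raw source IDs are returned as-is: per the E.7 deletion, citation formatting is the LLM caller's job.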
---
## Phase 3: MCP Server & Tools ✅ COMPLETE
Depends on: Phase 0A (API client) ✅, Phase 2 (retriever) ✅
**Primary Artifact:** `docs/design/agent_prompt.md` defines agent behavior, tool schemas, and workflow.
| Task | Location | Description | Status |
|------|----------|-------------|--------|
| F.1 | `server.py` | get_methodology_guidance tool | ✅ |
| F.2 | `server.py` | get_census_data handler (data + pragmatics) | ✅ (renamed from get_acs_data, G.6) |
| F.3 | `server.py` | explore_variables handler | ✅ |
| F.4 | `server.py` | Low-level Server + stdio (ADR-005) | ✅ (rewrote from FastMCP) |
| F.5 | `agent_prompt.md` | Agent prompt (slimmed G.6) | ✅ |
| F.6 | `server.py` | Pack loading (lazy initialization) | ✅ |
| F.7 | Integration tests | 10 tests: tools + tract fixes + legacy compat | ✅ (10/10) |
| F.8 | `pyproject.toml` | MCP dependency, entry point | ✅ |
**Deliverables:**
- `src/census_mcp/server.py` - Low-level MCP server with stdio transport and lifespan management (ADR-005)
- `src/census_mcp/tools/census_tools.py` - Three tool handlers implementing agent_prompt.md schemas
- Integration tests: 10/10 passing
- Entry point: `census-mcp` CLI command
**TEVV:** Complete. Server starts, loads packs, tools respond with proper structure. Ready for manual Claude Desktop integration test.
**Implementation Notes:**
- Rewrote from FastMCP to the low-level `mcp.server.Server` + stdio pattern per ADR-005
- Tools access ServerContext via get_server_context() (lazy init)
- Hard stops implemented (e.g., tract + acs1 raises CensusInvalidQueryError)
- Pragmatics auto-bundled with every get_census_data response
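The hard-stop pattern is a pre-flight check that fails before any API call. `validate_query` and the level set below are illustrative, though the exception name matches the one used in `server.py`:

```python
class CensusInvalidQueryError(ValueError):
    """Raised before any API call when a query can never succeed."""

# ACS 1-year is only published for areas of 65,000+ population;
# sub-county levels like tracts and block groups never appear in it.
_ACS1_UNSUPPORTED_LEVELS = {"tract", "block group"}

def validate_query(product: str, geo_level: str) -> None:
    """Hard stop: fail fast instead of forwarding a doomed API request."""
    if product == "acs1" and geo_level in _ACS1_UNSUPPORTED_LEVELS:
        raise CensusInvalidQueryError(
            f"{geo_level!r} geography is not published in ACS 1-year; "
            "use acs5 instead."
        )
```

Raising a typed error lets the agent loop surface an actionable message rather than a raw Census API failure.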
---
## Phase 4A: Manual Validation ✅ COMPLETE
Depends on: Phase 3 ✅
*Objective: Prove the system works end-to-end before investing in automated evaluation.*
| Task | Description | Status | Traces To |
|------|-------------|--------|----------|
| G.1 | Fix PACKS_DIR: `server.py` reads from `os.environ.get("PACKS_DIR", "packs")` | ✅ | Blocks all testing |
| G.2 | Restart Claude Desktop, verify MCP connection healthy | ✅ | VR-002 |
| G.3 | Test: "What's the median income in Mercer, PA?" (MCP tools live) | ✅ | VR-012 |
| G.4 | Test: Owsley County, KY poverty — three-model comparison (Sonnet 4, 4.5, Opus 4.6) | ✅ | VR-012 |
| G.5 | SRS reconciliation: update FR-PC-001, FR-PC-003 to align with ADR-003/004 | ✅ | ADR-003, ADR-004 |
| G.6 | Prompt slimming: removed domain rules, renamed tool, FSS-general language | ✅ | Decision log (prompt specificity) |
| G.7 | Add independent cities pack content (ACS-IND-001/002/003) | ✅ | VR-011 |
| G.8 | Document results of manual tests in `docs/verification/` | ✅ (partial) | — |
| G.9 | **BUG FIX:** `get_census_data` tract parameter — add wildcard support + county validation (ADR-006) | ✅ | G.4 finding |
| G.10 | **PACK:** Disclosure avoidance (ACS-DIS-001/002/003) | ✅ | G.4 finding |
| G.11 | **PACK:** Population thresholds (ACS-THR-001/002) + geographic equivalence (ACS-EQV-001/002) | ✅ | G.4 finding |
**Exit Criteria:** ✅ Met. System produces pragmatics-grounded responses for test queries. Tract-level geography works. SRS reflects actual architecture. Agent prompt slim (no domain overfitting).
---
## Phase 4B: Systematic Evaluation ⏳ IN PROGRESS
Depends on: Phase 4A ✅
*Objective: Empirical evaluation for FCSM talk. Does pragmatics improve statistical consultation quality?*
### Experimental Design
- **Treatment:** Claude + MCP (live tools, live pragmatics, full agent loop)
- **Control:** Claude alone (same query, no tools, no pragmatics)
- **Scoring:** CQS rubric applied to paired responses by LLM judge panel + human expert calibration
- **Judge panel:** Gemini, OpenAI, Claude (3-model panel for inter-rater reliability, bias mitigation)
- **Human calibration:** Expert-scored subset (10-15 queries) to anchor automated scoring validity
### Build Order
**Step 1: CQS Rubric Definition**
| Task | Description | Status | Traces To |
|------|-------------|--------|----------|
| H.1 | Define CQS scoring dimensions and scale | ✅ | VR-006 |
| H.2 | Draft scoring prompt template for LLM judge panel | ✅ | VR-006 |
| H.3 | Validate rubric against manually scored examples | ✅ | Manual calibration packet |
**Step 2: Test Query Battery**
| Task | Description | Status | Traces To |
|------|-------------|--------|----------|
| H.4 | Data-driven test definitions (YAML/JSON, no code changes to add queries) | ✅ | VR-007 |
| H.5 | Test battery: 41% normal / 59% edge case weighting (power-based) | ✅ | DEC-4B-009 |
| H.6 | Geographic edge cases (independent cities, NYC boroughs, DC, consolidated city-counties) | ✅ | GEO-001 through GEO-006 |
| H.7 | Small-area reliability cases (<65K, <20K, tract-level) | ✅ | SML-001 through SML-004 |
| H.8 | Temporal edge cases (cross-vintage, overlapping periods, breaks, inflation) | ✅ | TMP-001 through TMP-004 |
| H.9 | Ambiguity cases (Portland, Springfield, Washington) | ✅ | AMB-001 through AMB-003 |
| H.10 | Product-mismatch cases (1-year for small geo, decennial→ACS) | ✅ | MIS-001 through MIS-003 |
| H.11 | Persona-based query variants (8th grader, city planner, journalist) | ✅ | PER-001a through PER-001c |
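H.4's data-only extensibility amounts to a schema-checked loader: adding a query touches the battery file, never the harness. The record fields and sample queries below are illustrative, not the actual battery files:

```python
import json

# Illustrative battery excerpt; real definitions live in the YAML/JSON files.
BATTERY = """
[
  {"id": "GEO-001", "category": "geography",
   "query": "What's the poverty rate in Carson City, NV?",
   "kind": "edge"},
  {"id": "PER-001a", "category": "persona",
   "query": "Explain median income in Mercer County, PA to an 8th grader",
   "kind": "normal"}
]
"""

def load_battery(text: str) -> list[dict]:
    """Load test definitions and reject records missing required fields."""
    cases = json.loads(text)
    required = {"id", "category", "query", "kind"}
    for case in cases:
        missing = required - case.keys()
        if missing:
            raise ValueError(f"{case.get('id', '?')}: missing {sorted(missing)}")
    return cases
```

Validating at load time means a malformed battery file fails before any API spend.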
**Step 3: Test Harness**
| Task | Description | Status | Traces To |
|------|-------------|--------|----------|
| H.12 | MCP client: programmatic subprocess launch + stdio JSON-RPC connection | ✅ | VR-001 |
| H.13 | Health check: verify MCP connection before test run | ✅ | VR-002 |
| H.14 | Agent loop: Claude API tool_use → MCP tool execution → tool_result → final response | ✅ | VR-001 |
| H.15 | Control path: Claude API same query, no tools, no system prompt augmentation | ✅ | VR-001 |
| H.16 | Structured result recording (query, condition, model, response, tool calls, pragmatics returned, latency) | ✅ | VR-005 |
| H.17 | Output format: JSON lines for scoring pipeline | ✅ | VR-006 |
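A sketch of the H.16/H.17 record shape, written one JSON object per line so the scoring pipeline can stream it. Field names are assumptions derived from the task descriptions, not the harness's actual schema:

```python
import json
import time

def record_result(fh, *, query_id: str, condition: str, model: str,
                  response: str, tool_calls: list, pragmatics: list,
                  latency_s: float) -> None:
    """Append one structured run record as a JSON line."""
    fh.write(json.dumps({
        "query_id": query_id,
        "condition": condition,      # "treatment" (MCP) or "control"
        "model": model,
        "response": response,
        "tool_calls": tool_calls,
        "pragmatics_returned": pragmatics,
        "latency_s": round(latency_s, 3),
        "recorded_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }) + "\n")
```

One flat record per run keeps pairing trivial: the scorer joins treatment and control rows on `query_id`.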
**Step 4: Judge Prompt & Scoring Pipeline**
| Task | Description | Status | Traces To |
|------|-------------|--------|----------|
| H.18 | Scoring prompt: domain-specific rubric for LLM judges (Gemini, OpenAI, Claude) | ✅ | VR-006 |
| H.19 | Judge harness: send paired responses to 3 models, collect dimension scores | ✅ | VR-003, VR-004 |
| H.20 | Inter-rater agreement calculation (Krippendorff's α or Fleiss' κ) | ⏳ | Aggregate analysis pending |
| H.21 | Human calibration set: expert-scored subset | ⏳ | Manual scoring packet created |
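For H.20, Fleiss' κ fits the fixed three-judge panel (Krippendorff's α would be needed only if judges or scales varied per item). A self-contained sketch:

```python
def fleiss_kappa(ratings: list[list[int]]) -> float:
    """Fleiss' kappa for a fixed-size rater panel.

    ratings[i][j] = number of judges assigning item i to category j;
    every row must sum to the same panel size (3 for this judge panel).
    Undefined when chance agreement is 1 (all ratings in one category).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_total = n_items * n_raters
    # Mean per-item observed agreement.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items
    # Chance agreement from the marginal category proportions.
    n_cats = len(ratings[0])
    p_e = sum(
        (sum(row[j] for row in ratings) / n_total) ** 2
        for j in range(n_cats)
    )
    return (p_bar - p_e) / (1 - p_e)
```

Perfect agreement yields κ = 1; agreement at chance level yields κ = 0, with negative values indicating systematic disagreement.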
**Step 5: Run & Analyze**
| Task | Description | Status | Traces To |
|------|-------------|--------|----------|
| H.22 | Execute full battery: control + treatment for all queries | ✅ | Stage 1 v3 complete |
| H.23 | Execute judge panel scoring on all paired results | ⏳ | OpenAI ✅, Anthropic + Google pending |
| H.24 | Aggregate results: CQS by dimension, treatment effect, agreement stats | ⏳ | Pending all vendors |
| H.25 | Bug fixes and pack content expansion from evaluation failures | ✅ | Truncation + temporal + dataset mapping |
| H.26 | Results documentation in `docs/verification/` | ⏳ | — |
**Step 6: Pipeline Fidelity (Stage 3)**
| Task | Description | Status | Traces To |
|------|-------------|--------|----------|
| H.27 | Design automated fidelity metric (DEC-4B-023) | ✅ | D6 revision |
| H.28 | Build fidelity extraction pipeline (`src/eval/fidelity_check.py`) | ✅ | VR pipeline |
| H.29 | Run fidelity verification on all Stage 1 data | ✅ | H.22 |
| H.30 | Symmetric auditability measurement (both conditions) | ✅ | Experimental design |
| H.31 | Integrate fidelity results into aggregate analysis | ⏳ | H.24 |
**Exit Criteria:** Documented CQS scores for Claude with vs without pragmatics. LLM judge panel + human calibration. Results in `docs/verification/`.
**Deliverable:** Empirical evaluation suitable for FCSM talk.
---
## Dependency Graph
```
Phase 0A (API Client) ✅ ────────────────────────────────┐
                                                         ├─► Phase 3 (MCP) ✅ ─► Phase 4A (Manual) ✅ ─► Phase 4B (Eval) ⏳
Phase 1C (Pack Pipeline) ✅ ─┬─► Phase 2 (Retriever) ✅ ─┘
Phase 1D (Seed Content) ✅ ──┘
```
---
## Current Status
| Phase | Status | Tests |
|-------|--------|-------|
| 0A: API Client | ✅ Complete | 14/14 |
| ~~0B: Geography~~ | ❌ Deleted | — |
| 1C: Pack Pipeline | ✅ Complete | 15/15 |
| 1D: Seed Content | ✅ Complete | — |
| 2: Retriever | ✅ Complete | 9/9 |
| 3: MCP Server | ✅ Complete | 10/10 |
| 4A: Manual Validation | ✅ Complete | — |
| 4B: Systematic Evaluation | ⏳ Stage 2-3 in progress | — |
**Total Tests:** 48/48 (all passing)
**Last Updated:** 2026-02-13
---
## Infrastructure & CI
| Component | Status |
|-----------|--------|
| GitHub Actions CI | ✅ `.github/workflows/ci.yml` |
| Unit tests | ✅ pytest |
| Pack compilation | ✅ In CI pipeline |
| Ruff linting | ✅ Separate job |
---
## Documentation Added
| Document | Purpose |
|----------|---------|
| `docs/references/CATALOG.md` | Source document registry with provenance |
| `docs/references/theory/semiotic_dq_foundations.md` | Theoretical foundation citations |
| `docs/architecture/knowledge_pack_management.md` | Authoring vs runtime separation |
| `docs/design/pragmatics_vocabulary.md` | Canonical terms + theoretical foundation |
---
## Architecture Decision Records
| ADR | Title | Status |
|-----|-------|--------|
| ADR-003 | Reasoning Model Requirement | Accepted |
| ADR-004 | Agent Reasoning Loop (ReAct + OODA + Cynefin) | Accepted |
| ADR-005 | Low-level Server Pattern (FastMCP bypass) | Accepted |
| ADR-006 | Tract-Level Geography Bug Fixes | Accepted |
| ADR-007 | KG-First Authoring Workflow | Accepted |
| ADR-008 | Custom Extraction Pipeline over llm-graph-builder | Accepted |
| ADR-009 | Quarry Toolkit as Shippable Project Component | Accepted |
| — | Prompt Specificity Concern | ✅ Resolved (G.6) |
---
## Open Items (Post Phase 3)
| Item | Priority | Notes |
|------|----------|-------|
| ~~`server.py` reads PACKS_DIR from env~~ | ~~Immediate~~ | ✅ Done (G.1). Reads `os.environ.get("PACKS_DIR", "packs")`; the hardcoded relative path failed when Claude Desktop launched from an arbitrary CWD. |
| ~~Claude Desktop integration test~~ | ~~Immediate~~ | ✅ Done (G.2-G.4). Connection verified; Mercer County, PA and Owsley County, KY test queries passed. |
| ~~SRS reconciliation with ADR-003/004~~ | ~~High~~ | ✅ Done (G.5). FR-PC-001 and FR-PC-003 updated to align with the ADRs. |
| ~~Slim agent prompt "Never" list~~ | ~~Medium~~ | ✅ Done (G.6). 280→55 lines. Tool renamed `get_census_data`. |
| API testbench (multi-model CQS) | Medium | CLI harness to run test queries against Claude/GPT/Gemini via API. Validates pragmatics work regardless of reasoning model (ADR-003). |
---
## Phase 4A.5: Pipeline Repair & Schema Migration ✅ COMPLETE
Discovered 2026-02-08: Round-trip scripts were never built. Neo4j nodes use stale schema.
See `docs/lessons_learned/session_2026-02-08_pipeline_gap.md` for root cause.
| Task | Description | Status | Traces To |
|------|-------------|--------|----------|
| P.1 | Create `scripts/neo4j_to_staging.py` (export) | ✅ | FR-EP-001 |
| P.2 | Create `scripts/staging_to_neo4j.py` (import) | ✅ | FR-EP-002 |
| P.3 | Add FR-EP-001–009 to SRS § 3.5 | ✅ | SRS |
| P.4 | Document pipeline gap in lessons learned | ✅ | — |
| P.5 | Update CLAUDE.md with Neo4j details + script refs | ✅ | — |
| P.6 | Migrate Neo4j nodes in-place: `tags`→`triggers`, add `category`, restructure `source` | ✅ | FR-EP-004 |
| P.7 | Run `neo4j_to_staging.py` to generate canonical staging JSON | ✅ (manual) | FR-EP-001, FR-EP-003 |
| P.8 | Remove old `staging/acs.json` (replaced by `staging/acs/*.json` per-category files) | ✅ | FR-EP-003 |
| P.9 | Run `compile_all.py` to rebuild packs from new staging | ✅ | — |
| P.10 | Validate: run tests, confirm pack round-trip | ✅ | — |
| P.11 | Author G.10/G.11/G.7 content in Neo4j using canonical schema (10 new nodes, 10 new edges) | ✅ | G.10, G.11, G.7 |
| P.12 | Run full pipeline: Neo4j → staging → compile → test | ✅ | End-to-end |
| P.13 | Provenance schema migration: models.py, staging JSON, Neo4j scripts, CLAUDE.md, tests | ✅ | ADR (provenance) |
**Exit Criteria:** Full round-trip works. All staging JSON matches Pydantic model. No stale formats.
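The P.10 round-trip confirmation can be phrased as an idempotence check: exporting, importing, and exporting again must reproduce the same staging JSON. A backend-agnostic sketch, with injected functions standing in for `neo4j_to_staging.py` / `staging_to_neo4j.py`:

```python
import json

def canonical(doc: dict) -> str:
    """Canonical serialization so dict ordering can't mask differences."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))

def round_trip_stable(export, import_, doc: dict) -> bool:
    """True when export(import(export(doc))) reproduces export(doc).

    `export` and `import_` are injected callables standing in for the
    Neo4j scripts, so the check itself stays backend-agnostic.
    """
    first = export(import_(doc))
    second = export(import_(first))
    return canonical(first) == canonical(second)
```

Any non-determinism in the export (timestamps, run counters, unstable ordering) shows up immediately as instability.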
---
## Phase 5: Knowledge Graph Extraction Pipeline ⏳ DESIGN COMPLETE
**Schema:** `docs/design/raw_kg_schema.md` v3.1 — reviewed by 4 AI models across 5 rounds.
**Architecture:** 4-layer harvest (seed → extract → harvest → curate → export)
**ADR:** ADR-007 (KG-first authoring)
| Task | Description | Status |
|------|-------------|--------|
| KG.1 | Raw KG schema design (13 node types, 16 relationships) | ✅ v3.1 |
| KG.2 | Multi-model adversarial review (5 rounds, 4 models) | ✅ |
| KG.3 | Bug fixes from structural review (8 fixes) | ✅ |
| KG.4 | Design narrative / explainer document | ⏳ (CC task pending) |
| KG.5 | Setup llm-graph-builder (neo4j-labs) | ✅ Done — then abandoned (ADR-008) |
| KG.6 | Seed Layer 0: AnalysisTask + REQUIRES edges | ✅ (5 tasks, 5 REQUIRES) |
| KG.7 | Seed Layer 0: CanonicalConcept, DataProduct, SurveyProcess nodes | ✅ (6 concepts, 4 products, 6 processes) |
| KG.8 | First extraction: CPS Handbook of Methods (22 pages) | ✅ (291 nodes, 349 rels) |
| KG.9 | Post-extraction enrichment: PRODUCES edge generation | ✅ (93→89 with PRODUCES, 9 dimensions) |
| KG.10 | First harvest: violation detection queries | ✅ (8 threshold results, 20 interactions) |
| KG.11 | Harvest quality assessment | ✅ (1 genuine finding, 7 false positives — see ADR-008) |
| KG.12 | **DECISION:** Replace llm-graph-builder with custom pipeline | ✅ ADR-008, ADR-009 |
| KG.13 | CPS-ACS income pack (15-30 context items) | ⏳ Blocked on pipeline rebuild |
---
## Phase 5B: Quarry Extraction Toolkit ✅ COMPLETE
**ADRs:** ADR-008 (custom pipeline), ADR-009 (shippable toolkit)
**Location:** `scripts/quarry/`
**Design spec:** `docs/design/quarry_extraction_pipeline.md`
**Depends on:** Phase 5 (schema + Layer 0 seed in quarry DB)
**Built:** 2026-02-09 (Claude Code session — 15 files, ~1,690 lines, 4 bugs fixed)
| Task | Description | Status | Traces To |
|------|-------------|--------|----------|
| QT.1 | `config.py` — shared configuration (Neo4j creds, API keys, paths, schema version) | ✅ | ADR-009 |
| QT.2 | `seed.py` — Layer 0 setup (idempotent MERGE, --dry-run) | ✅ | KG.6, KG.7 |
| QT.3 | `chunk.py` — Docling section-aware PDF chunker (22pg → 157 chunks) | ✅ | ADR-008 |
| QT.4 | `extract.py` — PDF → LLM extraction → Neo4j write (MERGE, entity resolution) | ✅ | ADR-008 |
| QT.5 | `prompts.py` — Extraction prompt with controlled vocabulary enforcement | ✅ | KG.9 |
| QT.6 | Entity resolution at write time (MERGE on canonical names) — built into extract.py | ✅ | ADR-008 |
| QT.7 | `harvest.py` — Layer 2 queries with value_type filtering (fixes false positives) | ✅ | KG.10, KG.11 |
| QT.8 | `export.py` — stub only (blocked on harvest curation design) | ⏳ stub | ADR-007 |
| QT.9 | `schema.json` — machine-readable v3.1 schema definition | ✅ | ADR-009 |
| QT.10 | `README.md` — setup, usage, extending to new surveys | ✅ | ADR-009 |
| QT.11 | Test: CPS Handbook re-extraction with new pipeline, compare quality vs llm-graph-builder | ✅ | TEVV |
| QT.12 | Test: CPS Technical Paper 77 (1,531 chunks) — scalability test | ✅ | TEVV |
| QT.13 | Test: ACS General Handbook — cross-survey queries light up | ✅ | KG.13 |
**Verification passed:** Chunking (157 chunks, section-aware ✓), seed dry-run (valid Cypher ✓), extraction dry-run (prompts generated ✓)
**Results:** 5 documents extracted, 13,227 nodes, 15,355 valid edges. 100% schema compliance after cleanup.
- Batch mode (`--batch-size N`) reduces cost ~50% for large documents; ~$55 total extraction cost.
- Sonnet is the minimum viable model (Haiku failed with a 25.7% error rate).
- ~12% of extracted node types were confabulated; ~60% of those were recoverable via reclassification.
**Exit Criteria:** ✅ Met. Pipeline extracts PDFs, harvest produces candidates, quality exceeds llm-graph-builder baseline.
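QT.3's section-aware chunking can be approximated in a few lines. This sketch honors only markdown-style heading lines, whereas the real Docling-based `chunk.py` is layout-aware:

```python
def chunk_by_section(markdown: str, max_chars: int = 2000) -> list[dict]:
    """Split markdown at headings so no chunk crosses a section boundary.

    A stand-in for scripts/quarry/chunk.py: each chunk carries its
    section title as metadata for downstream provenance.
    """
    chunks, section, buf = [], "preamble", []

    def flush():
        text = "\n".join(buf).strip()
        if text:
            chunks.append({"section": section, "text": text})
        buf.clear()

    for line in markdown.splitlines():
        if line.startswith("#"):
            flush()                      # never let a chunk span sections
            section = line.lstrip("#").strip()
        elif buf and sum(len(s) + 1 for s in buf) + len(line) > max_chars:
            flush()                      # size overflow within a section
            buf.append(line)
        else:
            buf.append(line)
    flush()
    return chunks
```

Keeping the section title out of the chunk body but in the metadata is what lets harvest queries cite "Sampling, p. N"-style provenance later.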
---
## Phase 5C: Harvest Curation & Export ⏳ NOT STARTED
Depends on: Phase 5B ✅
*Objective: Turn quarry raw material into packaged pragmatics for the MCP runtime.*
| Task | Description | Status | Traces To |
|------|-------------|--------|----------|
| HC.1 | Finish relationship cleanup (delete 556 long-tail invalid edges) | ⏳ CC task ready | QT cleanup |
| HC.2 | Run clean harvest, document baseline signal quality | ⏳ | KG.10 |
| HC.3 | Build export.py (harvest → staging JSON template generation) | ⏳ | ADR-007 |
| HC.4 | Curate temporal comparability batch (~34 candidates → ~15 items) | ⏳ | D.6 |
| HC.5 | Curate threshold violations batch (~10 candidates → ~5 items) | ⏳ | D.6 |
| HC.6 | Compile new packs, test with MCP server | ⏳ | Phase 3 |
| HC.7 | Feed results into Phase 4B evaluation | ⏳ | H.1 |
**Exit Criteria:** ≥15 new pragmatics items curated from quarry harvest with full provenance, compiled into packs, tested via MCP.
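HC.3's export step is template generation, not authoring: harvest candidates become staging-JSON skeletons with provenance carried over and guidance left TODO for the curator. A sketch with assumed field names (the canonical shape lives in the Pydantic models):

```python
import json

def harvest_to_template(candidate: dict) -> str:
    """Draft a staging-JSON context item from one harvest candidate.

    Export drafts; it does not author. Guidance text stays a TODO so
    a human curator writes it against the quoted source passage.
    """
    item = {
        "id": "TODO-ASSIGN-ID",
        "category": candidate.get("category", "TODO"),
        "triggers": candidate.get("triggers", []),
        "latitude": "TODO: none|narrow|wide|full",
        "guidance": "TODO: curator-authored text",
        "source": {
            "doc_id": candidate.get("doc_id", "TODO"),
            "page": candidate.get("page"),
            "quote": candidate.get("quote", ""),
        },
    }
    return json.dumps(item, indent=2)
```

Because provenance fields are copied mechanically, every curated item inherits page-level citations from the quarry for free.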
---
## Tech Debt / Future Work
| Item | Priority | Notes |
|------|----------|-------|
| Source-grounded authoring for all existing content | High | Existing 25 ACS items cite ACS-GEN-001 but were authored from LLM training data, not source docs. Need re-verification against handbook. |
| CPS pack (manual) | Medium | Can be authored manually while KG pipeline matures |
| Additional ACS docs extraction | Low | Researchers handbook, PUMS handbook |
---
## Risk Items
| Risk | Mitigation | Status |
|------|------------|--------|
| Census API rate limits | Cache responses locally | Mitigated |
| Geography disambiguation | LLM handles + edge cases in packs | Resolved |
| Pack content takes longer than code | Timebox initial content | Initial content done |
| MCP protocol quirks | Test with simple tool first | Resolved (Phases 3-4A) |
| Context7 shows unreleased APIs | Always verify against `pip index versions` before assuming API exists | Learned 2026-02-08 |
| Agent prompt overfits domain rules | Slim "Never" list, remove survey-specific language | ✅ Resolved (G.6) |