
Open Census MCP Server

by brockwebb
phase3_architecture.md (10.7 kB)
# Phase 3 System Architecture - Human Language Complexity Translation

## Core Concept: Division of Labor

**tidycensus (Kyle Walker's Domain):** Census API complexity - FIPS codes, API endpoints, MOE calculations, data formatting

**Census MCP Server (Our Domain):** Human language complexity - regional concepts, ambiguous terms, statistical reasoning

---

## Human Language Complexity Examples

### Geographic Complexity Translation

- **"the northeast"** → 6 specific states: CT, ME, MA, NH, RI, VT (the New England division; the Census Bureau's Northeast Region also includes NJ, NY, and PA)
- **"rural areas"** → urban-rural classification codes + geographic filtering
- **"major cities"** → population threshold + geography hierarchy decision
- **"Austin"** → Austin, TX (not Austin, MN or 20 other Austins)

### Variable Complexity Translation

- **"teacher salaries"** → BLS, not Census + methodology explanation + where to look
- **"income"** → median, not mean + proper universe + statistical caveats
- **"poverty"** → which poverty measure + threshold definition + exclusions

### Statistical Complexity Translation

- **"average"** → median for skewed distributions, mean for normal distributions
- **"compare"** → proper geographic resolution + sample size adequacy
- **"rate"** → proper denominator + universe definition + reliability checks

---

```mermaid
graph TB
    subgraph "User Layer"
        U["User Query: 'How much do teachers make in Austin?'"]
        CD[Claude Desktop]
        U --> CD
    end

    subgraph "MCP Protocol Layer"
        CD --> MCP[MCP Server Entry Point]
    end

    subgraph "Intelligence Layer - Phase 3 Enhanced"
        MCP --> QP[Query Parser & Router]
        QP --> SI["Semantic Index<br/>⚡ <100ms Core Queries"]
        QP --> KB[Knowledge Base<br/>📚 RAG Vector Search]

        SI --> SM[Static Mappings<br/>🎯 Power Law Variables]
        SI --> FC[Fuzzy Concept Matcher<br/>🔍 Alias Expansion]

        KB --> VDB[Vector Database<br/>ChromaDB + Sentence Transformers]
        KB --> DOC[R Documentation Corpus<br/>Census Methodology]
    end

    subgraph "Data Retrieval Layer"
        SM --> RE[R Engine<br/>tidycensus Integration]
        FC --> RE
        KB --> RE

        RE --> GP[Geography Parser<br/>Location → FIPS Codes]
        RE --> VM[Variable Mapper<br/>Concepts → Census Variables]
        RE --> TC[tidycensus Core<br/>R Subprocess]
    end

    subgraph "External APIs"
        TC --> CAPI[Census Bureau APIs<br/>ACS/Decennial Data]
        TC --> TIGER[TIGER Geographic Data<br/>Shapefiles & Boundaries]
    end

    subgraph "Response Layer"
        RE --> SP[Statistical Processor<br/>MOE Calculations & Validation]
        SP --> RF[Response Formatter<br/>Context + Methodology Notes]
        RF --> MCP
    end

    style SI fill:#e1f5fe,stroke:#01579b,stroke-width:3px
    style SM fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    style FC fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style RE fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px
```

## Geographic Intelligence Translation Architecture

```mermaid
graph LR
    subgraph "Human Geographic Concepts"
        HG1["the northeast"]
        HG2["rural counties"]
        HG3["Harris County"]
        HG4["major cities"]
        HG5["which state has highest..."]
    end

    subgraph "Geography Translator Engine"
        HG1 --> GT1[Regional Mapper<br/>Northeast → CT,ME,MA,NH,RI,VT]
        HG2 --> GT2[Classification Mapper<br/>Rural → NCHS urban-rural codes]
        HG3 --> GT3[Disambiguation Engine<br/>Harris County → Harris County, Texas]
        HG4 --> GT4[Hierarchy Selector<br/>Major cities → population threshold + geography level]
        HG5 --> GT5[Comparison Router<br/>National comparison → all states analysis]
    end

    subgraph "tidycensus-Compatible Output"
        GT1 --> TC1["geography='state'<br/>state=c('CT','ME','MA','NH','RI','VT')"]
        GT2 --> TC2["geography='county'<br/>+ rural filter logic"]
        GT3 --> TC3["geography='county'<br/>state='TX', county='Harris'"]
        GT4 --> TC4["geography='place'<br/>+ population threshold filter"]
        GT5 --> TC5["geography='state'<br/>state=NULL (all states)"]
    end

    style GT1 fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    style GT2 fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    style GT3 fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style GT4 fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px
    style GT5 fill:#fce4ec,stroke:#880e4f,stroke-width:2px
```

## The 4 Essential Capabilities (Not Individual Tools)

### 1. Demography - Variable Intelligence Translation

```mermaid
graph LR
    D1["teacher salary"] --> DT1[Domain Router<br/>→ BLS not Census]
    D2["median income"] --> DT2[Variable Mapper<br/>→ B19013_001 + why median]
    D3["poverty rate"] --> DT3[Concept Definer<br/>→ B17001_002 + universe]
    D4["average income"] --> DT4[Statistical Advisor<br/>→ Use median for income]

    style DT1 fill:#e1f5fe
    style DT2 fill:#f3e5f5
    style DT3 fill:#fff3e0
    style DT4 fill:#e8f5e8
```

### 2. Geography - Spatial Intelligence Translation

```mermaid
graph LR
    G1["the northeast"] --> GT1[Regional Resolver<br/>→ Multi-state analysis]
    G2["rural counties"] --> GT2[Classification Filter<br/>→ Geographic filtering]
    G3["Harris County"] --> GT3[Disambiguator<br/>→ Harris County, Texas]
    G4["which state highest"] --> GT4[Comparison Router<br/>→ National analysis]

    style GT1 fill:#e1f5fe
    style GT2 fill:#f3e5f5
    style GT3 fill:#fff3e0
    style GT4 fill:#e8f5e8
```

### 3. Statistics - Methodological Intelligence

```mermaid
graph LR
    S1[Margin of Error] --> ST1[Interpretation Engine<br/>Confidence intervals]
    S2[Sample Size] --> ST2[Reliability Checker<br/>Adequate/inadequate]
    S3[Median vs Mean] --> ST3[Measure Selector<br/>Appropriate statistic]
    S4[Statistical Validity] --> ST4[Quality Controller<br/>Suppression rules]

    style ST1 fill:#e1f5fe
    style ST2 fill:#f3e5f5
    style ST3 fill:#fff3e0
    style ST4 fill:#e8f5e8
```

### 4. Statistical Reasoning - Domain Intelligence

```mermaid
graph LR
    R1["What is average teacher salary?"] --> RT1[Context Provider<br/>US average + BLS guidance +<br/>suggest location specificity]
    R2[Data Source Routing] --> RT2[Agency Router<br/>Census vs BLS vs Other]
    R3[Limitation Explanation] --> RT3[Scope Clarifier<br/>What we can/cannot answer]
    R4[Question Improvement] --> RT4[Query Enhancer<br/>Guide to better questions]

    style RT1 fill:#e1f5fe
    style RT2 fill:#f3e5f5
    style RT3 fill:#fff3e0
    style RT4 fill:#e8f5e8
```

## Data Platform Architecture - Knowledge Base & Ontology Integration

```mermaid
graph TB
    subgraph "Build Time - Knowledge Base Platform"
        CONFIG[config.yaml<br/>Surveys, Years, Agencies]
        CONFIG --> PULL[pull-ontology-sources.py<br/>Automated Data Pulls]
        PULL --> RAW[ontology/<br/>Raw Authoritative Data]
        PULL --> DOCS[source-docs/<br/>RAG Documentation]

        RAW --> PROCESS[build-ontology.py<br/>Knowledge Graph Builder]
        DOCS --> BUILD[build-vector-db.py<br/>RAG Vector Database]

        PROCESS --> KG[knowledge-graph/<br/>Processed Relationships]
        BUILD --> VDB[Vector Database<br/>85MB ChromaDB]
    end

    subgraph "Runtime - Fast Lookup Layer"
        KG --> SQLITE[data/ontology/<br/>SQLite Complex Queries]
        KG --> JSON[data/ontology/<br/>JSON Hash Tables]
        VDB --> VECTOR[data/vector_db/<br/>RAG Search]

        SQLITE --> LOOKUP["Ontology Lookup Engine<br/><1ms Variable Resolution"]
        JSON --> LOOKUP
        VECTOR --> RAG[RAG Context Engine<br/>Methodology & Documentation]
    end

    subgraph "Human Language Translation"
        HLT[Human Language Translator] --> LOOKUP
        HLT --> RAG
        LOOKUP --> ROUTE[Routing Decision<br/>Census vs BLS vs Other]
        RAG --> CONTEXT[Statistical Context<br/>Methodology & Caveats]
    end

    style CONFIG fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    style LOOKUP fill:#c8e6c9,stroke:#2e7d32,stroke-width:3px
    style RAG fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    style ROUTE fill:#fff3e0,stroke:#e65100,stroke-width:2px
```

## Configuration-Driven Data Platform

### Knowledge Base Configuration Structure

```yaml
# knowledge-base/scripts/config.yaml
surveys:
  census:
    - name: "acs5"
      years: [2018, 2019, 2020, 2021, 2022]
      description: "5-year American Community Survey"
      variables_endpoint: "{year}/acs/acs5/variables.json"
    - name: "acs1"
      years: [2019, 2021, 2022]
      description: "1-year American Community Survey"
      variables_endpoint: "{year}/acs/acs1/variables.json"
    - name: "dec"
      years: [2020]
      description: "Decennial Census"
      variables_endpoint: "{year}/dec/variables.json"

  future_expansion:
    - name: "sipp"
      source: "census"
      description: "Survey of Income and Program Participation"
    - name: "cps"
      source: "bls"
      description: "Current Population Survey"

agencies:
  census:
    base_url: "https://api.census.gov/data"
    ontology_priority: "primary"
  bls:
    soc_codes_url: "https://www.bls.gov/soc/"
    ontology_priority: "occupation_routing"

geographic_levels:
  - "us"
  - "state"
  - "county"
  - "place"
  - "tract"
  - "block_group"
```

### Build Pipeline Architecture

```mermaid
graph LR
    subgraph "Automated Data Collection"
        C[config.yaml] --> P1[pull-ontology-sources.py]
        P1 --> API1[Census Variables API<br/>28K+ variables × 5 years]
        P1 --> API2[BLS SOC Classifications<br/>Occupation taxonomies]
        P1 --> API3[Geographic Hierarchies<br/>TIGER relationships]
    end

    subgraph "Data Processing Pipeline"
        API1 --> O1[ontology/census-variables-*.csv]
        API2 --> O2[ontology/bls-soc-codes.json]
        API3 --> O3[ontology/geographic-hierarchy.json]

        O1 --> B[build-ontology.py]
        O2 --> B
        O3 --> B

        B --> KG1[knowledge-graph/variable-concepts.json]
        B --> KG2[knowledge-graph/agency-routing.json]
        B --> KG3[knowledge-graph/geographic-relationships.json]
    end

    subgraph "Runtime Optimization"
        KG1 --> R1[data/ontology/census_variables.db<br/>SQLite for complex queries]
        KG2 --> R2[data/ontology/concept_mapping.json<br/>Hash tables for speed]
        KG3 --> R3[data/ontology/geographic_hierarchy.json<br/>Nested relationships]
    end

    style C fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    style B fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    style R1 fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px
```
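
The build pipeline starts from `config.yaml`, so one way to picture the "Automated Data Pulls" step is as an expansion of each survey's `variables_endpoint` template into one URL per configured year. The sketch below is illustrative only: `CONFIG` is a hand-written dictionary mirroring part of the YAML above (a real script would parse the file), and `variable_urls` is a hypothetical helper, not a function known to exist in `pull-ontology-sources.py`.

```python
from typing import Dict, List

# Hypothetical in-memory mirror of part of config.yaml, used so the
# example needs no YAML parser.
CONFIG = {
    "agencies": {"census": {"base_url": "https://api.census.gov/data"}},
    "surveys": {
        "census": [
            {"name": "acs5", "years": [2018, 2019, 2020, 2021, 2022],
             "variables_endpoint": "{year}/acs/acs5/variables.json"},
            {"name": "acs1", "years": [2019, 2021, 2022],
             "variables_endpoint": "{year}/acs/acs1/variables.json"},
            {"name": "dec", "years": [2020],
             "variables_endpoint": "{year}/dec/variables.json"},
        ]
    },
}


def variable_urls(config: dict) -> Dict[str, List[str]]:
    """Expand each survey's endpoint template into one URL per configured year."""
    base = config["agencies"]["census"]["base_url"].rstrip("/")
    urls: Dict[str, List[str]] = {}
    for survey in config["surveys"]["census"]:
        template = survey["variables_endpoint"]
        urls[survey["name"]] = [
            f"{base}/{template.format(year=year)}" for year in survey["years"]
        ]
    return urls


if __name__ == "__main__":
    for name, links in variable_urls(CONFIG).items():
        print(name, len(links), links[0])
```

Keeping the years and templates in configuration means that adding a survey year is a config change rather than a code change, which appears to be the intent of the configuration-driven design.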
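
The geography translations described under "Geographic Intelligence Translation Architecture" can be pictured as a lookup that turns a human phrase into tidycensus-style arguments. This is a minimal sketch, not the server's implementation: `REGION_ALIASES`, `DEFAULT_STATE`, and `translate_geography` are hypothetical names, and the tables hold only the examples that appear in this document.

```python
from typing import Dict, List, Optional

# Region alias copied from this document's example mapping (hypothetical table).
REGION_ALIASES: Dict[str, List[str]] = {
    "the northeast": ["CT", "ME", "MA", "NH", "RI", "VT"],
}

# Assumed default state for ambiguous names, following the document's
# "Austin" and "Harris County" examples.
DEFAULT_STATE: Dict[str, str] = {
    "austin": "TX",
    "harris county": "TX",
}


def translate_geography(phrase: str) -> Optional[Dict[str, object]]:
    """Map a phrase to tidycensus-style arguments, or None if unknown."""
    key = " ".join(phrase.lower().split())  # normalize case and whitespace
    if key in REGION_ALIASES:
        return {"geography": "state", "state": list(REGION_ALIASES[key])}
    state = DEFAULT_STATE.get(key)
    if state is None:
        return None  # caller should ask the user to clarify
    suffix = " county"
    if key.endswith(suffix):
        county = key[: -len(suffix)].title()
        return {"geography": "county", "state": state, "county": county}
    return {"geography": "place", "state": state, "place": key.title()}


if __name__ == "__main__":
    for text in ("the northeast", "Harris County", "Austin", "Springfield"):
        print(text, "->", translate_geography(text))
```

Returning `None` for unknown phrases, rather than guessing, leaves room for the disambiguation step to ask a follow-up question.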
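
For the "Statistics - Methodological Intelligence" capability, a margin-of-error interpretation step might look like the sketch below. It relies on the fact that ACS margins of error are published at the 90 percent confidence level, so the standard error is MOE / 1.645. The 15 and 30 percent coefficient-of-variation cutoffs are an assumed rule of thumb, not thresholds stated in this document, and `interpret_estimate` is a hypothetical helper.

```python
import math

Z_90 = 1.645  # ACS margins of error correspond to a 90 percent confidence level


def interpret_estimate(estimate: float, moe: float) -> dict:
    """Return a 90 percent interval and a rough reliability label."""
    standard_error = moe / Z_90
    # Coefficient of variation as a percentage; undefined for a zero estimate.
    cv = standard_error / abs(estimate) * 100 if estimate else math.inf
    if cv <= 15:          # assumed cutoff
        label = "reliable"
    elif cv <= 30:        # assumed cutoff
        label = "use with caution"
    else:
        label = "unreliable"
    return {
        "low": estimate - moe,
        "high": estimate + moe,
        "cv_percent": cv,
        "reliability": label,
    }


if __name__ == "__main__":
    print(interpret_estimate(50000, 2000))
    print(interpret_estimate(100, 80))
```

A response formatter could attach the interval and label to each estimate so that small-sample geographies are flagged rather than silently reported.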
