# Parking Lot: Paper Appendix & Presentation Notes
## Last Updated: 2026-02-15
---
## Appendix A: Behavioral Comparison Table (PLACEHOLDER)
Need a table showing side-by-side examples of Control vs Treatment (and later RAG)
responses for selected queries. Purpose: illustrate the *qualitative* differences that
the quantitative scores capture.
### Queries to Feature
Pick queries that illustrate different effects:
| Query | What It Shows |
|-------|---------------|
| NORM-008 (unemployment) | Source selection: Control uses ACS, Treatment redirects to BLS LAUS. OpenAI judges caught this, Anthropic didn't. |
| GEO-002 (geographic comparison) | Full spectrum: Control scores 0 on D2/D3/D5, Treatment scores 2 across the board. Most dramatic gap. |
| SML-002 (small area poverty) | Uncertainty communication: Treatment provides CVs and MOE warnings, Control gives vague "patterns". |
| AMB-001 (ambiguous geography) | Both ask for clarification, but Treatment educates about the 65K threshold and 1-year vs 5-year estimates. |
| NORM-001 (California population) | The interesting disagreement: OpenAI prefers Control (decennial is the right answer), Anthropic prefers Treatment (better traceability). |
| PER-001a/b/c (persona variants) | The only queries where Anthropic preferred Control — what's different about these? |
### For Each Query, Show:
- The query text
- Control response (first ~200 words)
- Treatment response (first ~200 words)
- RAG response (when available)
- Judge scores (D1-D5) from each vendor
- Notable reasoning excerpts from judges
- What this illustrates about the system
### Pipeline Traceability Examples
Show the tool call chain for a treatment response:
1. Query received
2. `get_methodology_guidance` called → what pragmatics were returned
3. `explore_variables` called → what variables identified
4. `get_census_data` called → what data retrieved
5. Final response synthesized
6. Stage 3 fidelity: which claims mapped to which tool returns
This makes the "auditability" metric concrete — readers can SEE the trace.
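To make the trace figure concrete before the real data is pulled, here's a minimal sketch of a structure it could be rendered from. The tool names come from the pipeline; the dataclass fields, example query, and numbers are invented for illustration and would be replaced by actual pipeline records.

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    tool: str          # e.g. "get_census_data"
    args: dict         # arguments passed to the tool
    returned: str      # short summary of what came back

@dataclass
class Trace:
    query: str
    calls: list = field(default_factory=list)
    # Stage 3 mapping: response claim -> index of the tool call that supports it
    claims: dict = field(default_factory=dict)

    def fidelity(self) -> float:
        """Share of response claims that map to some tool return (Stage 3)."""
        if not self.claims:
            return 0.0
        supported = sum(1 for i in self.claims.values() if 0 <= i < len(self.calls))
        return supported / len(self.claims)

# Illustrative trace — query text and returns are made up, not real pipeline output.
t = Trace(query="What is the poverty rate in County X?")
t.calls.append(ToolCall("get_methodology_guidance", {"topic": "poverty"},
                        "small-area poverty: prefer SAIPE over ACS 1-year"))
t.calls.append(ToolCall("get_census_data", {"dataset": "SAIPE"},
                        "rate=14.2%, MOE=1.1"))
t.claims = {"County X poverty rate is 14.2%": 1,
            "SAIPE is the recommended source": 0}
print(f"fidelity = {t.fidelity():.0%}")  # → fidelity = 100%
```

Rendering one of these per featured query (calls in order, then the claim-to-call map) is exactly the "readers can SEE the trace" artifact.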
---
## Presentation Strategy
### FCSM 2026 (primary)
- Audience: federal statisticians, methodology nerds
- Lead with: the semantic smearing problem (they'll recognize it immediately)
- Show: CQS effect sizes, fidelity numbers, the three-group table
- Killer slide: the auditability gap (72.8% vs 8.1%)
- Don't say: "this replaces [NORC's work]" — let them draw the conclusion
### April 30 Event (secondary submission)
- Friend on committee, likely acceptance
- Can be more provocative / less formal?
- Consider: emphasizing the LLM-as-judge methodology findings (self-enhancement bias,
vendor disagreement) as a standalone contribution
### The NORC Comparison (NEVER say this publicly)
$700K for single-shot prompts to base models analyzing responses. No tool augmentation,
no knowledge graph, no fitness-for-use judgment, no auditability pipeline. Our system
does what they did plus: curated expert knowledge delivery at decision points, full
traceability from query to tool call to response claim, automated fidelity verification,
and empirical evaluation with effect sizes. Built by one person with ~$55 in API costs.
The work speaks for itself. Let the audience do the math.
---
## GraphRAG Deflection (if asked)
**Q: "Why not use GraphRAG?"**
**A:** "We're doing the valuable part of GraphRAG — the knowledge graph — but at build time,
not query time. GraphRAG extracts its graph automatically with an LLM, and its global search
then map-reduces over community summaries for every query, which is computationally
expensive ($$$) and doesn't scale for real-world deployment. We pre-curate the expert
judgment into the graph once, with human quality review, then serve it at runtime with
sub-second latency via simple vector lookup.
GraphRAG also wouldn't know which relationships encode expert judgment and which are
merely structural, because its extraction is fully automated. Our approach is effectively
a distillation of what GraphRAG tries to do — but with a quality step (expert curation)
that you can't get from automated graph construction.
We'd welcome someone running GraphRAG on the same test battery for comparison. We
predict it would score between RAG and Pragmatics on D3/D5 — better retrieval
structure, but still missing the fitness-for-use judgment that isn't in any document."
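If a visual helps during Q&A, the runtime serving path in the answer above fits in a few lines: packs are curated and embedded once at build time, then retrieved by a single nearest-neighbor lookup. The pack texts below are paraphrases of our pragmatics and the one-hot "embeddings" are toy placeholders standing in for a real embedding model and index.

```python
import numpy as np

# Build time (once, with human review): embed each curated pragmatics pack.
# Toy stand-in: orthogonal unit vectors instead of real embeddings.
packs = [
    "Use BLS LAUS, not ACS, for official local unemployment.",
    "ACS 1-year estimates require population >= 65,000.",
    "Report CVs and MOEs for small-area poverty estimates.",
]
index = np.eye(len(packs))  # shape (3, 3); a real index would be (n_packs, dim)

def retrieve(query_vec: np.ndarray) -> str:
    """Runtime path: one cosine-similarity lookup, no per-query graph work."""
    q = query_vec / np.linalg.norm(query_vec)
    return packs[int(np.argmax(index @ q))]

print(retrieve(np.array([0.1, 0.9, 0.2])))
# → "ACS 1-year estimates require population >= 65,000."
```

The point the sketch makes: all the expensive, quality-controlled work happens before the first query ever arrives.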
**Cost comparison to have in back pocket:**
- GraphRAG per query: ~$0.50-2.00 (graph traversal + community summarization + synthesis)
- Pragmatics per query: ~$0.01 (vector lookup + pack retrieval)
- At 1000 queries/day: GraphRAG = $500-2000/day, Pragmatics = $10/day
- Plus: GraphRAG quality is uncontrolled. Pragmatics quality is curated.
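The daily totals are simple arithmetic over the per-query estimates; a minimal sketch keeps the calculation reproducible once verified numbers replace these placeholder figures:

```python
# Placeholder per-query costs from the notes above — unverified estimates,
# to be replaced with sourced numbers before citing anywhere.
QUERIES_PER_DAY = 1000
graphrag_per_query = (0.50, 2.00)   # rough range, $/query
pragmatics_per_query = 0.01         # rough estimate, $/query

lo, hi = (c * QUERIES_PER_DAY for c in graphrag_per_query)
print(f"GraphRAG:   ${lo:.0f}-${hi:.0f}/day")
print(f"Pragmatics: ${pragmatics_per_query * QUERIES_PER_DAY:.0f}/day")
```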
**⚠️ TODO (post-FCSM, before April 30):** The cost and latency numbers above are
rough estimates / vibes. Before citing in any presentation or paper, need to find
authoritative sources for GraphRAG runtime costs and latency. Look for:
- Microsoft Research GraphRAG papers (they built it — they'll have compute numbers)
- Academic benchmarks comparing RAG variants with cost/latency metrics
- Cloud provider pricing analysis for graph construction at query time
- Any published cost-per-query data from production GraphRAG deployments
Use Perplexity to find these, but verify against primary sources (conference papers,
technical reports, vendor documentation). Not blog posts. Need citable numbers if
anyone pushes back on the cost claim.
Also: capture our OWN per-query costs from the eval pipeline as a concrete comparison
point. We have token counts and latency in every judge record — compute actual $/query
for the pragmatics system.
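A sketch of that computation, assuming each judge record carries prompt and completion token counts. The field names, example values, and per-token prices are placeholders — substitute the actual record schema and the vendor's current price sheet.

```python
# Hypothetical judge-record schema — swap in the real eval-pipeline fields.
records = [
    {"prompt_tokens": 1200, "completion_tokens": 300},
    {"prompt_tokens": 1500, "completion_tokens": 250},
]
PRICE_IN = 3.00 / 1_000_000    # $/prompt token (placeholder price)
PRICE_OUT = 15.00 / 1_000_000  # $/completion token (placeholder price)

def cost_per_query(recs) -> float:
    """Average dollar cost per query from token counts in judge records."""
    total = sum(r["prompt_tokens"] * PRICE_IN + r["completion_tokens"] * PRICE_OUT
                for r in recs)
    return total / len(recs)

print(f"${cost_per_query(records):.4f}/query")
```

Running this over the real records gives the concrete $/query number to set against the GraphRAG estimates.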