# COOS / Census MCP: Future Exploration & Enhancement Parking Lot
**Purpose**: Centralized registry of deferred ideas, research leads, and architectural enhancements. Items here are *not* committed work; they're options with documented rationale, to be evaluated against actual need when the time comes.
**Gate rule**: Nothing leaves this parking lot without (1) a validated need from test bench results or user feedback, and (2) a complexity budget that justifies the ROI.
---
## Architecture Enhancements
### AE-1: Ontology-Grounded Tool Discovery
**Source**: [Grounded Agents: Annotating Ontologies with Tool Definitions](https://medium.com/@aiwithakashgoyal/grounded-agents-annotating-ontologies-with-tool-definitions-b0950ba0217d)
**Idea**: Store MCP tool capabilities as nodes in Neo4j linked to COOS concepts via `AFFORDS_OPERATION` relationships. Agent dynamically discovers valid operations per concept rather than relying on prompt engineering.
**Current gap**: COOS concepts and MCP tools are connected implicitly through prompt design, not formally in the graph.
**Pattern**: Define ontology (TTL) → materialize in Neo4j → attach tool metadata → agent queries graph for valid operations.
**Complexity**: Medium. Requires schema extension + tool registration workflow.
**When to evaluate**: When adding new MCP tools beyond the current three, or when prompt-based tool routing starts failing edge cases.
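The lookup this pattern enables reduces to "which tools does this concept afford?". A minimal sketch with an in-memory stand-in for the graph; the concept and tool names are invented for illustration, and in production the lookup would be a Cypher query over `AFFORDS_OPERATION` edges rather than a Python dict:

```python
# Toy stand-in for the AFFORDS_OPERATION lookup. In production this would be
# a Neo4j query, roughly:
#   MATCH (c:Concept {name: $name})-[:AFFORDS_OPERATION]->(t:Tool)
#   RETURN t.name
# Concept and tool names below are hypothetical.
AFFORDS_OPERATION = {
    "MedianHouseholdIncome": ["get_acs_estimate", "get_margin_of_error"],
    "PovertyRate": ["get_acs_estimate"],
}

def valid_operations(concept: str) -> list[str]:
    """Return the MCP tool operations the graph affords for a concept."""
    return AFFORDS_OPERATION.get(concept, [])
```

The agent then restricts its tool calls to `valid_operations(concept)` instead of relying on prompt instructions.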
### AE-2: Decision Traces for Fitness Judgments
**Source**: [Agentic Context Graphs](https://medium.com/@aiwithakashgoyal/the-trillion-dollar-context-graph-turning-organizational-memory-into-your-greatest-asset-abd489241755)
**Idea**: When the pragmatics layer fires a fitness-for-use judgment (e.g., "ACS 1-year unavailable for pop < 65K"), store the reasoning chain as a structured Decision Trace node in Neo4j: which guidance was consulted, what threshold triggered, and what the recommendation was.
**Value**: Auditable trail for statistical consultation quality. Training data for future evaluation. Pattern discovery across judgments.
**Schema sketch**:
```
(:FitnessJudgment {
  query_context,
  guidance_consulted[],
  threshold_triggered,
  recommendation,
  timestamp
})-[:APPLIED_TO]->(:Variable)
  -[:REFERENCED]->(:MethodologyGuidance)
```
**Complexity**: Low-medium. Mostly logging infrastructure.
**When to evaluate**: During test bench development; this is the natural instrumentation layer.
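The trace record itself is lightweight. A sketch of the node payload as a Python dataclass, with invented field values; the real implementation would persist this to Neo4j with the `APPLIED_TO` and `REFERENCED` relationships from the schema sketch:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FitnessJudgment:
    """Payload for a (:FitnessJudgment) node; field values are illustrative."""
    query_context: str
    guidance_consulted: list[str]
    threshold_triggered: str
    recommendation: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# Example: the "ACS 1-year unavailable for pop < 65K" judgment.
trace = FitnessJudgment(
    query_context="median household income for a town of ~40K",
    guidance_consulted=["acs_1yr_population_threshold"],
    threshold_triggered="pop < 65K",
    recommendation="use ACS 5-year estimates",
)
```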
### AE-3: Hybrid Search Scoring Formula
**Source**: [Agentic Context Graphs, §5.1](https://medium.com/@aiwithakashgoyal/the-trillion-dollar-context-graph-turning-organizational-memory-into-your-greatest-asset-abd489241755)
**Idea**: Weighted hybrid scoring: `0.6 * vector_similarity + 0.2 * entity_match + 0.1 * policy_match + 0.1 * recency`. Applicable to coarse-fine search weighting.
**Current state**: Coarse-fine search exists but weighting is ad hoc.
**Complexity**: Low. Parameterize existing search, grid-search over weights.
**When to evaluate**: During search quality tuning phase.
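The formula is cheap to parameterize. A minimal sketch with the weight vector exposed so a grid search can sweep it; the default weights are the article's starting point, not tuned values:

```python
def hybrid_score(vector_sim: float, entity_match: float,
                 policy_match: float, recency: float,
                 weights: tuple = (0.6, 0.2, 0.1, 0.1)) -> float:
    """Weighted sum of normalized [0, 1] signals. Defaults follow the
    source article's suggested split; tune via grid search."""
    signals = (vector_sim, entity_match, policy_match, recency)
    return sum(w * s for w, s in zip(weights, signals))
```

Grid search then reduces to evaluating retrieval metrics over candidate `weights` tuples that sum to 1.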
### AE-4: Truth Maintenance Loop for Methodology Updates
**Source**: [Self-Evolving Neuro-Symbolic KG](https://medium.com/@aiwithakashgoyal/how-to-build-a-neuro-symbolic-medical-knowledge-graph-that-learns-reasons-and-self-corrects-f6d66e7e915a)
**Idea**: When Census methodology changes (new population thresholds, revised MOE calculations), propagate updates systematically through the graph rather than manual editing. Detect contradiction → resolve → update KG → update affected pragmatics.
**Complexity**: Medium-high. Requires change detection and propagation logic.
**When to evaluate**: When onboarding second survey (CPS) or when ACS methodology changes force manual rework.
---
## Search & Retrieval Enhancements
### SR-1: Multi-Representation Embeddings
**Source**: Internal (ACS Variable Search Refactor Plan)
**Idea**: Embed variables multiple ways (description, use cases, synthetic queries) for richer semantic matching.
**Complexity**: Medium. Multiple embedding passes + fusion logic.
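One simple fusion rule is max-pooling: a variable matches if any of its representations matches the query. A toy sketch under that assumption (max is one choice among several; mean or learned weights are alternatives):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def fused_score(query_vec: list[float],
                representations: list[list[float]]) -> float:
    """Max-pool query similarity over a variable's embeddings
    (description, use cases, synthetic queries)."""
    return max(cosine(query_vec, r) for r in representations)
```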
### SR-2: Cross-Encoder Reranker
**Source**: Internal
**Idea**: Train a cross-encoder on (query, variable) pairs; estimated +5-8% precision improvement.
**Complexity**: Medium. Requires training data collection.
### SR-3: Query Expansion via LLM
**Source**: Internal
**Idea**: Use LLM to generate query variations before search, improving recall.
**Complexity**: Low-medium. Latency tradeoff.
### SR-4: Fine-Tune BGE on Census Domain
**Source**: Internal
**Idea**: Domain-specific fine-tuning on census (query, variable) pairs.
**Complexity**: High. Requires curated training set.
**Gate**: Only after test bench proves generic embeddings are the bottleneck.
---
## Agent Architecture Enhancements
### AG-1: Multi-Agent Ensemble (Deferred)
**Source**: [Agentic Context Graphs, §Multi-Agent Collaboration](https://medium.com/@aiwithakashgoyal/the-trillion-dollar-context-graph-turning-organizational-memory-into-your-greatest-asset-abd489241755)
**Idea**: Specialized agents for compliance, methodology, geography, etc. with coordinated decision-making.
**Current stance**: Single agent with methodology guidance is correct for now. The Jobs Doctrine applies: don't build orchestration complexity prematurely.
**When to evaluate**: When test bench reveals systematic failures that a single agent can't address, or when cross-survey support (CPS + ACS) demands domain specialization.
### AG-2: ACE-Style Learning Loop
**Source**: [Agentic Context Graphs, §ACE Framework](https://medium.com/@aiwithakashgoyal/the-trillion-dollar-context-graph-turning-organizational-memory-into-your-greatest-asset-abd489241755)
**Idea**: Record decision outcomes → analyze patterns → update playbooks. Generator → Reflector → Curator cycle.
**Dependency**: Requires AE-2 (Decision Traces) as prerequisite instrumentation.
**Complexity**: High. Full feedback loop with evaluation infrastructure.
**When to evaluate**: After production deployment with real user queries generating outcome data.
### AG-3: GNN-Based Concept Discovery
**Source**: [Self-Evolving Neuro-Symbolic KG, §Neural Discovery Track](https://medium.com/@aiwithakashgoyal/how-to-build-a-neuro-symbolic-medical-knowledge-graph-that-learns-reasons-and-self-corrects-f6d66e7e915a)
**Idea**: Graph Neural Networks over knowledge graph to discover latent patterns and predict missing relationships.
**Current stance**: Not at the scale where this makes sense. COOS has ~330 concepts and ~37K variables; GNNs shine at 100K+ nodes with complex topology.
**When to evaluate**: Post-multi-survey expansion when graph complexity warrants it.
---
## Knowledge Engineering Enhancements
### KE-1: Formal TTL-First Ontology Workflow
**Source**: [Grounded Agents](https://medium.com/@aiwithakashgoyal/grounded-agents-annotating-ontologies-with-tool-definitions-b0950ba0217d)
**Idea**: Strict workflow: define ontology in TTL (RDF/OWL) → materialize in Neo4j → attach tool/pragmatics metadata. Currently we go concept JSON → TTL → Neo4j, which works but isn't formally grounded in RDF semantics.
**Value**: Interoperability with semantic web standards, FAIR data principles, potential integration with other federal ontologies.
**Complexity**: Medium. Requires RDF/OWL expertise and tooling (n10s plugin).
**When to evaluate**: When interoperability with other federal knowledge systems becomes a requirement.
### KE-2: Negative Knowledge as Structured Guidance
**Source**: Internal (LLM Ontology Review)
**Idea**: 56 rejected concepts transformed into active "don't go here" guidance. Already partially implemented.
**Status**: Concept validated, implementation in progress.
### KE-3: Probabilistic Concept Assignment
**Source**: Internal (Spatial Topology Discovery)
**Idea**: Data-driven concept assignment using embeddings rather than manual curation. Eliminates "uncategorized" problem.
**Status**: Pipeline exists, needs integration with production system.
---
## Cross-Survey & Expansion
### CS-1: Cross-Survey Geographic Intelligence Sharing
**Source**: Internal (REQ-FUTURE-001)
**Idea**: Share geographic resolution logic across federal surveys (CPS, SIPP, ACS) while maintaining survey-specific knowledge bases.
**Gate**: Validation over premature abstraction. Test with CPS first.
### CS-2: BLS/CPS Domain Expansion
**Source**: Internal (Communications Strategy)
**Idea**: Extend pragmatics and semantic intelligence to Current Population Survey and Bureau of Labor Statistics data.
**Dependency**: Core ACS system must be validated first.
---
## Evaluation & Testing
### ET-1: Persona-Based Test Batteries
**Source**: Internal
**Idea**: Test across user sophistication levels. Weight 80% toward edge cases.
**Status**: Planned for test bench development.
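The 80% edge-case weighting is just a budget split. A trivial sketch for allocating a battery of `n_tests` cases (function name and signature are illustrative):

```python
def allocate_battery(n_tests: int, edge_weight: float = 0.8) -> tuple[int, int]:
    """Split a test budget into (edge-case, standard) counts,
    defaulting to the 80/20 weighting described above."""
    edge = round(n_tests * edge_weight)
    return edge, n_tests - edge
```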
### ET-2: Multi-Model Adversarial Evaluation
**Source**: Internal
**Idea**: Test bench protocol across Claude, OpenAI, Gemini to validate model-agnostic value of pragmatics layer.
**Status**: Evaluation prompt template exists. Needs systematic execution.
---
## References
| ID | Source | URL |
|----|--------|-----|
| REF-1 | Grounded Agents: Annotating Ontologies with Tool Definitions | [Medium](https://medium.com/@aiwithakashgoyal/grounded-agents-annotating-ontologies-with-tool-definitions-b0950ba0217d) |
| REF-2 | Agentic Context Graphs: Turning Organizational Memory Into Your Greatest Asset | [Medium](https://medium.com/@aiwithakashgoyal/the-trillion-dollar-context-graph-turning-organizational-memory-into-your-greatest-asset-abd489241755) |
| REF-3 | Self-Evolving Neuro-Symbolic Medical Knowledge Graph | [Medium](https://medium.com/@aiwithakashgoyal/how-to-build-a-neuro-symbolic-medical-knowledge-graph-that-learns-reasons-and-self-corrects-f6d66e7e915a) |
| REF-4 | Foundation Capital: AI's Trillion Dollar Opportunity β Context Graphs | [Foundation Capital](https://foundationcapital.com/context-graphs-ais-trillion-dollar-opportunity/) |
---
## Extraction Pipeline Improvements
### EP-1: LangExtract for Source Grounding
**Source**: [google/langextract](https://github.com/google/langextract) (17K stars, Apache 2.0)
**Idea**: Replace or augment Docling+LLM extraction with LangExtract's character-level source grounding and multi-pass extraction. Maps every extraction to exact character offsets in source text. Interactive HTML visualization for review.
**What it solves**: The current pipeline tracks `chunk_index` but not character-level provenance. Single-pass extraction may miss entities that multi-pass would catch.
**What it doesn't solve**: Outputs flat JSONL, not typed knowledge graph nodes. No controlled vocabulary enforcement, no cross-document entity resolution, no harvest/validation. Graph layer would need to be rebuilt on top.
**Steal-worthy ideas**: Character-level provenance, interactive extraction visualization, few-shot example-driven extraction.
**Complexity**: Medium. Integration layer between LangExtract JSONL output and Neo4j graph writer.
**When to evaluate**: Next batch of documents after FCSM sprint, or if provenance auditing becomes a requirement.
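The character-level grounding idea can be illustrated without LangExtract's actual API: store the exact offsets of each extraction in its source so every claim is auditable. A toy sketch using exact substring match only (real alignment needs to tolerate whitespace and normalization differences):

```python
def ground_extraction(source: str, extraction: str):
    """Return (start, end) character offsets of a verbatim extraction
    in its source text, or None if it does not appear verbatim."""
    start = source.find(extraction)
    if start == -1:
        return None
    return start, start + len(extraction)

# Hypothetical source sentence for illustration.
doc = "ACS 1-year estimates cover areas with populations of 65,000 or more."
span = ground_extraction(doc, "65,000")
```

Persisting `span` alongside the extracted graph node is what makes provenance auditing possible later.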
### EP-2: MinerU 2.5 for PDF Parsing
**Source**: [opendatalab/MinerU](https://github.com/opendatalab/MinerU) (1.2B-parameter VLM)
**Idea**: Replace Docling with MinerU's hybrid VLM engine for PDF parsing. SOTA on OmniDocBench, surpassing Gemini 2.5 Pro and GPT-4o on document parsing. The hybrid engine needs a 10GB VRAM minimum and runs on an M1 Pro with 32GB.
**What it solves**: Better table structure detection, visual layout understanding for complex Census multi-level header tables that text-based parsers mangle.
**What it doesn't solve**: Same downstream pipeline (chunks → LLM extraction → Neo4j). Improvement is in chunk quality, not extraction quality.
**Complexity**: Low. Drop-in replacement for Docling chunking stage.
**When to evaluate**: Next document batch. Compare chunk quality on a table-heavy Census document.
### EP-3: Batch Chunk Extraction (3+ chunks per API call)
**Source**: Internal; Brock implemented a similar pattern in December 2024
**Idea**: Group 3-5 chunks per API call, amortize schema/prompt overhead. Reduces API calls by 60-80% and cost proportionally.
**What it solves**: Quality Standards had 2,476 chunks at ~$31 with single-chunk calls; batch-3 would be ~$10-12.
**Risk**: Larger blast radius per failure. JSON array parsing more fragile.
**Complexity**: Low-medium. Prompt modification + response array parsing.
**When to evaluate**: Before any document >1000 chunks. Should have been built before Quality Standards extraction.
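The arithmetic sanity-checks: ceil(2476 / 3) = 826 calls, a ~67% reduction, inside the 60-80% band. A sketch of the grouping and the savings estimate (function names are illustrative):

```python
def batch_chunks(chunks: list, batch_size: int = 3) -> list[list]:
    """Group chunks so one API call carries several, amortizing
    schema/prompt overhead. Failure blast radius grows with batch_size."""
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]

def call_reduction(n_chunks: int, batch_size: int = 3) -> float:
    """Fraction of API calls saved versus single-chunk extraction."""
    calls = -(-n_chunks // batch_size)  # ceiling division
    return 1 - calls / n_chunks
```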
---
*Last updated: 2026-02-09*
*Gate rule reminder: Validate need before graduating any item from this lot.*