Jana MCP Server

KNOWLEDGE_CONTEXT_DISCUSSION.md•12.8 KiB

# Knowledge & Context: Design Discussion

**Date:** 2026-01-22

**Status:** Discussion / Pre-ADR

**Purpose:** Capture brainstorming on implementing contextual knowledge capabilities

---

## Overview

The "Knowledge & Context" capability aims to help users understand what environmental data means, not just retrieve it. This document captures the design discussion, including analysis of regulatory data sources and implementation recommendations.

**The Dream:** Answer questions like:

- "What is PM2.5 and why does it matter?"
- "Is 35 µg/m³ PM2.5 safe?"
- "What's the regulatory limit for CO2 emissions?"
- "How does this plant's emissions compare to industry average?"
- "What caused the spike in emissions in Q3?"

---

## Question Types & Implementation Strategy

### Classification of Knowledge Questions

| Question Type | Example | Data Needed | Recommended Source |
|---------------|---------|-------------|-------------------|
| **Educational** | "What is PM2.5?" | General knowledge | LLM (Claude's training) |
| **Threshold Interpretation** | "Is 35 µg/m³ safe?" | Regulatory limits | Static KB + LLM synthesis |
| **Regulatory Lookup** | "CO2 limit in California?" | Jurisdiction-specific rules | LLM + authoritative links |
| **Contextual Benchmarking** | "How does this plant compare?" | Live data + statistics | API + LLM synthesis |
| **Investigative** | "What caused the Q3 spike?" | Historical data + reasoning | API + LLM reasoning |

### Implementation by Question Type

#### Type 1: Educational Explanations

> "What is PM2.5 and why does it matter?"

**Implementation:** Pure LLM. Claude already knows this well.

- Environmental science fundamentals don't change
- LLM explanations are rich and contextual
- Zero implementation cost
- Risk: Minimal, since this is factual, stable knowledge

#### Type 2: Threshold Interpretation

> "Is 35 µg/m³ PM2.5 safe?"

**Implementation:** Static KB (thresholds) + LLM (synthesis)

**Why Static KB for thresholds:**

- Accuracy is critical (health/safety implications)
- Deterministic: the same question should give the same thresholds
- Thresholds change rarely (WHO updates every ~5-15 years)
- Small dataset (~50-100 parameter/threshold combinations)

**Proposed KB structure:**

```yaml
# knowledge/thresholds.yaml
parameters:
  pm25:
    name: "PM2.5"
    full_name: "Fine Particulate Matter (≤2.5 micrometers)"
    unit: "µg/m³"
    guidelines:
      who_2021:
        annual_mean: 5
        daily_mean: 15
        source: "WHO Global Air Quality Guidelines 2021"
      epa_naaqs:
        # 12 is the 2012 primary annual standard; EPA's 2024 revision
        # lowered it to 9.0. Verify the current value before shipping.
        annual_mean: 12
        daily_mean: 35
        source: "US EPA National Ambient Air Quality Standards"
```

#### Type 3: Regulatory Lookup

> "What's the regulatory limit for CO2 emissions in California?"

**Implementation:** LLM knowledge + authoritative source links

This is complex because regulations are:

- Jurisdiction-specific (federal, state, local, international)
- Sector-specific (power plants vs vehicles vs buildings)
- Complex (different metrics, timelines, exemptions)

**Recommended approach:** Provide general context from LLM, link to authoritative sources, don't claim comprehensive coverage.

#### Type 4: Contextual Benchmarking

> "How does this plant's emissions compare to industry average?"

**Implementation:** API data + LLM synthesis

Requires live data from Jana backend:

1. Get the facility's emissions from API
2. Get sector/peer statistics from API
3. LLM synthesizes the comparison

#### Type 5: Investigative Explanation

> "What caused the spike in emissions in Q3?"

**Implementation:** API data + LLM reasoning

Requires:

1. Detecting the anomaly (trend analysis)
2. Correlating with potential causes
3. Reasoning about plausible explanations

**Note:** MCP server provides data; Claude hypothesizes. Must be clear these are hypotheses, not proven causes.

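
Step 1 of the Type 5 flow (detecting the anomaly) can be prototyped with very little code. The sketch below is one possible approach, not a method this discussion has settled on: it flags spikes with a modified z-score based on the median and the median absolute deviation (MAD). The name `find_spikes`, the cutoff of 3.5, and the sample series are all assumptions made for illustration.

```python
from statistics import median


def find_spikes(series, threshold=3.5):
    """Return indices of values that sit far above the series median.

    Uses the modified z-score: 0.6745 * (x - median) / MAD.
    Hypothetical helper; not part of the Jana MCP server.
    """
    med = median(series)
    # Median absolute deviation: a spread estimate that one spike cannot inflate.
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        # At least half the values equal the median, so the score is
        # undefined; report nothing rather than divide by zero.
        return []
    return [
        i for i, x in enumerate(series)
        if 0.6745 * (x - med) / mad > threshold
    ]


# Made-up monthly emissions (tonnes CO2e) with one obvious jump.
monthly = [100, 102, 98, 101, 99, 103, 180, 100]
print(find_spikes(monthly))
```

The returned indices only say where to look. Any cause attached to them afterwards is still a hypothesis, as the note above requires.
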

---

## Knowledge Source Options Analysis

### Option A: Embed Knowledge in MCP Server (Static KB)

Bundle YAML/JSON files in the MCP server containing parameter definitions, thresholds, etc.

| Pros | Cons |
|------|------|
| Fast - no API calls needed | Maintenance burden - must update manually |
| Works offline/disconnected | Duplicates what LLM already knows |
| Deterministic - same answer every time | Limited scope - only what's in KB |
| Version controlled with code | Gets stale (regulations change) |
| No hallucination risk for critical thresholds | Adds complexity to MCP server |

### Option B: Add Knowledge Endpoint to Backend

Add `/api/v1/esg/knowledge/` endpoints to Jana backend.

| Pros | Cons |
|------|------|
| Single source of truth | More backend work |
| Reusable across clients (not just MCP) | Another API to maintain |
| Can be updated without MCP release | Adds latency (API call) |
| Queryable/filterable | Requires data curation effort |
| Supports i18n, versioning, jurisdiction | Over-engineering if only MCP needs it |

### Option C: Use LLM Knowledge (Claude's Training Data)

Don't build a knowledge base. Let Claude answer from training data.


| Pros | Cons |
|------|------|
| Zero maintenance | Potential hallucination on specifics |
| Broad coverage (can explain anything) | Non-deterministic answers |
| Handles nuance and edge cases | Can't guarantee regulatory accuracy |
| Natural explanations | Training data may be outdated |
| Already available - no build cost | Can't update without model change |

### Option D: Hybrid Approach (Recommended)

| Knowledge Type | Best Source | Rationale |
|----------------|-------------|-----------|
| Parameter definitions | Backend API | Already exists via `get_parameter_definitions()` |
| Regulatory thresholds | Static KB in MCP | Critical accuracy, deterministic |
| Health effects/explanations | LLM | Broad, nuanced, handles questions well |
| Data interpretation | LLM + thresholds | Combine threshold lookup with LLM synthesis |

---

## Regulatory Data API Landscape

### The Core Problem

**The measurements have APIs. The thresholds don't.**

| Source | API Available? | What It Provides | Auto-Updates? |
|--------|---------------|------------------|---------------|
| **EPA AirNow** | ✅ Yes | Real-time AQI, forecasts, current conditions | ✅ Real-time |
| **EPA AQS** | ✅ Yes | Historical air quality data, monitoring stations | ✅ Scheduled |
| **OpenAQ** | ✅ Yes (Jana uses it) | Global air quality measurements | ✅ Real-time |
| **EPA NAAQS** | ❌ No API | Published as regulations/PDFs | ❌ Manual |
| **WHO Guidelines** | ❌ No API | Published as reports | ❌ Manual (rare) |
| **EU Air Quality Directive** | ❌ No API | Published as legal documents | ❌ Manual |

### Why Thresholds Don't Have APIs

1. **They rarely change** - WHO updated guidelines in 2021 (prior: 2005). EPA NAAQS updates infrequently.
2. **They're legal documents** - Regulations are published as law, not data feeds
3. **No business model** - No one has built a regulatory threshold API as a service

### Web Search Results

Searched for existing regulatory data APIs and MCP servers.

Key findings:

- **No existing MCP servers** for environmental regulations or compliance data
- **No public APIs** from EPA, WHO, or other bodies specifically for regulatory thresholds
- **Pattern:** Data APIs exist (measurements); policy APIs don't (thresholds)

### Maintenance Reality for Air Quality Thresholds

| Factor | Reality |
|--------|---------|
| WHO guideline changes | Every 10-15 years |
| EPA NAAQS changes | Every 5-10 years |
| AQI breakpoint changes | Rare (EPA revised the PM2.5 breakpoints in 2024) |
| Parameter definitions | Stable |

**A 100-line YAML file reviewed annually is not a significant maintenance burden.**

---

## Air Quality Thresholds vs. Emissions Regulations

### Key Differences

| Aspect | Air Quality (PM2.5, O3, etc.) | Emissions (CO2, CH4, etc.) |
|--------|-------------------------------|---------------------------|
| **Scope** | Health-based, universal | Economic/environmental policy |
| **Complexity** | Simple thresholds | Multi-layered (caps, permits, trading) |
| **Jurisdictions** | WHO, EPA, EU (few) | Federal, 50 states, EU, 190+ countries |
| **Sectors** | N/A - applies to air | Power, transport, industry, buildings... |
| **Change frequency** | Rare (5-15 years) | Frequent (annual adjustments) |
| **Data structure** | Simple numbers | Complex (formulas, exemptions, phase-ins) |

### Types of Emissions Regulations

**1. Absolute Limits (Simpler)**

```
"Power plants must emit < 1000 lbs CO2/MWh"
```

**2. Cap-and-Trade Systems (Complex)**

```
"California Cap-and-Trade: Annual cap reduced by X% per year"
"EU ETS: Allowance prices fluctuate daily"
```

**3. Reporting Requirements (Process)**

```
"Facilities emitting >25,000 tonnes CO2e must report annually"
```

**4. Sector-Specific Standards (Many)**

```
"Vehicle emissions: EPA Tier 3 standards"
"Methane from oil/gas: EPA Quad Oa/Ob/Oc"
```

### The Scale Problem

If we tried to capture emissions regulations comprehensively:

| Dimension | Scale |
|-----------|-------|
| Countries | 190+ with varying regulations |
| US States | 50+ with different programs |
| Sectors | 10+ major sectors |
| Gases | 6+ greenhouse gases |
| Facility types | 100s of categories |
| Time periods | Regulations phase in over years |

**Conservative estimate:** 10,000+ distinct regulatory rules globally

**This is not a YAML file problem. This is a database problem.**

---

## Recommendations

### For MVP: Hybrid Approach (Air Quality Focus)

| Category | MVP Approach |
|----------|--------------|
| **Air quality thresholds** | Small static KB (~100 lines): WHO 2021, EPA NAAQS, AQI breakpoints |
| **Parameter explanations** | LLM knowledge |
| **Emissions regulations** | LLM + authoritative source links |
| **Compliance determination** | **Out of scope** (liability risk) |

**What we explicitly DON'T do in MVP:**

- Claim to know current emissions regulatory limits with certainty
- Provide compliance determinations
- Maintain a comprehensive regulatory database

**What we DO well:**

- Provide air quality interpretation with accurate thresholds
- Explain what parameters mean and health effects
- Show emissions data (Climate TRACE, EDGAR)
- Help users understand their data
- Point to authoritative regulatory sources

### Proposed MVP Static KB Scope

```
knowledge/
├── parameters.yaml       # ~20 parameters (PM2.5, CO2, NO2, O3, etc.)
├── thresholds.yaml       # WHO, EPA, EU guidelines for air quality
├── aqi_breakpoints.yaml  # US AQI calculation tables
└── glossary.yaml         # Common terms and acronyms
```

**Total: ~500-1000 lines of YAML**: maintainable, versionable, auditable.

### For Phase 2: Emissions Regulatory Context (If Demanded)

If users demand regulatory context for emissions:

1. Survey users on which frameworks matter most
2. Curate top 5-10 frameworks only:
   - EPA GHG Reporting Program (GHGRP)
   - EU ETS
   - California Cap-and-Trade
   - Paris Agreement NDCs
   - SEC Climate Disclosure Rules
3. Build as backend endpoint (not MCP-embedded)
4. Establish quarterly review cadence

### Example MVP Interaction Patterns

**Air Quality (Full Support):**

```
User: "Is 35 µg/m³ PM2.5 safe?"

MCP looks up thresholds from static KB:
- WHO 24hr guideline: 15 µg/m³
- EPA 24hr standard: 35 µg/m³

Claude synthesizes:
"35 µg/m³ PM2.5 is at the US EPA's 24-hour standard but 2.3x the WHO guideline.
This level is classified as 'Moderate' on the AQI scale. Sensitive groups should
consider reducing prolonged outdoor exertion."
```

**Emissions Regulations (Limited Support):**

```
User: "What are the CO2 limits for power plants in California?"

Claude (from LLM knowledge):
"California power plants are subject to multiple regulations including the CARB
Cap-and-Trade program, EPA Clean Air Act requirements, and SB 100 mandates.

For current compliance requirements, consult:
- CARB: https://ww2.arb.ca.gov/...
- EPA: https://www.epa.gov/...

I can help you analyze emissions data for specific facilities using Jana's
Climate TRACE data. Would you like to look up a specific plant?"
```

---

## Open Questions for Future Discussion

1. **Is annual review of static KB acceptable?** Or must it be truly zero-touch?
2. **Should we wrap AirNow API** for real-time US air quality assessment with built-in categories?
3. **Which emissions regulatory frameworks would be highest value** if we curate them in Phase 2?
4. **Is there liability concern** with providing regulatory interpretation? (May argue for LLM-only approach)
5. **Should the static KB live in the MCP server or Jana backend?** Backend makes it reusable across clients.
6. **How do we handle international users?** EU, Asia, etc. have different standards.

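
The air quality interaction pattern described earlier (static KB lookup, then LLM synthesis) might look roughly like this on the MCP side. This is a sketch under stated assumptions: the dict mirrors a subset of the proposed `thresholds.yaml` inline instead of loading the file, and `compare_to_guidelines` is a hypothetical name, not an existing Jana function.

```python
# Inline subset of the proposed knowledge/thresholds.yaml (24-hour values only).
THRESHOLDS = {
    "pm25": {
        "unit": "µg/m³",
        "guidelines": {
            "who_2021": {"daily_mean": 15},
            "epa_naaqs": {"daily_mean": 35},
        },
    }
}


def compare_to_guidelines(parameter, value, averaging="daily_mean"):
    """Compare a measurement with every guideline stored for a parameter.

    Hypothetical helper: it returns plain facts and leaves the wording
    to the LLM.
    """
    entry = THRESHOLDS[parameter]
    results = []
    for name, limits in entry["guidelines"].items():
        limit = limits[averaging]
        results.append({
            "guideline": name,
            "limit": limit,
            "ratio": round(value / limit, 1),  # measurement as a multiple of the limit
            "exceeds": value > limit,          # strictly above the limit
        })
    return results


for row in compare_to_guidelines("pm25", 35):
    print(row)
```

Keeping the comparison deterministic and handing only the phrasing to Claude follows the rationale given for the static KB: the same question should always produce the same thresholds.
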

---

## Related Documents

- [MCP_SERVER_BRAINSTORM.md](MCP_SERVER_BRAINSTORM.md) - Original capability brainstorm
- [ADR-0001: Start Without MCP-Side Memory](adr/0001-start-without-mcp-side-memory.md)
- [ADR-0002: NL-to-SQL Generation Strategy](adr/0002-nl-to-sql-generation-strategy.md)

---

## Changelog

| Date | Change |
|------|--------|
| 2026-01-22 | Initial discussion document created |
