Medicaid MCP Server
Model Context Protocol (MCP) server for Medicaid public data access via data.medicaid.gov.
Architecture: Hybrid CSV + DKAN API
Background: CMS migrated data.medicaid.gov from the Socrata SODA API to the DKAN platform. DKAN provides both CSV downloads and a query API.
Strategy:
Small datasets (<50 MB): CSV download + in-memory cache (Enrollment). NADAC (123 MB) exceeds this threshold but is also served from the CSV cache.
Large datasets (>100 MB): DKAN API queries (Drug Rebate, Drug Utilization, Federal Upper Limits)
How it works (CSV mode):
First query: Downloads CSV → Parses → Caches → Filters → Returns results
Subsequent queries: Filters cached data → Returns results (<100ms)
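The CSV-mode flow above can be sketched as a lazy-loading cache with a time-to-live. This is a minimal illustration rather than the code in src/cache-manager.js: the names `TtlCache`, `filterRows`, `loader`, and `ttlMs` are assumptions made for the example, and the download-and-parse step is passed in as a function.

```javascript
// Minimal sketch of CSV mode: load once, then answer from memory until the TTL expires.
// TtlCache, filterRows, loader and ttlMs are hypothetical names, not the server's real API.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.entries = new Map(); // key -> { rows, loadedAt }
    this.now = now;           // injectable clock so expiry can be tested
  }

  // Returns cached rows while they are fresh; otherwise runs loader() and caches its result.
  async get(key, ttlMs, loader) {
    const entry = this.entries.get(key);
    if (entry && this.now() - entry.loadedAt < ttlMs) {
      return entry.rows; // subsequent queries: no download, no parse
    }
    const rows = await loader(); // first query (or expired entry): download + parse
    this.entries.set(key, { rows, loadedAt: this.now() });
    return rows;
  }
}

// Case-insensitive substring filter over one field of the cached rows.
function filterRows(rows, field, needle, limit = 10) {
  const wanted = needle.toLowerCase();
  return rows
    .filter((row) => String(row[field] ?? '').toLowerCase().includes(wanted))
    .slice(0, limit);
}
```

A caller would fetch rows with something like `cache.get('nadac', ttlMs, downloadAndParseNadac)` and then apply `filterRows`; only the first call after startup or expiry pays the download cost.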
How it works (DKAN API mode):
Every query: Fetches 100-5000 records via API → Client-side filters → Returns results (1-2s)
No large downloads, no memory issues, works regardless of file growth
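As a rough sketch of DKAN API mode, the function below pages through records and filters them client-side. Several details are assumptions made for illustration: the `/api/1/datastore/query/{datasetId}/0` path and the `results` response field follow the DKAN datastore query API as commonly documented, while `queryDkan`, `buildQueryUrl`, `pageSize`, `maxRecords`, and the injected `fetchJson` are hypothetical names. Reading "100-5000 records" as a page size of 100 with a cap of 5000 is also an interpretation.

```javascript
// Sketch of DKAN API mode: fetch pages of records, filter client-side, stop early.
// ASSUMPTIONS: endpoint path, `results` field, and all function/option names are illustrative.
const DKAN_BASE = 'https://data.medicaid.gov/api/1/datastore/query';

function buildQueryUrl(datasetId, { limit = 100, offset = 0 } = {}) {
  const params = new URLSearchParams({ limit: String(limit), offset: String(offset) });
  return `${DKAN_BASE}/${encodeURIComponent(datasetId)}/0?${params}`;
}

// Default transport uses the global fetch available in Node 18+.
async function defaultFetchJson(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`DKAN request failed: HTTP ${res.status}`);
  return res.json();
}

async function queryDkan(datasetId, predicate, options = {}) {
  const { pageSize = 100, maxRecords = 5000, limit = 10, fetchJson = defaultFetchJson } = options;
  const matches = [];
  for (let offset = 0; offset < maxRecords && matches.length < limit; offset += pageSize) {
    const body = await fetchJson(buildQueryUrl(datasetId, { limit: pageSize, offset }));
    const rows = body.results ?? [];
    for (const row of rows) {
      if (predicate(row)) matches.push(row);
      if (matches.length >= limit) break; // enough results, stop scanning this page
    }
    if (rows.length < pageSize) break; // short page means there is no more data
  }
  return matches;
}
```

Memory stays bounded because only one page plus the matches collected so far are held at a time, whatever the size of the underlying file.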
Performance
CSV-cached datasets:
| Dataset | Size | Records | First Query | Subsequent Queries | Memory |
| --- | --- | --- | --- | --- | --- |
| NADAC | 123 MB | 1.5M | 20-30s | <100ms | ~200 MB |
| Enrollment | 3.6 MB | 10K | 1-2s | <50ms | ~5 MB |
DKAN API datasets:
| Dataset | Size | Records | All Queries | Memory |
| --- | --- | --- | --- | --- |
| Federal Upper Limits | 196 MB | 2.1M | 1-2s | ~5 MB |
| Drug Rebate | 291 MB | ~3M+ | 1-2s | ~5 MB |
| Drug Utilization | 192 MB | 5.3M | 1-2s | ~5 MB |
Cache TTL: 24 hours for NADAC (weekly updates), 7 days for enrollment (monthly updates)
Total Memory: ~210 MB (CSV datasets only, DKAN API uses minimal memory)
Installation
Usage
As MCP Server
Configure in Claude Code or other MCP clients:
Via Python (in agentic-os)
Available Methods
Phase 1 (Implemented & Tested ✓)
get_nadac_pricing - Drug pricing lookup by NDC or name
{ "method": "get_nadac_pricing", "drug_name": "ibuprofen", "limit": 10 }
compare_drug_pricing - Multi-drug or temporal comparison
{ "method": "compare_drug_pricing", "ndc_codes": ["00904530909"], "start_date": "2023-01-01", "end_date": "2024-12-31" }
get_enrollment_trends - State enrollment over time
{ "method": "get_enrollment_trends", "state": "CA", "start_date": "2023-01-01", "end_date": "2024-12-31" }
compare_state_enrollment - Multi-state comparison
{ "method": "compare_state_enrollment", "states": ["CA", "TX", "NY", "FL"], "month": "2024-09" }
list_available_datasets - Show available datasets
{ "method": "list_available_datasets" }
search_datasets - Generic dataset search
{ "method": "search_datasets", "dataset_id": "nadac", "drug_name": "ibuprofen" }
Phase 2 & 3 (Implemented ✓)
get_federal_upper_limits - FUL pricing lookup (DKAN API)
{ "method": "get_federal_upper_limits", "ingredient": "NYSTATIN", "limit": 10 }
get_drug_rebate_info - Rebate program data (DKAN API)
{ "method": "get_drug_rebate_info", "ndc": "00002143380", "limit": 10 }
get_state_drug_utilization - Utilization by state (DKAN API)
{ "method": "get_state_drug_utilization", "state": "CA", "drug_name": "OZEMPIC", "year": 2024, "quarter": 4, "limit": 10 }
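Every request above is a JSON object whose `method` field selects the operation and whose remaining fields are parameters. One plausible way to route such requests is sketched below. The `dispatch` function and the `handlers` table are hypothetical and are not taken from the server's source.

```javascript
// Hypothetical dispatcher: routes a request object to a handler by its `method` field.
// The remaining fields are passed to the handler as parameters.
function dispatch(request, handlers) {
  const { method, ...params } = request ?? {};
  const known = typeof method === 'string'
    && Object.prototype.hasOwnProperty.call(handlers, method);
  if (!known) {
    throw new Error(`Unknown method: ${String(method)}`);
  }
  return handlers[method](params);
}
```

With a handler table such as `{ get_nadac_pricing: (params) => ... }`, calling `dispatch({ "method": "get_nadac_pricing", "drug_name": "ibuprofen", "limit": 10 }, handlers)` would invoke that handler with `{ drug_name: "ibuprofen", limit: 10 }`.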
Data Sources
| Dataset | Update Frequency | Size | Records | Access Method | Status |
| --- | --- | --- | --- | --- | --- |
| NADAC (drug pricing) | Weekly | 123 MB | 1.5M | CSV + cache | ✓ Available |
| Enrollment snapshot | Monthly | 3.6 MB | 10K | CSV + cache | ✓ Available |
| Federal upper limits | Monthly | 196 MB | 2.1M | DKAN API | ✓ Available |
| Drug rebate program | Quarterly | 291 MB | ~3M | DKAN API | ✓ Available |
| Drug utilization | Quarterly | 192 MB | 5.3M | DKAN API | ✓ Available |
Architecture: Hybrid approach. CSV download + in-memory cache for small datasets (<50 MB) and for NADAC (123 MB); DKAN API for the other large datasets (>100 MB). This avoids memory issues while keeping queries fast.
Testing
Key Differences from Medicare MCP
| Feature | Medicare MCP | Medicaid MCP |
| --- | --- | --- |
| Granularity | Provider + procedure level | State-level aggregates |
| Data Source | CMS.gov Socrata API | DKAN CSV downloads + DKAN query API |
| Query Speed | Real-time API calls | Cached CSV (fast after first load) or DKAN API (1-2s) |
| Use Cases | Clinical utilization analysis | Policy analysis, market access |
| Provider Data | Yes (NPI, specialty, procedures) | No (state aggregates only) |
Use Cases
✅ Market access strategy - State coverage prioritization
✅ Drug pricing intelligence - NADAC trends, comparisons
✅ Enrollment forecasting - Growth trends by state
✅ Policy impact assessment - Expansion effects
❌ NOT for:
Provider-level utilization (use Medicare MCP)
Beneficiary-level claims (requires T-MSIS/TAF DUA)
Procedure-level analysis (no HCPCS data)
Architecture Details
CSV Download Flow
Field Mapping
The server automatically maps CSV column names to consistent field names:
NADAC CSV:
NDC Description → description
NDC → ndc
NADAC Per Unit → nadac_per_unit
Effective Date → effective_date
Pricing Unit → pricing_unit
Enrollment CSV:
State Abbreviation → state
State Name → state_name
Reporting Period → reporting_period (YYYYMM format)
Total Medicaid and CHIP Enrollment → total_medicaid_chip_enrollment
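The mappings above could be expressed as lookup tables applied to each parsed row. The sketch below is illustrative only: the column and field names come from the lists above, but `NADAC_FIELDS`, `ENROLLMENT_FIELDS`, `mapRecord`, and `formatReportingPeriod` are hypothetical names, and dropping unmapped columns is an assumption about the desired behavior.

```javascript
// CSV column name -> consistent field name, as listed in this section.
const NADAC_FIELDS = {
  'NDC Description': 'description',
  'NDC': 'ndc',
  'NADAC Per Unit': 'nadac_per_unit',
  'Effective Date': 'effective_date',
  'Pricing Unit': 'pricing_unit',
};

const ENROLLMENT_FIELDS = {
  'State Abbreviation': 'state',
  'State Name': 'state_name',
  'Reporting Period': 'reporting_period',
  'Total Medicaid and CHIP Enrollment': 'total_medicaid_chip_enrollment',
};

// Renames mapped columns; columns missing from the map are dropped (an assumption).
function mapRecord(raw, fieldMap) {
  const out = {};
  for (const [csvName, fieldName] of Object.entries(fieldMap)) {
    if (csvName in raw) out[fieldName] = raw[csvName];
  }
  return out;
}

// Hypothetical helper: converts a YYYYMM reporting period to YYYY-MM, or null if malformed.
function formatReportingPeriod(yyyymm) {
  const match = /^(\d{4})(\d{2})$/.exec(String(yyyymm));
  return match ? `${match[1]}-${match[2]}` : null;
}
```

A helper like `formatReportingPeriod` would let a `month` parameter such as "2024-09" be compared against the YYYYMM values in the enrollment data.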
Cache Manager
Location: src/cache-manager.js
Features:
In-memory data storage with TTL
Download progress tracking (10% increments)
CSV parsing (handles quoted fields with commas)
Cache statistics and health monitoring
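To illustrate the quoted-field handling listed above, here is a minimal single-line CSV parser. It is a sketch and not the implementation in src/cache-manager.js: `parseCsvLine` is a hypothetical name, and the sketch assumes no field contains a line break.

```javascript
// Splits one CSV line into fields, honoring double-quoted fields that contain commas
// and doubled quotes ("") as an escaped quote. Does not handle newlines inside quotes.
function parseCsvLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      fields.push(current); // field boundary
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current); // last field has no trailing comma
  return fields;
}
```

For example, `parseCsvLine('a,"b,c",d')` yields three fields, with the comma preserved inside the second one.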
Real-World Test Results
NADAC Pricing (Ibuprofen)
California Enrollment Trends
Phase 3 Status - COMPLETED ✓
What Changed from Initial Plan
Original plan: SQLite backend for large datasets
Final solution: DKAN API queries (simpler, faster, no SQLite needed!)
Implementation ✓
Found CSV URLs for all 5 datasets
Discovered DKAN query API endpoint
Implemented DKAN API queries for 3 large datasets:
Federal Upper Limits (196 MB, 2.1M records)
Drug Rebate (291 MB, ~3M records)
Drug Utilization (192 MB, 5.3M records)
Tested with real queries (NYSTATIN, California Ozempic, etc.)
Updated architecture to hybrid CSV + DKAN API
Why DKAN API Instead of SQLite?
User feedback: "using sqlite is an overkill for mcp"
DKAN API advantages:
✅ No large downloads (fetch 100-5000 records at a time)
✅ Minimal memory (~5 MB vs 1-4 GB for CSV parsing)
✅ Consistent query speed (1-2s, no slow first query)
✅ Simpler architecture (no SQLite dependency)
✅ Scales with dataset growth (query cost does not depend on file size)
Future Enhancements
Phase 4 (Optimizations)
Server-side DKAN filtering: Investigate DKAN filter syntax to reduce client-side filtering
Incremental CSV updates: Only download if file changed (ETag/Last-Modified headers)
Compression: gzip cached CSV data to reduce memory footprint (~50% reduction)
Pre-warming: Load CSV cache on server startup for zero-latency first queries
Background refresh: Update cache without blocking queries
Pagination helpers: Better offset/limit handling for large DKAN result sets
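The incremental-update idea above relies on standard HTTP conditional requests: send the stored `ETag` in `If-None-Match` (or the stored date in `If-Modified-Since`) and treat a 304 response as "unchanged". The sketch below shows one possible shape. `downloadIfChanged` and the `cached` object layout are hypothetical, and whether data.medicaid.gov returns these validators for its CSV files is not confirmed here.

```javascript
// Sketch of a conditional CSV download. `cached` is null or a previous result
// of this function ({ body, etag, lastModified }). All names are illustrative.
async function downloadIfChanged(url, cached, fetchImpl = fetch) {
  const headers = {};
  if (cached?.etag) headers['If-None-Match'] = cached.etag;
  if (cached?.lastModified) headers['If-Modified-Since'] = cached.lastModified;

  const res = await fetchImpl(url, { headers });
  if (res.status === 304) {
    return { ...cached, changed: false }; // server says the cached copy is still current
  }
  if (!res.ok) {
    throw new Error(`Download failed: HTTP ${res.status}`);
  }
  return {
    changed: true,
    body: await res.text(),
    etag: res.headers.get('etag'),
    lastModified: res.headers.get('last-modified'),
  };
}
```

The transport is passed in as `fetchImpl` so the logic can be exercised without network access; by default it uses the global `fetch` available in Node 18+.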
Architecture Summary
From actual testing on 2025-12-11 and 2025-12-12:
| Dataset | CSV Size | Records | Access Method | Memory | Query Speed |
| --- | --- | --- | --- | --- | --- |
| NADAC | 123 MB | 1,497,925 | CSV + cache | ~200 MB | 20-30s first, <100ms after |
| Enrollment | 3.6 MB | 10,098 | CSV + cache | ~5 MB | 1-2s first, <50ms after |
| Federal Upper Limits | 196 MB | 2,085,934 | DKAN API | ~5 MB | 1-2s always |
| Drug Rebate | 291 MB | ~3M+ | DKAN API | ~5 MB | 1-2s always |
| Drug Utilization | 192 MB | 5,284,306 | DKAN API | ~5 MB | 1-2s always |
Total Memory: ~215 MB (2 CSV datasets + 3 DKAN API datasets)
Total Disk: Negligible (no large downloads for DKAN datasets)
License
MIT
Author
OpenPharma Organization