Skip to main content
Glama
README.md10.5 kB
# Medicaid MCP Server Model Context Protocol (MCP) server for Medicaid public data access via data.medicaid.gov. ## Architecture: Hybrid CSV + DKAN API **Background**: CMS migrated from Socrata SODA API to DKAN platform. DKAN provides both CSV downloads and a query API. **Strategy**: - **Small datasets (<50 MB)**: CSV download + in-memory cache (NADAC, Enrollment) - **Large datasets (>100 MB)**: DKAN API queries (Drug Rebate, Drug Utilization, Federal Upper Limits) **How it works (CSV mode)**: 1. **First query**: Downloads CSV → Parses → Caches → Filters → Returns results 2. **Subsequent queries**: Filters cached data → Returns results (<100ms) **How it works (DKAN API mode)**: 1. **Every query**: Fetches 100-5000 records via API → Client-side filters → Returns results (1-2s) 2. **No large downloads**, **no memory issues**, **works regardless of file growth** ### Performance **CSV-cached datasets**: | Dataset | Size | Records | First Query | Subsequent Queries | Memory | |---------|------|---------|-------------|-------------------|--------| | NADAC | 123 MB | 1.5M | 20-30s | <100ms | ~200 MB | | Enrollment | 3.6 MB | 10K | 1-2s | <50ms | ~5 MB | **DKAN API datasets**: | Dataset | Size | Records | All Queries | Memory | |---------|------|---------|-------------|--------| | Federal Upper Limits | 196 MB | 2.1M | 1-2s | ~5 MB | | Drug Rebate | 291 MB | ~3M+ | 1-2s | ~5 MB | | Drug Utilization | 192 MB | 5.3M | 1-2s | ~5 MB | **Cache TTL**: 24 hours for NADAC (weekly updates), 7 days for enrollment (monthly updates) **Total Memory**: ~210 MB (CSV datasets only, DKAN API uses minimal memory) ## Installation ```bash cd /Users/joan.saez-pons/code/medicaid-mcp-server npm install ``` ## Usage ### As MCP Server Configure in Claude Code or other MCP clients: ```json { "mcpServers": { "medicaid-mcp": { "command": "node", "args": ["/Users/joan.saez-pons/code/medicaid-mcp-server/src/index.js"] } } } ``` ### Via Python (in agentic-os) ```python from mcp.servers.medicaid_mcp import medicaid_info # Get NADAC pricing for ibuprofen result = medicaid_info( method='get_nadac_pricing', drug_name='ibuprofen', limit=10 ) # Get California enrollment trends result = medicaid_info( method='get_enrollment_trends', state='CA', start_date='2023-01-01', end_date='2024-12-31' ) ``` ## Available Methods ### Phase 1 (Implemented & Tested ✓) 1. **get_nadac_pricing** - Drug pricing lookup by NDC or name ```javascript { "method": "get_nadac_pricing", "drug_name": "ibuprofen", "limit": 10 } ``` 2. **compare_drug_pricing** - Multi-drug or temporal comparison ```javascript { "method": "compare_drug_pricing", "ndc_codes": ["00904530909"], "start_date": "2023-01-01", "end_date": "2024-12-31" } ``` 3. **get_enrollment_trends** - State enrollment over time ```javascript { "method": "get_enrollment_trends", "state": "CA", "start_date": "2023-01-01", "end_date": "2024-12-31" } ``` 4. **compare_state_enrollment** - Multi-state comparison ```javascript { "method": "compare_state_enrollment", "states": ["CA", "TX", "NY", "FL"], "month": "2024-09" } ``` 5. **list_available_datasets** - Show available datasets ```javascript { "method": "list_available_datasets" } ``` 6. **search_datasets** - Generic dataset search ```javascript { "method": "search_datasets", "dataset_id": "nadac", "drug_name": "ibuprofen" } ``` ### Phase 2 & 3 (Implemented ✓) 7. **get_federal_upper_limits** - FUL pricing lookup (DKAN API) ```javascript { "method": "get_federal_upper_limits", "ingredient": "NYSTATIN", "limit": 10 } ``` 8. **get_drug_rebate_info** - Rebate program data (DKAN API) ```javascript { "method": "get_drug_rebate_info", "ndc": "00002143380", "limit": 10 } ``` 9. **get_state_drug_utilization** - Utilization by state (DKAN API) ```javascript { "method": "get_state_drug_utilization", "state": "CA", "drug_name": "OZEMPIC", "year": 2024, "quarter": 4, "limit": 10 } ``` ## Data Sources | Dataset | Update Frequency | Size | Records | Access Method | Status | |---------|-----------------|------|---------|---------------|--------| | NADAC (drug pricing) | Weekly | 123 MB | 1.5M | CSV + cache | ✓ Available | | Enrollment snapshot | Monthly | 3.6 MB | 10K | CSV + cache | ✓ Available | | Federal upper limits | Monthly | 196 MB | 2.1M | DKAN API | ✓ Available | | Drug rebate program | Quarterly | 291 MB | ~3M | DKAN API | ✓ Available | | Drug utilization | Quarterly | 192 MB | 5.3M | DKAN API | ✓ Available | **Architecture**: Hybrid approach - CSV for small datasets (<50 MB), DKAN API for large datasets (>100 MB). This avoids memory issues while maintaining fast queries. ## Testing ```bash # Test CSV implementation (ibuprofen pricing, CA enrollment) node test-fixed-implementation.js # Inspect CSV column structure node inspect-csv-columns.js # List available datasets node test-csv-implementation.js ``` ## Key Differences from Medicare MCP | Feature | Medicare MCP | Medicaid MCP | |---------|--------------|--------------| | **Granularity** | Provider + procedure level | State-level aggregates | | **Data Source** | CMS.gov Socrata API | DKAN CSV downloads | | **Query Speed** | Real-time API calls | Cache-based (fast after first load) | | **Use Cases** | Clinical utilization analysis | Policy analysis, market access | | **Provider Data** | Yes (NPI, specialty, procedures) | No (state aggregates only) | ## Use Cases ✅ **Market access strategy** - State coverage prioritization ✅ **Drug pricing intelligence** - NADAC trends, comparisons ✅ **Enrollment forecasting** - Growth trends by state ✅ **Policy impact assessment** - Expansion effects ❌ **NOT for**: - Provider-level utilization (use Medicare MCP) - Beneficiary-level claims (requires T-MSIS/TAF DUA) - Procedure-level analysis (no HCPCS data) ## Architecture Details ### CSV Download Flow ``` User Query: medicaid_info(method='get_nadac_pricing', drug_name='ibuprofen') ↓ Check Cache ├─ HIT → Filter cached data → Return results (<100ms) │ └─ MISS ↓ Download CSV from download.medicaid.gov (20s for NADAC) ↓ Parse CSV to JSON (5s for 1.5M records) ↓ Store in memory with TTL ↓ Filter parsed data ↓ Return results Next Query (within cache TTL) ↓ Cache HIT → Filter → Return (<100ms) ``` ### Field Mapping The server automatically maps CSV column names to consistent field names: **NADAC CSV**: - `NDC Description` → `description` - `NDC` → `ndc` - `NADAC Per Unit` → `nadac_per_unit` - `Effective Date` → `effective_date` - `Pricing Unit` → `pricing_unit` **Enrollment CSV**: - `State Abbreviation` → `state` - `State Name` → `state_name` - `Reporting Period` → `reporting_period` (YYYYMM format) - `Total Medicaid and CHIP Enrollment` → `total_medicaid_chip_enrollment` ### Cache Manager Located: `src/cache-manager.js` Features: - In-memory data storage with TTL - Download progress tracking (10% increments) - CSV parsing (handles quoted fields with commas) - Cache statistics and health monitoring ## Real-World Test Results ### NADAC Pricing (Ibuprofen) ``` First query: 20 seconds (download + parse 123 MB) Total records: 1,497,925 (all drugs in NADAC database) Ibuprofen records: 13,361 products Sample result: - CHILDREN IBUPROFEN 100 MG/5 ML - NDC: 00904530909 - Price: $0.02421 per ML - Effective Date: 12/20/2023 Second query: <100ms (from cache) ``` ### California Enrollment Trends ``` First query: ~150ms (download + parse 3.6 MB) Total records: 10,098 (all states, all months) California records: 48 monthly snapshots Sample result: - Period: 202301 (Jan 2023) - State: California - Total Enrollment: 14,122,814 - Medicaid: 13,256,478 - CHIP: 866,336 Second query: <50ms (from cache) ``` ## Phase 3 Status - COMPLETED ✓ ### What Changed from Initial Plan **Original plan**: SQLite backend for large datasets **Final solution**: DKAN API queries (simpler, faster, no SQLite needed!) ### Implementation ✓ - [x] Found CSV URLs for all 5 datasets - [x] Discovered DKAN query API endpoint - [x] Implemented DKAN API queries for 3 large datasets: - Federal Upper Limits (196 MB, 2.1M records) - Drug Rebate (291 MB, ~3M records) - Drug Utilization (192 MB, 5.3M records) - [x] Tested with real queries (NYSTATIN, California Ozempic, etc.) - [x] Updated architecture to hybrid CSV + DKAN API ### Why DKAN API Instead of SQLite? **User feedback**: "using sqlite is an overkill for mcp" **DKAN API advantages**: - ✅ No large downloads (fetch 100-5000 records at a time) - ✅ Minimal memory (~5 MB vs 1-4 GB for CSV parsing) - ✅ Consistent query speed (1-2s, no slow first query) - ✅ Simpler architecture (no SQLite dependency) - ✅ Scales forever (file size growth irrelevant) ## Future Enhancements ### Phase 4 (Optimizations) 1. **Server-side DKAN filtering**: Investigate DKAN filter syntax to reduce client-side filtering 2. **Incremental CSV updates**: Only download if file changed (ETag/Last-Modified headers) 3. **Compression**: gzip cached CSV data to reduce memory footprint (~50% reduction) 4. **Pre-warming**: Load CSV cache on server startup for zero-latency first queries 5. **Background refresh**: Update cache without blocking queries 6. **Pagination helpers**: Better offset/limit handling for large DKAN result sets ## Architecture Summary From actual testing on 2025-12-11 and 2025-12-12: | Dataset | CSV Size | Records | Access Method | Memory | Query Speed | |---------|----------|---------|---------------|--------|-------------| | NADAC | 123 MB | 1,497,925 | CSV + cache | ~200 MB | 20-30s first, <100ms after | | Enrollment | 3.6 MB | 10,098 | CSV + cache | ~5 MB | 1-2s first, <50ms after | | Federal Upper Limits | 196 MB | 2,085,934 | DKAN API | ~5 MB | 1-2s always | | Drug Rebate | 291 MB | ~3M+ | DKAN API | ~5 MB | 1-2s always | | Drug Utilization | 192 MB | 5,284,306 | DKAN API | ~5 MB | 1-2s always | **Total Memory**: ~215 MB (2 CSV datasets + 3 DKAN API datasets) **Total Disk**: Negligible (no large downloads for DKAN datasets) ## License MIT ## Author OpenPharma Organization

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/openpharma-org/medicaid-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server