The M4 server enables AI assistants to query and analyze multimodal Electronic Health Record (EHR) datasets through natural language using the Model Context Protocol (MCP).
Dataset Management: List available clinical datasets (MIMIC-IV, MIMIC-IV-Note, eICU) with their status and backend configuration, switch between active datasets, and extend with custom PhysioNet datasets or institutional EHR schemas.
Tabular Data Analysis: Discover database schemas, inspect table structures with column details and sample data, and execute SQL SELECT queries on large clinical datasets containing hundreds of thousands of patients.
Clinical Notes Processing: Search notes by keyword with contextual snippets, retrieve full note text by ID (with optional truncation), list patient note metadata (IDs, types, lengths), filter by note type (discharge, radiology), and cross-reference with tabular data using shared patient identifiers.
Multi-Modal Support: Automatically adapts available tools based on the active dataset's modality (tabular vs. notes) and supports switching between datasets for complementary analyses.
Programmatic Access: Python API for complex, multi-step analyses that returns pandas DataFrames for statistical computations, visualization, and reproducible research notebooks.
Integration: Compatible with MCP clients (Claude Desktop, Cursor, LibreChat), supports both local (DuckDB) and cloud (BigQuery) backends, includes OAuth2 authentication for secure access, and provides contextual guidance through Claude Code skills.
DuckDB: Provides SQL query execution and schema exploration for clinical datasets stored in DuckDB, including MIMIC-IV and eICU tabular data, with tools for database schema inspection, table information retrieval, and SELECT query execution.
BigQuery: Enables querying of large-scale clinical datasets hosted on Google Cloud BigQuery, supporting the full MIMIC-IV and eICU datasets for cloud-based analysis of electronic health records.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@M4 show me the most common diagnoses in the mimic-iv dataset".
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
M4: A Toolbox for LLMs on Clinical Data
M4 is an infrastructure layer for multimodal EHR data that provides LLM agents with a unified toolbox for querying clinical datasets. It supports tabular data and clinical notes, dynamically selecting tools by modality to query MIMIC-IV, eICU, and custom datasets through a single natural-language interface.
M4 is a fork of the M3 project and would not be possible without it 🫶 Please cite their work when using M4!
Quickstart (3 steps)
1. Install uv
macOS/Linux:
Windows (PowerShell):
2. Initialize M4
This downloads the free MIMIC-IV demo dataset (~16MB) and sets up a local DuckDB database.
3. Connect your AI client
Claude Desktop:
Other clients (Cursor, LibreChat, etc.):
Copy the generated JSON into your client's MCP settings, restart, and start asking questions!
If you don't want to use uv, you can just run pip install m4-mcp
If you want to use Docker, look at docs/DEVELOPMENT.md
Code Execution
For complex analysis that goes beyond simple queries, M4 provides a Python API that returns native DataFrames instead of formatted strings. This transforms M4 from a query tool into a complete clinical data analysis environment.
The API uses the same tools as the MCP server, so behavior is consistent. But instead of parsing text, you get DataFrames you can immediately analyze, visualize, or feed into downstream pipelines.
When to use code execution:
Multi-step analyses where each query informs the next
Large result sets (thousands of rows) that shouldn't flood your context
Statistical computations, survival analysis, cohort characterization
Building reproducible analysis notebooks
See Code Execution Guide for the full API reference.
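As a rough illustration of the first use case above (multi-step analyses where each query informs the next), the sketch below uses Python's built-in sqlite3 module as a stand-in for the M4 backend. The table and column names loosely follow MIMIC-IV, the rows are invented, and nothing here is the M4 Python API itself; the Code Execution Guide documents the real functions, which return DataFrames.

```python
import sqlite3

# Stand-in database with a few invented rows. M4 itself queries DuckDB or
# BigQuery; sqlite3 is used here only so the sketch runs without extra installs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE icustays (subject_id INTEGER, stay_id INTEGER, los REAL);
    CREATE TABLE admissions (subject_id INTEGER, hadm_id INTEGER, admission_type TEXT);
    INSERT INTO icustays VALUES (1, 101, 2.5), (2, 102, 9.1), (3, 103, 12.0);
    INSERT INTO admissions VALUES (1, 201, 'ELECTIVE'), (2, 202, 'URGENT'), (3, 203, 'URGENT');
""")

# Step 1: find patients with an ICU stay longer than 7 days.
long_stay_ids = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT subject_id FROM icustays WHERE los > 7 ORDER BY subject_id"
    )
]

# Step 2: feed the step-1 result into the next query as bound parameters.
placeholders = ", ".join("?" for _ in long_stay_ids)
admission_counts = dict(
    conn.execute(
        "SELECT admission_type, COUNT(*) FROM admissions "
        f"WHERE subject_id IN ({placeholders}) GROUP BY admission_type",
        long_stay_ids,
    ).fetchall()
)

print(long_stay_ids)
print(admission_counts)
```

Keeping the intermediate result in a Python variable, rather than pasting it back into a chat, is what keeps large result sets out of the model's context.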
Claude Skills
M4 ships with Claude Code skills that teach Claude how to use the Python API effectively. Skills are contextual prompts that activate when relevant—when you ask Claude about clinical data analysis, it automatically knows how to use M4's API.
See Skills Guide for details on the available skills and how to create custom ones.
Example Questions
Once connected, try asking:
Tabular data (mimic-iv, eicu):
"What tables are available in the database?"
"Show me the race distribution in hospital admissions"
"Find all ICU stays longer than 7 days"
"What are the most common lab tests?"
Clinical notes (mimic-iv-note):
"Search for notes mentioning diabetes"
"List all notes for patient 10000032"
"Get the full discharge summary for this patient"
Supported Datasets
Dataset | Modality | Size | Access | Local | BigQuery |
mimic-iv-demo | Tabular | 100 patients | Free | Yes | No |
mimic-iv | Tabular | 365k patients | | Yes | Yes |
mimic-iv-note | Notes | 331k notes | | Yes | Yes |
eicu | Tabular | 200k+ patients | | Yes | Yes |
These datasets are supported out of the box. However, it is possible to add any other custom dataset by following these instructions.
Switch datasets anytime:
Get PhysioNet credentials: Complete the credentialing process and sign the data use agreement for the dataset.
Download the data:
    # For MIMIC-IV
    wget -r -N -c -np --user YOUR_USERNAME --ask-password \
      https://physionet.org/files/mimiciv/3.1/ \
      -P m4_data/raw_files/mimic-iv

    # For eICU
    wget -r -N -c -np --user YOUR_USERNAME --ask-password \
      https://physionet.org/files/eicu-crd/2.0/ \
      -P m4_data/raw_files/eicu

Put the downloaded data in an m4_data directory that ideally is located within the project directory. Name the directory for the dataset mimic-iv or eicu.
Initialize:

    m4 init mimic-iv   # or: m4 init eicu
This converts the CSV files to Parquet format and creates a local DuckDB database.
Available Tools
M4 exposes these tools to your AI client. Tools are filtered based on the active dataset's modality.
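One way to picture modality-based filtering is a small registry that tags each tool with the modality it serves. This is a minimal sketch, not M4's implementation: the tool identifiers are hypothetical placeholders, and only the dataset-to-modality mapping comes from the Supported Datasets table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str      # hypothetical identifier, not an actual M4 tool name
    modality: str  # "any", "tabular", or "notes"


# Hypothetical registry mirroring the three tool groups in this section.
TOOLS = [
    Tool("example_list_datasets", "any"),
    Tool("example_switch_dataset", "any"),
    Tool("example_list_tables", "tabular"),
    Tool("example_run_select", "tabular"),
    Tool("example_search_notes", "notes"),
    Tool("example_get_note", "notes"),
]

# Dataset-to-modality mapping as given in the Supported Datasets table.
DATASET_MODALITY = {
    "mimic-iv-demo": "tabular",
    "mimic-iv": "tabular",
    "eicu": "tabular",
    "mimic-iv-note": "notes",
}


def tools_for(dataset: str) -> list[str]:
    """Return the tool names that apply to the active dataset's modality."""
    modality = DATASET_MODALITY[dataset]
    return [t.name for t in TOOLS if t.modality in ("any", modality)]


print(tools_for("mimic-iv-note"))
```

Dataset management tools are tagged "any" so they stay available no matter which dataset is active, which is what makes switching possible in the first place.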
Dataset Management:
Tool | Description |
| List available datasets and their status |
| Switch the active dataset |
Tabular Data Tools (mimic-iv, mimic-iv-demo, eicu):
Tool | Description |
| List all available tables |
| Get column details and sample data |
| Run SQL SELECT queries |
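Restricting execution to SELECT queries implies some kind of read-only check before a statement reaches the database. The sketch below shows one conservative way such a check could look; it is an assumption for illustration, not M4's actual validation, and the function name is hypothetical. It is a plain keyword filter rather than a SQL parser, so it will also reject harmless queries that mention a write keyword inside a string literal.

```python
import re

# Keywords that indicate a write or administrative statement.
_WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|copy|pragma)\b",
    re.IGNORECASE,
)


def is_read_only_select(sql: str) -> bool:
    """Accept a single statement that starts with SELECT or WITH and has no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    # Reject empty input and anything that still contains a statement separator.
    if not statement or ";" in statement:
        return False
    first_word = statement.split(None, 1)[0].lower()
    if first_word not in ("select", "with"):
        return False
    return _WRITE_KEYWORDS.search(statement) is None


print(is_read_only_select("SELECT * FROM admissions LIMIT 5;"))
print(is_read_only_select("SELECT 1; DROP TABLE admissions"))
```

Erring on the side of rejection is a deliberate choice here: a false rejection costs a rephrased query, while a false acceptance could modify a clinical database.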
Clinical Notes Tools (mimic-iv-note):
Tool | Description |
| Full-text search with snippets |
| Retrieve a single note by ID |
| List notes for a patient (metadata only) |
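To make "full-text search with snippets" concrete, here is a minimal sketch of keyword search that returns a short window of text around the first match in each note. The function name, the note IDs, and the note text are all invented for illustration; M4's real search tool may rank, tokenize, or truncate differently.

```python
import re


def search_snippets(
    notes: dict[str, str], keyword: str, context: int = 30
) -> list[tuple[str, str]]:
    """Return (note_id, snippet) for the first case-insensitive hit in each note."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for note_id, text in notes.items():
        match = pattern.search(text)
        if match is None:
            continue
        # Take `context` characters on each side of the match, clamped to the text.
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(text))
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        hits.append((note_id, prefix + text[start:end] + suffix))
    return hits


# Invented example notes, not real patient data.
example_notes = {
    "note-1": "Patient has a long history of type 2 diabetes mellitus, managed with metformin.",
    "note-2": "Chest radiograph shows no acute process.",
}
for note_id, snippet in search_snippets(example_notes, "diabetes", context=10):
    print(note_id, snippet)
```

Returning snippets and IDs first, and full text only on request, keeps long notes from flooding the client's context.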
More Documentation
Guide | Description |
| Python API for programmatic access |
| Claude Code skills for contextual assistance |
| MCP tool documentation |
| Google Cloud for full datasets |
| Add your own PhysioNet datasets |
| Contributing, testing, architecture |
| Enterprise security setup |
Roadmap
M4 is designed as a growing toolbox for LLM agents working with EHR data. Planned and ongoing directions include:
More Tools
Implement tools for current modalities (e.g. statistical reports, RAG)
Add tools for new modalities (images, waveforms)
Better context handling
Concise, dataset-aware context for LLM agents
Dataset expansion
Out-of-the-box support for additional PhysioNet datasets
Improved support for institutional/custom EHR schemas
Evaluation & reproducibility
Session export and replay
Evaluation with the latest LLMs and smaller expert models
The roadmap reflects current development goals and may evolve as the project matures.
Troubleshooting
"Parquet not found" error:
MCP client won't connect: Check client logs (Claude Desktop: Help → View Logs) and ensure the config JSON is valid.
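A quick way to rule out a malformed config is to parse it yourself. The sketch below assumes the "mcpServers" layout used by Claude Desktop and Cursor; the function name and the sample entry (including the "m4" command) are hypothetical, so substitute the JSON that M4 actually generated for you.

```python
import json


def check_mcp_config(text: str) -> list[str]:
    """Return a list of problems found in an MCP client config; empty means none found."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]
    servers = config.get("mcpServers")
    if not isinstance(servers, dict):
        return ['missing or non-object "mcpServers" key']
    problems = []
    for name, entry in servers.items():
        if not isinstance(entry, dict):
            problems.append(f'server "{name}" must be an object')
    return problems


# Hypothetical config with a trailing comma, a common copy-paste mistake.
sample = '{"mcpServers": {"m4": {"command": "m4",}}}'
print(check_mcp_config(sample))
```

To check a real file, read it with `pathlib.Path(...).read_text()` and pass the result to the function.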
Need to reconfigure:
Citation
M4 builds on the M3 project. Please cite: