CDISC SDTM Validator MCP
Allows deploying the MCP server to Posit Connect, with support for trusted host restrictions and stateless horizontal scaling.
A Model Context Protocol (MCP) server for validating CDISC SDTM datasets against SDTMIG 3.4 specifications.
Overview
This MCP server provides AI agents and clinical programmers with tools to validate Study Data Tabulation Model (SDTM) datasets. It demonstrates a complete end-to-end pipeline: raw pharmaceutical data → SDTM transformation → validation.
Tools
1. check_required_variables(columns, domain="DM")
Validates that a dataset contains the three universal SDTM identifier variables required in every domain:
STUDYID — Study identifier
DOMAIN — Domain abbreviation (e.g., "DM")
USUBJID — Unique subject identifier
Returns: {"domain", "required", "missing", "ok"}
2. check_dm_required_variables(columns)
Validates that the Demographics (DM) domain contains all required variables per SDTMIG 3.4 Table 3-1:
Universal (3): STUDYID, DOMAIN, USUBJID
DM-specific (12): SUBJID, AGE, AGEU, SEX, RACE, ETHNIC, COUNTRY, ARMCD, ARM, ACTARMCD, ACTARM, RFSTDTC
Returns: {"required", "missing", "ok"}
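The two required-variable tools reduce to a set-membership check over column names. The sketch below is a minimal illustration, not the server's actual code. It assumes exact, case-sensitive matching of variable names and returns the same keys listed above. The helper `_missing` is hypothetical.

```python
# Hypothetical sketch of the two required-variable checks (not the server's code).
UNIVERSAL_REQUIRED = ["STUDYID", "DOMAIN", "USUBJID"]
DM_SPECIFIC_REQUIRED = [
    "SUBJID", "AGE", "AGEU", "SEX", "RACE", "ETHNIC",
    "COUNTRY", "ARMCD", "ARM", "ACTARMCD", "ACTARM", "RFSTDTC",
]


def _missing(required, columns):
    """Return required variables absent from columns, keeping the required order."""
    present = set(columns)  # assumption: exact, case-sensitive name matching
    return [var for var in required if var not in present]


def check_required_variables(columns, domain="DM"):
    missing = _missing(UNIVERSAL_REQUIRED, columns)
    return {"domain": domain, "required": list(UNIVERSAL_REQUIRED),
            "missing": missing, "ok": not missing}


def check_dm_required_variables(columns):
    required = UNIVERSAL_REQUIRED + DM_SPECIFIC_REQUIRED
    missing = _missing(required, columns)
    return {"required": required, "missing": missing, "ok": not missing}
```

For example, `check_required_variables(["DOMAIN", "USUBJID", "AGE"])` reports `"missing": ["STUDYID"]` and `"ok": False`.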
3. check_controlled_terminology(column, values)
Validates that values in a column conform to CDISC Controlled Terminology (CT) codelists.
Supported variables:
SEX (C66731): F, M, U, UNDIFFERENTIATED
ETHNIC (C66790): HISPANIC OR LATINO, NOT HISPANIC OR LATINO, NOT REPORTED, UNKNOWN
RACE (C74457): WHITE, BLACK OR AFRICAN AMERICAN, ASIAN, AMERICAN INDIAN OR ALASKA NATIVE, NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER, MULTIPLE, NOT REPORTED, UNKNOWN
AGEU (C66781): YEARS, MONTHS, WEEKS, DAYS, HOURS
DTHFL: Y (only valid value for death flag; null/absent means no death)
Returns: {"column", "codelist_id", "valid_values", "invalid", "ok"}
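Because the codelists are hardcoded, a controlled-terminology check amounts to a lookup table plus a filter. The sketch below mirrors the values listed above. Three details are assumptions, not documented behavior: null or empty values are skipped for every column, invalid values are de-duplicated and sorted, and an unsupported column raises `ValueError`.

```python
# Hypothetical sketch of a controlled-terminology check using the codelists listed above.
CODELISTS = {
    "SEX": ("C66731", {"F", "M", "U", "UNDIFFERENTIATED"}),
    "ETHNIC": ("C66790", {"HISPANIC OR LATINO", "NOT HISPANIC OR LATINO",
                          "NOT REPORTED", "UNKNOWN"}),
    "RACE": ("C74457", {"WHITE", "BLACK OR AFRICAN AMERICAN", "ASIAN",
                        "AMERICAN INDIAN OR ALASKA NATIVE",
                        "NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER",
                        "MULTIPLE", "NOT REPORTED", "UNKNOWN"}),
    "AGEU": ("C66781", {"YEARS", "MONTHS", "WEEKS", "DAYS", "HOURS"}),
    "DTHFL": (None, {"Y"}),  # no codelist ID given above; null means "no death"
}


def check_controlled_terminology(column, values):
    if column not in CODELISTS:
        raise ValueError(f"no codelist defined for {column!r}")
    codelist_id, allowed = CODELISTS[column]
    # Assumption: null/empty values are not CT violations (required for DTHFL).
    invalid = sorted({v for v in values if v not in (None, "") and v not in allowed})
    return {"column": column, "codelist_id": codelist_id,
            "valid_values": sorted(allowed), "invalid": invalid, "ok": not invalid}
```

For example, `check_controlled_terminology("SEX", ["M", "F", "Male", None])` flags only `"Male"`.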
4. validate_dataset(dataset_name=None, dataset=None)
Runs the full validation suite in a single call and returns a combined report. Provide either a bundled sample name (see list_sample_datasets) or an inline Dataset JSON object. This is the recommended entry point for agents — it loads the data, runs all three checks (controlled terminology only for the codelist columns present), and summarizes pass/fail.
Returns: {"dataset", "checks": [...], "summary": {"ok", "passed", "failed"}}
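The orchestration step can be sketched as follows: run the two required-variable checks, run a CT check only for codelist columns the dataset actually has, then count passes and failures. `run_validation_suite` is a hypothetical name. The check functions are passed in as arguments so the sketch stays self-contained. It assumes the dataset is a Dataset JSON dict with `columns` and `rows`.

```python
# Hypothetical sketch of validate_dataset-style orchestration (not the server's code).
def run_validation_suite(name, dataset, required_check, dm_check, ct_check,
                         ct_columns=("SEX", "ETHNIC", "RACE", "AGEU", "DTHFL")):
    columns = [col["name"] for col in dataset["columns"]]
    rows = dataset["rows"]
    checks = [required_check(columns, dataset.get("name", "DM")), dm_check(columns)]
    # Controlled terminology is checked only for codelist columns that are present.
    for col in ct_columns:
        if col in columns:
            idx = columns.index(col)
            checks.append(ct_check(col, [row[idx] for row in rows]))
    passed = sum(1 for check in checks if check["ok"])
    failed = len(checks) - passed
    return {"dataset": name, "checks": checks,
            "summary": {"ok": failed == 0, "passed": passed, "failed": failed}}
```

The overall `ok` is true only when every individual check passes. This matches the pass/fail summary shape described above.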
5. list_sample_datasets()
Lists the sample datasets bundled with the server (read live from samples/). Each entry is {"name", "label", "study", "records", "columns", "description"}; use the name with validate_dataset or the sample://<name> resource.
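Reading the listing live from disk can be as simple as globbing `samples/*.json` on each call. The sketch below is illustrative only. It assumes that `name` is the file stem, `study` comes from `studyOID`, `records` is the row count, and `columns` is a list of column names; the README does not pin down that last field.

```python
import json
from pathlib import Path


def list_sample_datasets(samples_dir="samples", descriptions=None):
    """Hypothetical sketch: build the sample listing by reading every JSON file on each call."""
    descriptions = descriptions or {}
    entries = []
    for path in sorted(Path(samples_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        entries.append({
            "name": path.stem,                      # used as dataset_name / sample://<name>
            "label": data.get("label", ""),
            "study": data.get("studyOID", ""),
            "records": len(data.get("rows", [])),
            "columns": [c["name"] for c in data.get("columns", [])],  # assumption: names
            "description": descriptions.get(path.stem, ""),
        })
    return entries
```

Since nothing is cached, a file edited or dropped into `samples/` shows up on the next call.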
Resources
Each bundled sample is also exposed as an MCP resource at sample://<name> (e.g. sample://pharmaverse_dm), so MCP clients can discover and load the raw Dataset JSON through the native resource primitive.
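Serving resources from disk means a `sample://<name>` URI must map to a file under `samples/` without letting a crafted name escape that directory. The sketch below shows one way to do this. It assumes sample names contain only letters, digits, underscores, and hyphens. `resolve_sample_uri` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumption: sample names are simple identifiers such as "pharmaverse_dm".
_SAMPLE_NAME = re.compile(r"[A-Za-z0-9_\-]+")


def resolve_sample_uri(uri, samples_dir="samples"):
    """Hypothetical helper: map sample://<name> to samples/<name>.json."""
    prefix = "sample://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a sample:// URI: {uri!r}")
    name = uri[len(prefix):]
    # Reject names like "../secrets" so a URI cannot point outside samples/.
    if not _SAMPLE_NAME.fullmatch(name):
        raise ValueError(f"invalid sample name: {name!r}")
    return Path(samples_dir) / f"{name}.json"
```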
End-to-End Demo: pharmaverseraw → sdtm.oak → MCP Validation
Background
The pharmaverse ecosystem provides industry-standard tools and data for learning and teaching SDTM. The pharmaverseraw R package contains the CDISCPILOT01 study — a realistic clinical trial dataset in pre-SDTM "raw EDC" format. The sdtm.oak package transforms this raw data into a valid SDTM dataset.
This MCP server completes the pipeline by validating the output:
pharmaverseraw (raw EDC data)
↓ sdtm.oak transformation
SDTM DM domain
↓ MCP validation
Validation report
Sample Data: CDISCPILOT01
samples/pharmaverse_dm.json contains 5 subjects from the CDISCPILOT01 study in Dataset JSON format (CDISC's standard JSON data representation). This is realistic pilot-study data that clinical programmers commonly use to learn SDTM transformation workflows.
Subject demographics:
Age range: 63–77 years
Treatment arms: PLACEBO, Xanomeline High Dose, Xanomeline Low Dose
Countries: USA, Japan, Germany
One subject with a recorded death (DTHFL = "Y")
Running the Demo
The server handles the whole pipeline, from loading a sample to reporting results. With the server running (see below), validate a bundled sample with one MCP call:
# List the available samples
curl -s localhost:8000/samples
# Fetch one sample's raw Dataset JSON
curl -s localhost:8000/samples/pharmaverse_dm
# Run the full validation suite via the validate_dataset tool
curl -s localhost:8000/mcp \
-H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/call",
"params":{"name":"validate_dataset","arguments":{"dataset_name":"pharmaverse_dm"}}}'
The validate_dataset report includes a summary ({"ok", "passed", "failed"}) plus a per-check breakdown. Try dataset_name: "dm_missing_studyid" to see check_required_variables fail on the missing STUDYID identifier.
The interactive landing page at / does the same thing visually: it lists samples from /samples and renders the validate_dataset report.
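The same `tools/call` request can be sent from Python with only the standard library. This sketch sends the same body and headers as the curl command above. The function names are hypothetical. It assumes the endpoint answers with a plain JSON body rather than an event stream. The validation report sits inside the JSON-RPC `result`, typically as tool content.

```python
import json
import urllib.request


def build_tool_call(tool, arguments, request_id=1):
    """Build the JSON-RPC 2.0 body for an MCP tools/call request."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/call",
            "params": {"name": tool, "arguments": arguments}}


def call_tool(tool, arguments, url="http://localhost:8000/mcp"):
    """POST a tools/call request; assumes a plain JSON (non-streaming) response."""
    body = json.dumps(build_tool_call(tool, arguments)).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST", headers={
        "Content-Type": "application/json",
        "Accept": "application/json",
    })
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `call_tool("validate_dataset", {"dataset_name": "dm_missing_studyid"})` should return a response whose report shows the required-variables check failing.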
Running the Server
Local Development
# Install dependencies
pip install -r requirements.txt
# Start the development server
uvicorn cdisc-mcp:app --reload --port 8000
# Landing page with interactive tool testing
open http://localhost:8000/
# MCP endpoint (for AI agents / clients)
http://localhost:8000/mcp
Configuration
Configuration is optional. One environment variable is recognized (set it in your shell or a local .env file):
CONNECT_SERVER: Restricts incoming connections to a specific Posit Connect hostname (via TrustedHostMiddleware). Leave it unset for local development.
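One way to turn `CONNECT_SERVER` into a host allow-list is sketched below. It falls back to `["*"]` (any host) when the variable is unset. It also assumes the value may be either a bare hostname or a full URL such as `https://connect.example.com/`. `allowed_hosts_from_env` is a hypothetical helper, not the server's code.

```python
import os
from urllib.parse import urlparse


def allowed_hosts_from_env(env=None):
    """Hypothetical helper: derive TrustedHostMiddleware's allowed_hosts from CONNECT_SERVER."""
    env = os.environ if env is None else env
    value = (env.get("CONNECT_SERVER") or "").strip()
    if not value:
        return ["*"]  # local development: accept any Host header
    if "://" in value:
        host = urlparse(value).hostname or ""  # e.g. https://connect.example.com/
    else:
        host = value.split("/")[0].split(":")[0]  # bare hostname, optional port
    if not host:
        raise ValueError(f"cannot parse a hostname from CONNECT_SERVER={value!r}")
    return [host]
```

With Starlette, the result would be passed as `app.add_middleware(TrustedHostMiddleware, allowed_hosts=allowed_hosts_from_env())`, where `TrustedHostMiddleware` comes from `starlette.middleware.trustedhost`.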
Sample Datasets
The server reads sample datasets from samples/ on disk and serves them via /samples, the sample:// resources, and validate_dataset. Edit a file in samples/ and the change is reflected immediately — there is no copy baked into the front-end.
pharmaverse_dm.json: Realistic CDISCPILOT01 data (all 24 DM columns)
dm.json: Hand-crafted valid DM domain (5 subjects, common variables)
dm_missing_studyid.json: Test case with STUDYID missing (validates error detection)
Add another sample by dropping a Dataset JSON file into samples/; it appears automatically in the listing, the dropdown, and as a sample://<name> resource. (Optionally add a one-line description to SAMPLE_DESCRIPTIONS in cdisc-mcp.py.)
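A negative test case can be derived by removing one variable from an existing Dataset JSON document and saving the result under `samples/`. The sketch below uses two hypothetical helpers, `drop_column` and `write_sample`. It only edits `columns` and `rows` and leaves other metadata untouched. It does not describe how the bundled test files were actually created.

```python
import copy
import json
from pathlib import Path


def drop_column(dataset, column):
    """Return a copy of a Dataset JSON dict with one column removed from columns and rows."""
    names = [col["name"] for col in dataset["columns"]]
    idx = names.index(column)  # raises ValueError if the column is absent
    out = copy.deepcopy(dataset)  # leave the source dataset untouched
    del out["columns"][idx]
    for row in out["rows"]:
        del row[idx]
    return out


def write_sample(dataset, name, samples_dir="samples"):
    """Write the dataset as samples/<name>.json so the server picks it up."""
    path = Path(samples_dir) / f"{name}.json"
    path.write_text(json.dumps(dataset, indent=2), encoding="utf-8")
    return path
```

For example, dropping STUDYID from pharmaverse_dm gives a file similar in spirit to dm_missing_studyid.json.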
Dataset JSON Format
Datasets are represented in CDISC Dataset JSON 1.1.0 format. Example structure:
{
"studyOID": "CDISCPILOT01",
"name": "DM",
"label": "Demographics",
"columns": [
{"name": "STUDYID", "label": "Study Identifier", ...},
{"name": "DOMAIN", "label": "Domain Abbreviation", ...},
...
],
"rows": [
["CDISCPILOT01", "DM", "01-701-1015", ...],
...
]
}
See the CDISC Dataset JSON specification for full details.
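Dataset JSON stores column metadata once and each row as a positional array. Validation code therefore usually zips the two together or pulls out a single variable. A minimal sketch with hypothetical helper names:

```python
def to_records(dataset):
    """Convert Dataset JSON columns + rows into a list of {variable: value} dicts."""
    names = [col["name"] for col in dataset["columns"]]
    records = []
    for i, row in enumerate(dataset["rows"]):
        # zip() would silently truncate a short row, so check lengths explicitly.
        if len(row) != len(names):
            raise ValueError(f"row {i} has {len(row)} values, expected {len(names)}")
        records.append(dict(zip(names, row)))
    return records


def column_values(dataset, name):
    """Return every value of one variable, e.g. all SEX values for a CT check."""
    idx = [col["name"] for col in dataset["columns"]].index(name)
    return [row[idx] for row in dataset["rows"]]
```

Applied to the example above, the first row becomes `{"STUDYID": "CDISCPILOT01", "DOMAIN": "DM", "USUBJID": "01-701-1015", ...}`.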
Architecture
All work happens in the server. cdisc-mcp.py reads the sample data from disk, owns the validation orchestration (validate_dataset), and serves both the MCP endpoint and the landing page. The front-end is a thin viewer with no data of its own.
Runtime files (everything the deployed app needs):
cdisc-mcp.py: FastMCP server with the validation tools, validate_dataset orchestration, /samples routes, and sample:// resources; the deployment entrypoint
landing.html: Interactive landing page; fetches samples and results from the server at runtime
samples/: Sample datasets in Dataset JSON format, read live by the server (the single source of truth)
requirements.txt: Python dependencies, used by Connect to build the environment
Key Design:
Single source of truth for data: samples/*.json on disk, read on every request, so editing a sample is reflected everywhere immediately
Stateless HTTP service (scales horizontally on Posit Connect)
Hardcoded CT codelists (no external API calls; checks run fully offline)
Deployment-agnostic front-end (the MCP and /samples URLs are derived client-side from the page location)
Validation tools registered as plain callables, so validate_dataset and other Python code can call them directly (not over HTTP)
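The "plain callables" design means registering a tool must not wrap or replace the function. The sketch below shows the pattern with a hypothetical `register_tool` decorator and `TOOLS` registry rather than any specific FastMCP API. The decorator records the function and returns it unchanged, so in-process code calls it directly with no HTTP round trip and no JSON-RPC envelope.

```python
from typing import Callable, Dict

# Hypothetical registry standing in for the MCP tool registration.
TOOLS: Dict[str, Callable] = {}


def register_tool(fn):
    """Record fn as a tool and return it unchanged, so it stays a plain callable."""
    TOOLS[fn.__name__] = fn
    return fn


@register_tool
def check_required_variables(columns, domain="DM"):
    missing = [v for v in ("STUDYID", "DOMAIN", "USUBJID") if v not in columns]
    return {"domain": domain, "missing": missing, "ok": not missing}


def validate(columns):
    # Direct in-process call to the registered tool function.
    return check_required_variables(columns)
```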
Deployment to Posit Connect
The server can be deployed to Posit Connect with Posit Publisher, which generates the .posit/ deployment metadata for your environment. Once deployed:
Set CONNECT_SERVER to the Connect hostname so that TrustedHostMiddleware rejects requests whose Host header does not match it.
The server is stateless, so it scales horizontally without shared state.
The deployed bundle needs all runtime files — cdisc-mcp.py, landing.html, requirements.txt, and the samples/ directory (the server reads it at runtime). Make sure the files list in the Posit Publisher config includes samples/.
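A quick pre-deployment check can confirm that the bundle contains every runtime file and that samples/ actually holds Dataset JSON files. The sketch below is hypothetical. It hardcodes the file list from the Architecture section and is not part of Posit Publisher.

```python
from pathlib import Path

# Runtime files listed in the Architecture section.
RUNTIME_FILES = ["cdisc-mcp.py", "landing.html", "requirements.txt"]


def missing_runtime_files(root="."):
    """Return the runtime files (and samples) missing from a project directory."""
    root = Path(root)
    missing = [name for name in RUNTIME_FILES if not (root / name).is_file()]
    samples = root / "samples"
    # The server reads samples/ at runtime, so it must exist and contain JSON files.
    if not samples.is_dir() or not any(samples.glob("*.json")):
        missing.append("samples/*.json")
    return missing
```

An empty list means the directory has everything the deployed app reads at runtime.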
Future Expansion
Possible directions for richer validation:
Full CDISC Conformance Rules (CORE) validation
Completeness checks beyond required variables
Domain-specific business rule validation
Version-specific SDTMIG checking
References
pharmaverse — CDISC-compliant R tools for clinical data