Schema | data-aggregator-mcp

data-aggregator-mcp


Server Configuration

Environment variables the server reads. Neither is required.

Name	Required	Description	Default
`NCBI_API_KEY`	No	Raises the NCBI E-utilities rate limit (3 → 10 req/s) used by the omics, literature, and taxonomy lookups.
`UNPAYWALL_EMAIL`	No	Enables the Unpaywall fallback leg of literature full-text retrieval.
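
As a rough sketch of how a launcher script might pass these optional variables, the helper below builds the environment for the server process and sets only what was provided. The function name `build_server_env` and the placeholder email are illustrative assumptions; the command that actually starts the server is not given on this page, so it is left out.

```python
import os


def build_server_env(ncbi_api_key=None, unpaywall_email=None, base_env=None):
    """Return a copy of the environment with the optional variables set.

    Hypothetical helper: both variables are optional, so unset values are
    simply omitted rather than written as empty strings.
    """
    env = dict(os.environ if base_env is None else base_env)
    if ncbi_api_key:
        env["NCBI_API_KEY"] = ncbi_api_key        # raises the NCBI rate limit
    if unpaywall_email:
        env["UNPAYWALL_EMAIL"] = unpaywall_email  # enables the Unpaywall fallback
    return env


# Placeholder value; an empty base_env keeps the printed result small.
env = build_server_env(unpaywall_email="you@example.org", base_env={})
print(sorted(env))
```

The resulting dict can be handed to whatever process launcher starts the server, for example as the `env` argument of `subprocess.Popen`.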

Capabilities

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": false }
`prompts`	{ "listChanged": false }
`resources`	{ "subscribe": false, "listChanged": false }
`experimental`	{}
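
Read against the MCP specification, these flags indicate that the server sends no list-changed notifications for tools, prompts, or resources and offers no resource subscriptions, so a client can reasonably list each group once and cache the result. A minimal sketch of that check, assuming the capabilities object has already been decoded into a dict (the helper name `static_lists` is hypothetical):

```python
CAPABILITIES = {
    "tools": {"listChanged": False},
    "prompts": {"listChanged": False},
    "resources": {"subscribe": False, "listChanged": False},
    "experimental": {},
}


def static_lists(capabilities):
    """Capability groups for which no list-changed notifications are sent."""
    return sorted(
        name
        for name, flags in capabilities.items()
        if "listChanged" in flags and not flags["listChanged"]
    )


print(static_lists(CAPABILITIES))
```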

Tools

Functions exposed to the LLM to take actions

Name	Description
`search`	Search public research-data archives, omics registries, and the literature for datasets, software, publications, and sequencing data. Fans out across Zenodo, DataCite (Dryad, Figshare, Dataverse, OSF, Mendeley, OpenNeuro), NCBI omics (GEO, SRA, BioProject), literature (PubMed + OpenAIRE), HuggingFace Hub (datasets), DataONE (eco/environmental federation), OmicsDI (proteomics/metabolomics), RCSB PDB (macromolecular structures), GWAS Catalog (genotype-phenotype studies), OpenML (ML datasets), DANDI (neurophysiology dandisets), and CZ CELLxGENE (single-cell datasets). Returns compact DataResource records; per-source failures are reported in errors{}. Use resolve for the full record (SRA resolve attaches the ENA FASTQ manifest; publication resolve attaches links[] to datasets/accessions, normalized identifiers (pmid/pmcid/doi), and, when open access, a full-text file), then fetch to download files. Pass organism= to expand the query with NCBI-Taxonomy synonyms; results carry normalized taxa[] + plant cross-links. Pass disease= to expand the query with MeSH descriptor synonyms (e.g. 'breast cancer' also matches 'Breast Neoplasms'); the expansion is echoed in mesh_expansion. Pass tissue= to expand the query with UBERON synonyms (e.g. 'liver' also matches 'iecur'/'jecur'); the expansion is echoed in tissue_expansion. Pass chemical= to expand the query with ChEBI compound synonyms (e.g. 'caffeine' also matches '1,3,7-trimethylxanthine'); the expansion is echoed in chemical_expansion. Pass assay= to expand the query with EDAM assay/method synonyms (e.g. 'ChIP-seq' also matches 'ChIP-sequencing'); the expansion is echoed in assay_expansion. Pass collapse_mirrors=true to opt into conservative cross-repo mirror collapse: same-dataset copies under different/no DOIs are folded into one record, with the folded copies annotated under mirrors[]. An ontology param that matches no term in its registry (e.g. organism='yeast'; NCBI Taxonomy indexes no such common name) is reported in unresolved[] and the search runs WITHOUT that expansion, so a dropped filter is never silent. Clients that support form elicitation are asked for a replacement term before the search runs.
`resolve`	Fetch the full DataResource for a known id (e.g. 'zenodo:7654321', 'datacite:10.5061/dryad.x', 'hf:owner/name', a bare Zenodo record id, or a DOI), including the complete files[] manifest. Publication resolve also attaches normalized identifiers (pmid/pmcid/doi) and, when open access, a full-text file. Pass cite= to render a citation onto the result (citation field); omitted means no citation. Pass trust=true to attach retraction status (via Crossref) under trust{}. Pass fair=true to attach an RDA-grounded FAIRness score (0–100 + F/A/I/R sub-scores + actionable gaps) computed from the record under fair{}. Pass use= (commercial/redistribute/modify/ml-training) to attach a licence-compatibility advisory (ALLOW/REVIEW/DENY, not legal advice) under license_compat{}. Pass format=provenance for a one-call RO-Crate 1.1 data-availability dossier (under provenance{}) composing version-currency, licence+SPDX, FAIR score, retraction status, and the source/DOI/ID chain; it auto-attaches fair + trust.
`fetch`	Download a resource's files to local disk and return the PATHS (never the file contents). Fetchable backends: Zenodo (md5-verified); SRA via ENA FASTQ (md5-verified); GEO supplementary files (unverified); DataCite sub-repos: Figshare/Dataverse/OSF (md5-verified), OpenNeuro (snapshot manifest, unverified), Dryad is manifest-only (resolve lists files, fetch fails loud), Mendeley + other DataCite repos fail loud; PubMed/OpenAIRE open-access full text (EuropePMC XML / Unpaywall PDF, unverified); HuggingFace Hub (unverified); DataONE Member-Node objects (md5/SHA-256-verified); OmicsDI: PRIDE + MetaboLights only (unverified), MassIVE/GNPS/PeptideAtlas/Metabolomics Workbench fail loud; DANDI dandisets (302→S3, unverified); CZ CELLxGENE H5AD/RDS assets (unverified); OpenML ARFF (md5-verified); RCSB PDB .cif/.pdb structure files (unverified). Fails loud if selected files exceed max_bytes unless force=true. Verifies checksums where the backend supplies them; writes a .dataresource.json sidecar.
`list_sources`	List wired data sources and their capabilities (layer, kinds, supported filters, auth requirement, rate limit, status).
`operate`	Inspect or query a remote tabular file (Parquet/CSV/TSV) WITHOUT downloading it. op='schema' returns columns+types; 'preview' a small sample; 'head' the first n rows; 'sql' a read-only SELECT against the file (exposed as the view 'data', e.g. "SELECT * FROM data WHERE x > 1"). op='peek' profiles every column WITHOUT downloading: type, null-rate, approximate distinct count, min/max, and numeric quartiles (a DuckDB SUMMARIZE; like head/sql it reads the whole file, so it honors the source-size ceiling). Addresses a file by catalog id + file name (resolve the id first to see files[] and access_modes). Requires the [operate] extra; fails loud if the file is not an operable tabular file.
`relate`	Given 2-10 resource ids, return metadata-level join/harmonization HINTS: how the datasets relate and on what key they could be joined. Detects shared accessions (BioProject/SRA/GEO), shared cross-identifiers (doi/pmid/pmcid), explicit links between the inputs, and version lineage. HINTS ONLY: it does not read file columns, fetch files, or execute any join/merge/conversion; each hint names the shared value as evidence. Resolve ids first if you only have a search result. Per-id resolve failures are reported, not fatal.
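
The search → resolve → fetch flow described in the table can be sketched as three MCP `tools/call` requests. This is an illustration under stated assumptions, not the server's documented schema: the tool names `search`, `resolve`, and `fetch` are used as the descriptions refer to them, the argument names `query` and `id` are guesses, and the helpers `tool_call` and `search_warnings` are hypothetical. The filter and flag names (`organism`, `collapse_mirrors`, `trust`, `fair`, `max_bytes`) and the `errors{}` / `unresolved[]` fields come from the descriptions above.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids


def tool_call(name, arguments):
    """Build an MCP JSON-RPC 2.0 `tools/call` request for one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# 1. search: `query` is an ASSUMED argument name; organism and
#    collapse_mirrors are named in the tool description.
search_req = tool_call("search", {
    "query": "drought stress RNA-seq",
    "organism": "Arabidopsis thaliana",
    "collapse_mirrors": True,
})

# 2. resolve: `id` is an ASSUMED argument name; the id format and the
#    trust/fair flags are taken from the tool description.
resolve_req = tool_call("resolve", {"id": "zenodo:7654321", "trust": True, "fair": True})

# 3. fetch: `id` is ASSUMED; max_bytes is named in the description and is
#    taken here to be a byte count.
fetch_req = tool_call("fetch", {"id": "zenodo:7654321", "max_bytes": 500_000_000})


def search_warnings(result):
    """Summarize failures in a search result (an already-decoded dict is assumed)."""
    warnings = [f"source failed: {src}" for src in result.get("errors", {})]
    warnings += [f"filter dropped: {term}" for term in result.get("unresolved", [])]
    return warnings


print(json.dumps(search_req, indent=2))
print(search_warnings({"errors": {"dataone": "timeout"}, "unresolved": ["yeast"]}))
```

Checking `errors` and `unresolved` before trusting a result set matters because a search still returns records when one source fails or a filter is dropped.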

Prompts

Interactive templates invoked by user choice

Name	Description
`find_data`	Find datasets/data for a topic, optionally scoped to an organism.
`data_behind_paper`	Find the datasets / accessions behind a paper (by DOI, PMID, or title).
`search_resolve_fetch`	Walk the search → resolve → fetch flow for a data need.

Resources

Contextual data attached and managed by the client

Name	Description
`sources`	The wired data sources and their capabilities (same payload as the list_sources tool), as JSON.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/musharna/data-aggregator-mcp'
