Glama
musharna

data-aggregator-mcp

Server Configuration

Describes the environment variables the server reads. Neither is required; setting one enables the behaviour described.

| Name | Required | Description | Default |
|---|---|---|---|
| NCBI_API_KEY | No | Raises the NCBI E-utilities rate limit (3 → 10 req/s) used by the omics, literature, and taxonomy lookups. | |
| UNPAYWALL_EMAIL | No | Enables the Unpaywall fallback leg of literature full-text retrieval. | |
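
A minimal sketch, assuming a Python client that wants to know which optional behaviours a given environment turns on. The helper features_from_env and the OptionalFeatures type are hypothetical; only the variable names and the 3 → 10 req/s figures come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Rate limits quoted in the table (requests per second to NCBI E-utilities).
NCBI_RATE_WITHOUT_KEY = 3
NCBI_RATE_WITH_KEY = 10


@dataclass(frozen=True)
class OptionalFeatures:
    ncbi_rate_limit: int     # effective E-utilities ceiling, req/s
    unpaywall_enabled: bool  # whether the Unpaywall full-text fallback can run


def features_from_env(env: Mapping[str, str] = os.environ) -> OptionalFeatures:
    """Summarise what the two optional variables switch on.

    Hypothetical helper: an unset or blank variable counts as "not configured".
    """
    has_key = bool(env.get("NCBI_API_KEY", "").strip())
    has_email = bool(env.get("UNPAYWALL_EMAIL", "").strip())
    return OptionalFeatures(
        ncbi_rate_limit=NCBI_RATE_WITH_KEY if has_key else NCBI_RATE_WITHOUT_KEY,
        unpaywall_enabled=has_email,
    )


print(features_from_env({"NCBI_API_KEY": "abc123"}))
# OptionalFeatures(ncbi_rate_limit=10, unpaywall_enabled=False)
```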

Capabilities

Features and capabilities supported by this server

| Capability | Details |
|---|---|
| tools | {"listChanged": false} |
| prompts | {"listChanged": false} |
| resources | {"subscribe": false, "listChanged": false} |
| experimental | {} |

Tools

Functions exposed to the LLM to take actions

search

Search public research-data archives, omics registries, and the literature for datasets, software, publications, and sequencing data. Fans out across Zenodo, DataCite (Dryad, Figshare, Dataverse, OSF, Mendeley, OpenNeuro), NCBI omics (GEO, SRA, BioProject), literature (PubMed + OpenAIRE), HuggingFace Hub (datasets), DataONE (eco/environmental federation), OmicsDI (proteomics/metabolomics), RCSB PDB (macromolecular structures), GWAS Catalog (genotype-phenotype studies), OpenML (ML datasets), DANDI (neurophysiology dandisets), and CZ CELLxGENE (single-cell datasets).

Returns compact DataResource records; per-source failures are reported in errors{}. Use resolve for the full record, then fetch to download files. SRA resolve attaches the ENA FASTQ manifest; publication resolve attaches links[] to datasets/accessions, normalized identifiers (pmid/pmcid/doi), and, when open access, a full-text file.

Optional parameters:
- organism= expands the query with NCBI Taxonomy synonyms; results carry normalized taxa[] and plant cross-links.
- disease= expands the query with MeSH descriptor synonyms (e.g. 'breast cancer' also matches 'Breast Neoplasms'); the expansion is echoed in mesh_expansion.
- tissue= expands the query with UBERON synonyms (e.g. 'liver' also matches 'iecur'/'jecur'); the expansion is echoed in tissue_expansion.
- chemical= expands the query with ChEBI compound synonyms (e.g. 'caffeine' also matches '1,3,7-trimethylxanthine'); the expansion is echoed in chemical_expansion.
- assay= expands the query with EDAM assay/method synonyms (e.g. 'ChIP-seq' also matches 'ChIP-sequencing'); the expansion is echoed in assay_expansion.
- collapse_mirrors=true opts into conservative cross-repo mirror collapse: copies of the same dataset held under different DOIs (or no DOI) are folded into one record, with the folded copies annotated under mirrors[].
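
MCP clients invoke this tool with a JSON-RPC tools/call message. The sketch below builds one in Python. The "query" argument key is an assumption (the description does not name the free-text parameter), and search_request is a hypothetical helper; organism, disease, tissue, chemical, assay, and collapse_mirrors are the parameters listed above.

```python
import json
from typing import Any, Optional

# Query-expansion parameters named in the search tool description.
EXPANSION_PARAMS = ("organism", "disease", "tissue", "chemical", "assay")


def search_request(
    query: str,
    request_id: int = 1,
    collapse_mirrors: bool = False,
    **expansions: Optional[str],
) -> dict[str, Any]:
    """Build an MCP JSON-RPC tools/call message for the search tool.

    The "query" key is an assumption. Expansions passed as None are dropped.
    """
    unknown = set(expansions) - set(EXPANSION_PARAMS)
    if unknown:
        raise ValueError(f"unsupported expansion parameter(s): {sorted(unknown)}")
    arguments: dict[str, Any] = {"query": query}
    arguments.update({k: v for k, v in expansions.items() if v is not None})
    if collapse_mirrors:
        arguments["collapse_mirrors"] = True  # opt-in mirror collapse
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search", "arguments": arguments},
    }


msg = search_request("RNA-seq", organism="Arabidopsis thaliana", tissue="root")
print(json.dumps(msg["params"]["arguments"]))
# {"query": "RNA-seq", "organism": "Arabidopsis thaliana", "tissue": "root"}
```

Because per-source failures land in errors{} rather than failing the whole call, a client should check errors{} before reading an empty result as "nothing found".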

resolve

Fetch the full DataResource for a known id (e.g. 'zenodo:7654321', 'datacite:10.5061/dryad.x', 'hf:owner/name', a bare Zenodo record id, or a DOI), including the complete files[] manifest. Publication resolve also attaches normalized identifiers (pmid/pmcid/doi) and, when open access, a full-text file.

Optional parameters:
- cite= renders a citation onto the result (citation field); when omitted, no citation is added.
- trust=true attaches retraction status (via Crossref) under trust{}.
- fair=true attaches an RDA-grounded FAIRness score (0–100, plus F/A/I/R sub-scores and actionable gaps), computed from the record, under fair{}.
- use= (commercial, redistribute, modify, or ml-training) attaches a licence-compatibility advisory (ALLOW/REVIEW/DENY; not legal advice) under license_compat{}.
- format=provenance returns a one-call RO-Crate 1.1 data-availability dossier under provenance{}, composing version currency, licence + SPDX, FAIR score, retraction status, and the source/DOI/ID chain. It attaches fair and trust automatically.
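
A sketch of assembling resolve arguments, assuming the record id travels under an "id" key (the description does not name it). The helper resolve_arguments is hypothetical; it rejects use= values outside the four listed and, because format=provenance attaches fair and trust on its own, does not send them alongside it.

```python
from typing import Any, Optional

# Intended-use values listed for the use= parameter.
ALLOWED_USES = {"commercial", "redistribute", "modify", "ml-training"}


def resolve_arguments(
    resource_id: str,
    *,
    trust: bool = False,
    fair: bool = False,
    use: Optional[str] = None,
    provenance: bool = False,
) -> dict[str, Any]:
    """Assemble the arguments object for a resolve tools/call (hypothetical)."""
    if use is not None and use not in ALLOWED_USES:
        raise ValueError(f"use must be one of {sorted(ALLOWED_USES)}, got {use!r}")
    args: dict[str, Any] = {"id": resource_id}  # "id" key is an assumption
    if provenance:
        # The provenance dossier already auto-attaches fair + trust.
        args["format"] = "provenance"
    else:
        if trust:
            args["trust"] = True
        if fair:
            args["fair"] = True
    if use is not None:
        args["use"] = use
    return args


print(resolve_arguments("zenodo:7654321", fair=True, use="ml-training"))
# {'id': 'zenodo:7654321', 'fair': True, 'use': 'ml-training'}
```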

fetch

Download a resource's files to local disk and return the PATHS (never the file contents). Fetchable backends:
- Zenodo (md5-verified)
- SRA via ENA FASTQ (md5-verified)
- GEO supplementary files (unverified)
- DataCite sub-repositories: Figshare, Dataverse, and OSF (md5-verified); OpenNeuro (snapshot manifest, unverified). Dryad is manifest-only (resolve lists files; fetch fails loud). Mendeley and other DataCite repositories fail loud.
- PubMed/OpenAIRE open-access full text (EuropePMC XML or Unpaywall PDF, unverified)
- HuggingFace Hub (unverified)
- DataONE Member Node objects (md5/SHA-256-verified)
- OmicsDI: PRIDE and MetaboLights only (unverified); MassIVE, GNPS, PeptideAtlas, and Metabolomics Workbench fail loud
- DANDI dandisets (302 redirect to S3, unverified)
- CZ CELLxGENE H5AD/RDS assets (unverified)
- OpenML ARFF (md5-verified)
- RCSB PDB .cif/.pdb structure files (unverified)

Fails loud if the selected files exceed max_bytes, unless force=true. Verifies checksums for the backends marked as verified above and writes a .dataresource.json sidecar.
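
For the md5-verified backends, verification amounts to hashing the downloaded bytes and comparing against the published checksum. The sketch below illustrates that, plus the max_bytes guard, using Python's hashlib. The helper names (check_budget, md5_of, verify_md5, ChecksumMismatch) are hypothetical; this is not the server's own code.

```python
import hashlib
from pathlib import Path
from typing import Iterable


class ChecksumMismatch(Exception):
    """Raised when a downloaded file does not match its published md5."""


def check_budget(sizes: Iterable[int], max_bytes: int, force: bool = False) -> int:
    """Fail loud if the selected files exceed max_bytes, unless force=True."""
    total = sum(sizes)
    if total > max_bytes and not force:
        raise ValueError(f"selected files total {total} bytes > max_bytes={max_bytes}")
    return total


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large downloads need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Path, expected: str) -> None:
    """Compare against the published checksum (case-insensitive hex)."""
    actual = md5_of(path)
    if actual != expected.lower():
        raise ChecksumMismatch(f"{path}: expected {expected}, got {actual}")
```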

list_sources

List wired data sources and their capabilities (layer, kinds, supported filters, auth requirement, rate limit, status).

operate

Inspect or query a remote tabular file (Parquet/CSV/TSV) WITHOUT downloading it. Operations:
- op='schema' returns columns and types.
- op='preview' returns a small sample.
- op='head' returns the first n rows.
- op='sql' runs a read-only SELECT against the file, which is exposed as the view 'data' (e.g. "SELECT * FROM data WHERE x > 1").
- op='peek' profiles every column WITHOUT downloading: type, null rate, approximate distinct count, min/max, and numeric quartiles (a DuckDB SUMMARIZE). Like head and sql, it reads the whole file, so it honors the source-size ceiling.

Addresses a file by catalog id + file name (resolve the id first to see files[] and access_modes). Requires the [operate] extra; fails loud if the file is not an operable tabular file.
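
To illustrate what op='peek' reports, here is a rough local analogue built on Python's csv and statistics modules instead of DuckDB. It is a stand-in under stated assumptions, not the server's implementation: empty cells count as nulls, distinct counts are exact rather than approximate, and quartiles come from statistics.quantiles, so values can differ from DuckDB's SUMMARIZE. The function profile_csv is hypothetical.

```python
import csv
import io
import statistics
from typing import Any, Optional


def _as_float(value: str) -> Optional[float]:
    """Return value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def profile_csv(text: str) -> dict[str, dict[str, Any]]:
    """Per-column profile loosely modelled on the fields op='peek' reports."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    profile: dict[str, dict[str, Any]] = {}
    for col in reader.fieldnames or []:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in ("", None)]  # empty cell = null
        numbers = [_as_float(v) for v in present]
        numeric = bool(present) and all(n is not None for n in numbers)
        stats: dict[str, Any] = {
            "type": "numeric" if numeric else "text",
            "null_rate": (len(values) - len(present)) / len(values) if values else 0.0,
            "distinct": len(set(present)),  # exact, unlike DuckDB's estimate
        }
        if numeric:
            stats["min"], stats["max"] = min(numbers), max(numbers)
            if len(numbers) >= 2:  # skip quartiles for a single value
                stats["quartiles"] = statistics.quantiles(numbers, n=4)
        elif present:
            stats["min"], stats["max"] = min(present), max(present)  # lexicographic
        profile[col] = stats
    return profile
```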

relate

Given 2-10 resource ids, return metadata-level join/harmonization HINTS: how the datasets relate and on what key they could be joined. Detects shared accessions (BioProject/SRA/GEO), shared cross-identifiers (doi/pmid/pmcid), explicit links between the inputs, and version lineage. HINTS ONLY — it does not read file columns, fetch files, or execute any join/merge/conversion; each hint names the shared value as evidence. Resolve ids first if you only have a search result. Per-id resolve failures are reported, not fatal.
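
The kind of metadata-level comparison relate performs can be sketched as a pairwise intersection over identifiers. The record shape (accessions, identifiers, links keys), the id strings, and the hint format are hypothetical, and version lineage is left out for brevity. Like the tool, the sketch never opens a file and attaches the shared value to each hint as evidence.

```python
from itertools import combinations
from typing import Any


def relate_hints(records: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Return join hints for every pair of records (hypothetical record shape).

    Each record: {"id", "accessions": [...], "identifiers": {"doi"|"pmid"|"pmcid": v},
    "links": [other ids]}. Only metadata is compared; no file is read.
    """
    if not 2 <= len(records) <= 10:
        raise ValueError("relate takes 2-10 resource ids")
    hints: list[dict[str, str]] = []
    for a, b in combinations(records, 2):
        pair = {"a": a["id"], "b": b["id"]}
        # Shared accessions (BioProject/SRA/GEO style).
        for acc in sorted(set(a.get("accessions", [])) & set(b.get("accessions", []))):
            hints.append({**pair, "kind": "shared_accession", "evidence": acc})
        # Shared cross-identifiers, compared case-insensitively.
        ids_a, ids_b = a.get("identifiers", {}), b.get("identifiers", {})
        for key in ("doi", "pmid", "pmcid"):
            va, vb = ids_a.get(key), ids_b.get(key)
            if va and vb and str(va).lower() == str(vb).lower():
                hints.append({**pair, "kind": f"shared_{key}", "evidence": str(va)})
        # Explicit links in either direction.
        if b["id"] in a.get("links", []) or a["id"] in b.get("links", []):
            hints.append({**pair, "kind": "explicit_link", "evidence": f"{a['id']} <-> {b['id']}"})
    return hints
```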

Prompts

Interactive templates invoked by user choice

| Name | Description |
|---|---|
| find_data | Find datasets/data for a topic, optionally scoped to an organism. |
| data_behind_paper | Find the datasets/accessions behind a paper (by DOI, PMID, or title). |
| search_resolve_fetch | Walk the search → resolve → fetch flow for a data need. |
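
The search_resolve_fetch prompt walks a three-step flow; a client-side version of the same walk might look like the sketch below. call_tool stands in for whatever function your MCP client uses to send tools/call, and the result shapes (results[], paths) and every argument key other than max_bytes are assumptions.

```python
from typing import Any, Callable

# Hypothetical client hook: sends tools/call(name, arguments), returns the result.
CallTool = Callable[[str, dict[str, Any]], dict[str, Any]]


def search_resolve_fetch(call_tool: CallTool, query: str, max_bytes: int) -> dict[str, Any]:
    """Search, resolve the top hit, then fetch it (hypothetical orchestration)."""
    found = call_tool("search", {"query": query})
    if not found.get("results"):
        # An empty result may mean sources failed; surface errors{} instead of hiding it.
        raise LookupError(f"no results for {query!r}; source errors: {found.get('errors', {})}")
    first_id = found["results"][0]["id"]
    record = call_tool("resolve", {"id": first_id})  # full record with files[]
    fetched = call_tool("fetch", {"id": first_id, "max_bytes": max_bytes})
    return {"id": first_id, "files": record.get("files", []), "paths": fetched.get("paths", [])}
```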

Resources

Contextual data attached and managed by the client

| Name | Description |
|---|---|
| sources | The wired data sources and their capabilities (same payload as the list_sources tool), as JSON. |


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/musharna/data-aggregator-mcp'
