data-aggregator-mcp

fetch

Download files from research repositories (Zenodo, GEO, SRA, etc.) to local disk, verifying checksums and returning file paths.

Instructions

Download a resource's files to local disk and return the PATHS (never the file contents). Fetchable backends:

Zenodo (md5-verified)
SRA via ENA FASTQ (md5-verified)
GEO supplementary files (unverified)
DataCite sub-repos: Figshare/Dataverse/OSF (md5-verified); OpenNeuro (snapshot manifest, unverified); Dryad is manifest-only (resolve lists files, fetch fails loud); Mendeley + other DataCite repos fail loud
PubMed/OpenAIRE open-access full text (EuropePMC XML / Unpaywall PDF, unverified)
HuggingFace Hub (unverified)
DataONE Member-Node objects (md5/SHA-256-verified)
OmicsDI: PRIDE + MetaboLights only (unverified); MassIVE/GNPS/PeptideAtlas/Metabolomics Workbench fail loud
DANDI dandisets (302→S3, unverified)
CZ CELLxGENE H5AD/RDS assets (unverified)
OpenML ARFF (md5-verified)
RCSB PDB .cif/.pdb structure files (unverified)

Fails loud if selected files exceed max_bytes unless force=true. Verifies checksums; writes a .dataresource.json sidecar.
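The behaviors named in the description above (glob selection, the max_bytes ceiling with its force override, and md5 verification) can be sketched in a few lines of Python. This is an illustrative sketch only, not the server's implementation: the function names `select_files`, `check_budget`, and `md5_matches` are hypothetical, and the input is assumed to be a simple mapping of file name to size in bytes.

```python
import fnmatch
import hashlib
from pathlib import Path


def select_files(sizes: dict[str, int], pattern: str = "*") -> dict[str, int]:
    """Keep only entries whose file name matches the glob (hypothetical helper)."""
    return {
        name: size
        for name, size in sizes.items()
        if fnmatch.fnmatchcase(name, pattern)
    }


def check_budget(selected: dict[str, int], max_bytes: int, force: bool = False) -> int:
    """Return the total size; fail loudly if it exceeds max_bytes and force is unset."""
    total = sum(selected.values())
    if total > max_bytes and not force:
        raise ValueError(
            f"selected files total {total} bytes, over max_bytes={max_bytes}"
        )
    return total


def md5_matches(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Hash the file in chunks and compare with the published md5 digest."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

Checking the size budget before any bytes are transferred is one plausible reading of "fails loud"; whether the real tool checks before or during download is not stated on this page.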

Input Schema

Name	Required	Description
`id`	Yes	Source-prefixed id or bare Zenodo id
`dest`	No	Destination dir (default managed cache)
`files`	No	Glob over file names (default all)
`force`	No	Override max_bytes
`extract`	No	Unpack downloaded zip/tar archives into the destination (default false). Path-traversal-guarded; counts against max_bytes.
`max_bytes`	No	Byte ceiling before failing loud
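As a rough illustration of how the parameters in the table fit together, the sketch below builds an MCP `tools/call` request for this tool. The JSON-RPC envelope follows the MCP specification; everything else is an assumption: the helper name `build_fetch_call` is hypothetical, the id `"1234567"` is a placeholder standing in for a bare Zenodo id, and the value types (string id, integer `max_bytes`) are guesses, since this page does not show the JSON Schema.

```python
import json

# Optional parameters listed in the input schema table above.
OPTIONAL_PARAMS = {"dest", "files", "force", "extract", "max_bytes"}


def build_fetch_call(resource_id: str, request_id: int = 1, **options) -> dict:
    """Build a JSON-RPC tools/call request for the fetch tool (hypothetical helper)."""
    unknown = set(options) - OPTIONAL_PARAMS
    if unknown:
        raise TypeError(f"unknown fetch parameters: {sorted(unknown)}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "fetch",
            "arguments": {"id": resource_id, **options},
        },
    }


# Placeholder id and limits, chosen only for illustration.
request = build_fetch_call("1234567", files="*.csv", max_bytes=500_000_000)
print(json.dumps(request, indent=2))
```

Leaving `force` unset keeps the `max_bytes` ceiling in effect, so an oversized selection fails instead of downloading.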

Output Schema

Name	Required	Description	Default
`bytes`	No
`paths`	No
`resumed`	No
`skipped`	No
`unverified`	No
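The output table lists field names but not their types, so any client-side handling has to guess. The sketch below assumes, without confirmation from this page, that `paths` and `unverified` are both lists of local path strings and that every entry in `unverified` also appears in `paths`. Under those assumptions, a caller could separate checksum-verified files from unverified ones before trusting them; the helper name and sample values are hypothetical.

```python
def split_by_verification(result: dict) -> tuple[list[str], list[str]]:
    """Split returned paths into (verified, unverified) lists.

    Assumes `paths` and `unverified` are lists of path strings; the real
    field types are not documented on this page.
    """
    paths = list(result.get("paths") or [])
    unverified = set(result.get("unverified") or [])
    verified = [p for p in paths if p not in unverified]
    flagged = [p for p in paths if p in unverified]
    return verified, flagged


# Hypothetical result shaped after the field names in the table above.
sample = {
    "paths": ["/cache/a.csv", "/cache/b.h5ad"],
    "unverified": ["/cache/b.h5ad"],
    "bytes": 2048,
}
verified, flagged = split_by_verification(sample)
print("verified:", verified)
print("needs manual check:", flagged)
```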

Tool Definition Quality

A (4.4/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds substantial behavioral context beyond annotations: it describes checksum verification, sidecar writing, failure behavior on size/backends, and unverified sources. There is no contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but well-organized: it front-loads the core purpose, then lists backends with verification status, then general behavior. Every sentence adds information, though it could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (many backends, multiple parameters, output schema exists), the description is comprehensive. It covers backend-specific behavior, failure modes, sidecar file, and checksum verification, with no obvious gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters. The description adds global context about 'max_bytes' and 'force' but does not significantly enhance individual parameter semantics beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it downloads a resource's files to local disk and returns paths, not contents. It lists specific backends, distinguishing this tool from siblings like search or resolve.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit context about when to use this tool (e.g., downloading from various backends) and mentions failure conditions (e.g., Dryad, Mendeley). However, it does not explicitly state when not to use it or suggest alternatives beyond naming sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/musharna/data-aggregator-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.