fetch
Download and verify files from diverse research data repositories, returning local file paths while enforcing size limits and checksum integrity.
Instructions
Download a resource's files to local disk and return the paths, never the file contents.

Fetchable backends:
- Zenodo (md5-verified)
- SRA via ENA FASTQ (md5-verified)
- GEO supplementary files (unverified)
- DataCite sub-repos: Figshare/Dataverse/OSF (md5-verified); OpenNeuro (snapshot manifest, unverified); Dryad is manifest-only (resolve lists files, fetch fails loud); Mendeley and other DataCite repos fail loud
- PubMed/OpenAIRE open-access full text (EuropePMC XML / Unpaywall PDF, unverified)
- HuggingFace Hub (unverified)
- DataONE Member-Node objects (md5/SHA-256-verified)
- OmicsDI: PRIDE and MetaboLights only (unverified); MassIVE/GNPS/PeptideAtlas/Metabolomics Workbench fail loud
- DANDI dandisets (302→S3, unverified)
- CZ CELLxGENE H5AD/RDS assets (unverified)
- OpenML ARFF (md5-verified)
- RCSB PDB .cif/.pdb structure files (unverified)

Fails loud if the selected files exceed max_bytes, unless force=true. Verifies checksums for the backends marked as verified above and writes a .dataresource.json sidecar.
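The size guard and md5 verification described above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: `SizeLimitExceeded`, `check_size_limit`, and `md5_matches` are hypothetical names introduced here, and the sketch assumes that file sizes and expected digests are known before the comparison is made.

```python
import hashlib


class SizeLimitExceeded(Exception):
    """Hypothetical error raised when the selection is larger than max_bytes."""


def check_size_limit(sizes, max_bytes, force=False):
    """Return the total size, or raise if it exceeds max_bytes and force is off."""
    total = sum(sizes)
    if max_bytes is not None and total > max_bytes and not force:
        raise SizeLimitExceeded(f"{total} bytes selected, limit is {max_bytes}")
    return total


def md5_matches(path, expected_hex, chunk_size=1 << 20):
    """Stream the file through MD5 and compare with the expected hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

Checking the total before any transfer starts lets an oversized selection fail without writing anything to disk, which is one plausible reading of "fails loud".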
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| id | Yes | Source-prefixed id or bare Zenodo id | |
| dest | No | Destination dir (default managed cache) | |
| files | No | Glob over file names (default all) | |
| max_bytes | No | Byte ceiling for the selected files; exceeding it fails loud unless force=true | |
| force | No | Set to true to proceed even when the selected files exceed max_bytes | |
| extract | No | Unpack downloaded zip/tar archives into the destination (default false). Path-traversal-guarded; counts against max_bytes. | |
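As a rough illustration of two of the inputs above, the sketch below shows one way a glob over file names (`files`) and a path-traversal guard for extraction (`extract`) could work. Both helpers, `select_files` and `is_safe_member`, are hypothetical and use only the Python standard library; the tool's real matching and guarding rules may differ.

```python
import fnmatch
import os


def select_files(names, pattern=None):
    """Return the names matching the glob; a pattern of None selects all."""
    if pattern is None:
        return list(names)
    # fnmatchcase keeps matching case-sensitive on every platform.
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def is_safe_member(dest, member_name):
    """True if extracting member_name under dest would stay inside dest."""
    dest_root = os.path.realpath(dest)
    target = os.path.realpath(os.path.join(dest_root, member_name))
    return os.path.commonpath([dest_root, target]) == dest_root
```

An archive member named `../evil.txt`, or one with an absolute path, resolves outside the destination directory, so `is_safe_member` rejects it before anything is written.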
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paths | No | ||
| bytes | No | ||
| skipped | No | ||
| resumed | No | ||