BioTrax

by SpaceSorcerer

The first unified genomic track, peak, and sequence retrieval tool.

One Python package and MCP server to discover, describe, and download data from ENCODE, ChIP-Atlas, ReMap, GEO, and SRA/ENA — with unified metadata, resolved DOI/PMID provenance, and ready-to-run pipeline outputs (nf-core/rnaseq samplesheets, RBP-RELI/TF-RELI library indexes).

No existing tool unifies peak/track retrieval from all five sources behind one interface. BioTrax fills that gap and goes further: it also replaces the SRA Toolkit `prefetch` step, pulling FASTQs directly from ENA with md5 verification and no SRA Toolkit dependency.

What problem it solves

Before BioTrax	With BioTrax
5 different APIs and 5 scripts	`biotrax search-tracks --target QKI --assay eCLIP`
SRA Toolkit `prefetch` + `fasterq-dump` setup required	Direct ENA FASTQ download, no SRA Toolkit
Manual nf-core samplesheet creation	Auto-generated from any SRA/GEO accession
No DOI/PMID linkage on downloaded files	Every leaf folder has `metadata.json` + `README.md` + resolved DOI
RELI peak library built by hand	`make_reli_index` exporter from any peak pull
"What did I download and from where?"	Append-only master index TSV across all downloads

Sources

Source	Data	Capabilities
ENCODE	All assays: eCLIP, ChIP-seq, ATAC-seq, DNase-seq, RNA-seq, Hi-C, ...	search, list_files, doi_resolve, design_note
ChIP-Atlas	Assembled TF/histone ChIP-seq + ATAC-seq peak BEDs (merged from all public experiments)	search, list_files
ReMap	Curated TF ChIP-seq peaks — primary TF-RELI library source (human hg38)	search, list_files
GEO	Supplementary track/peak files (BED, BigWig, narrowPeak) with rich series provenance	search, list_files, doi_resolve, design_note
SRA/ENA	Sequencing reads — any strategy (RNA-seq, ChIP-seq, ATAC-seq, ...)	search, list_runs, doi_resolve, design_note

Install

# From source (recommended while pre-release):
git clone https://github.com/SpaceSorcerer/BioTrax.git
pip install -e BioTrax

# With uvx (runs without permanent install):
uvx --from /path/to/BioTrax biotrax list-sources

# Once published to PyPI:
pip install biotrax

Python >= 3.10 required. Dependencies: httpx>=0.27, fastmcp>=0.4.

Download root (configurable)

By default, downloads land in ~/BioTrax (your home directory).

Set the environment variable BIOTRAX_DOWNLOAD_ROOT to use any path:

# Linux / macOS
export BIOTRAX_DOWNLOAD_ROOT=/data/biotrax

# Windows PowerShell
$env:BIOTRAX_DOWNLOAD_ROOT = "D:\BioTraxData"

# In an MCP config (e.g. Claude Code): set it under "env" for the server process (see "Add to Claude Code" below)

Precedence: explicit out_root= argument > BIOTRAX_DOWNLOAD_ROOT env var > ~/BioTrax.
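
The precedence rule can be pictured with a short sketch. This is not BioTrax's own code: `resolve_download_root` is a hypothetical helper written only to illustrate the documented order (explicit argument, then environment variable, then `~/BioTrax`).

```python
import os
from pathlib import Path
from typing import Optional, Union


def resolve_download_root(out_root: Optional[Union[str, Path]] = None) -> Path:
    """Hypothetical helper mirroring the documented precedence order."""
    if out_root is not None:
        # 1. An explicit out_root= argument always wins.
        return Path(out_root).expanduser()
    env_root = os.environ.get("BIOTRAX_DOWNLOAD_ROOT")
    if env_root:
        # 2. Otherwise fall back to the environment variable, if set and non-empty.
        return Path(env_root).expanduser()
    # 3. Otherwise use the documented default under the home directory.
    return Path.home() / "BioTrax"
```

Treating an empty `BIOTRAX_DOWNLOAD_ROOT` as unset is a choice made for this sketch; the real package may behave differently.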

Example local setup (author's machine): BIOTRAX_DOWNLOAD_ROOT=F:\BioTrax is set in the MCP server environment, pointing downloads to a dedicated research data drive.

MCP server (Claude / AI agents)

Add to Claude Code

# Point to the Python interpreter and server module in your clone:
claude mcp add biotrax -- /path/to/BioTrax/.venv/bin/python /path/to/BioTrax/biotrax/server.py

To set the download root in the MCP config, add it to the server's environment:

{
  "mcpServers": {
    "biotrax": {
      "command": "/path/to/.venv/bin/python",
      "args": ["/path/to/BioTrax/biotrax/server.py"],
      "env": {
        "BIOTRAX_DOWNLOAD_ROOT": "/your/download/path"
      }
    }
  }
}

Run manually (stdio transport)

python -m biotrax.server
# or:
python /path/to/BioTrax/biotrax/server.py

Available MCP tools

Tool	What it does
`search_tracks`	Search ENCODE, ChIP-Atlas, ReMap, GEO for peak/track datasets
`list_track_files`	List downloadable files for one dataset (with direct URLs)
`get_dataset`	Fetch full metadata for one dataset by accession
`download_tracks`	Download files to disk; writes manifest + metadata + README
`list_sources`	Introspect all sources, capabilities, and filter keys
`search_runs`	Search SRA/ENA for sequencing experiments
`download_fastq`	Resolve accessions, download FASTQs, emit nf-core samplesheet
`make_rnaseq_samplesheet`	Build nf-core/rnaseq samplesheet from SeqRun dicts
`make_reli_index`	Build RBP-RELI / TF-RELI library index from downloaded BED files

CLI usage

# Search ENCODE for QKI eCLIP datasets (GRCh38, human default)
biotrax search-tracks --target QKI --assay eCLIP --sources ENCODE

# List peak files for an ENCODE experiment
biotrax list-files --source ENCODE --id ENCSR366YOG --output-type peaks

# Get full metadata (DOI, design note, file count)
biotrax get-dataset --source GEO --id GSE78509

# Search ChIP-Atlas for CTCF hg38 assembled peaks
biotrax search-tracks --target CTCF --sources ChIP-Atlas --genome GRCh38

# Search ReMap for all CTCF ChIP-seq datasets
biotrax search-tracks --target CTCF --sources ReMap

# Download FASTQs from an SRA study + write nf-core samplesheet
biotrax download-fastq --accessions SRP123456 --strandedness reverse

# Download FASTQs from a GEO series
biotrax download-fastq --accessions GSE151511

# Search SRA/ENA with a raw query (human, taxon 9606); quote it so the shell does not interpret the parentheses
biotrax search-runs --raw-query 'tax_eq(9606)' --limit 5

# List all sources and their filter keys
biotrax list-sources

# Override the download root for a single command
biotrax download-tracks --items '[...]' --out-root /tmp/test_download

Python library usage

from biotrax.sources.encode import ENCODEAdapter

adapter = ENCODEAdapter()

# Search for QKI eCLIP datasets
datasets = adapter.search(target="QKI", assay="eCLIP", genome="GRCh38")
# Returns: [Dataset(dataset_id='ENCSR366YOG', target='QKI', biosample='K562', ...),
#           Dataset(dataset_id='ENCSR570WLM', target='QKI', biosample='HepG2', ...)]

# Get full metadata including DOI
ds = adapter.get_dataset("ENCSR366YOG")
# ds.publication_doi -> "10.1038/s41586-020-2077-3"
# ds.pmid            -> "32728246"
# ds.design_note     -> "eCLIP, QKI, K562 experiment from ENCODE. ..."

# List peak files with direct download URLs
files = adapter.list_files(ds, output_type="peaks")
# files[0].url -> "https://www.encodeproject.org/files/ENCFF786UOW/@@download/..."
# files[0].file_format -> "bed"
# files[0].md5 -> "..."

from biotrax.sources.sra_ena import SRAENAAdapter
from biotrax.exporters.nfcore_rnaseq import write_nfcore_samplesheet

adapter = SRAENAAdapter()

# Resolve a GSE to runs with ENA FASTQ URLs (no SRA toolkit needed)
runs = adapter.list_runs("GSE151511")
# runs[0].run_accession -> "SRR12345678"
# runs[0].fastq_urls    -> ["https://ftp.sra.ebi.ac.uk/...R1.fastq.gz", ...]
# runs[0].fastq_md5     -> ["abc123...", ...]
# runs[0].library_layout -> "PAIRED"

# Write nf-core/rnaseq samplesheet
write_nfcore_samplesheet(runs, "samplesheet.csv", strandedness="reverse")

from biotrax.sources.geo import GEOAdapter

adapter = GEOAdapter()

# Fetch a GEO series with provenance
datasets = adapter.search(filters={"gse": "GSE78509"})
ds = datasets[0]
# ds.pmid             -> "27068461"
# ds.publication_doi  -> "10.1016/j.celrep.2016.03.052"
# ds.design_note      -> "This SuperSeries is composed of..."

# List supplementary track files (BED, BigWig, narrowPeak)
files = adapter.list_files(ds)
# -> 37 TrackFile objects with HTTPS download URLs

from biotrax.sources.remap import RemapAdapter

adapter = RemapAdapter()
datasets = adapter.search(target="CTCF", organism="Homo sapiens")
# datasets[0].n_files -> 629  (1 bulk BED + 628 per-experiment BEDs)
# datasets[0].publication_doi -> "10.1093/nar/gkab996"

files = adapter.list_files(datasets[0])
# files[0].url -> "http://remap.univ-amu.fr/storage/remap2022/hg38/MACS2/TF/CTCF/..."

from biotrax.exporters.reli_index import write_reli_index, write_reli_index_annotated
from pathlib import Path

# Build RBP-RELI / TF-RELI library index from downloaded peak BEDs
write_reli_index(track_files, local_paths, "CLIPseq.my_run.index")
# Output (tab-delimited, no header, two columns):
#   QKI_K562_ENCODE_ENCFF786UOW    /path/to/ENCFF786UOW.bed.gz
#   CTCF_K562_ENCODE_ENCFF001XYZ   /path/to/ENCFF001XYZ.bed.gz
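
For readers who want to see the index format concretely, here is a minimal sketch that produces the same two-column, tab-delimited, header-less layout. It is an illustration only, not the `write_reli_index` implementation: `write_two_column_index` is a hypothetical name, plain dicts stand in for TrackFile objects, and the `target_biosample_source_fileid` label pattern is inferred from the example rows above.

```python
import csv
from typing import Iterable, Mapping


def write_two_column_index(track_files: Iterable[Mapping[str, str]],
                           local_paths: Iterable[str],
                           out_path: str) -> int:
    """Write one '<label><TAB><path>' row per file and return the row count."""
    rows = 0
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        # track_files and local_paths are assumed to be parallel sequences.
        for tf, path in zip(track_files, local_paths):
            # Label pattern inferred from the example output (an assumption).
            label = "_".join([tf["target"], tf["biosample"], tf["source"], tf["file_id"]])
            writer.writerow([label, str(path)])
            rows += 1
    return rows
```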

Download layout

Files land in a self-describing, deterministic hierarchy under $BIOTRAX_DOWNLOAD_ROOT (or ~/BioTrax by default):

$BIOTRAX_DOWNLOAD_ROOT/
  ENCODE/GRCh38/QKI__eCLIP__K562__ENCSR366YOG/
      ENCFF786UOW.bed.gz
      manifest.tsv          <- file-level: filename, url, md5, size_bytes, format
      metadata.json         <- full Dataset: DOI, PMID, design_note, all fields
      README.md             <- human-readable per-dataset summary

  ChIP-Atlas/GRCh38/TFs-and-others__CTCF__All-cell-types__05__hg38/
      Oth.ALL.05.CTCF.AllCell.bed
      manifest.tsv
      metadata.json
      README.md

  SRA-ENA/SRP123456/SRR12345678/
      SRR12345678_1.fastq.gz
      SRR12345678_2.fastq.gz
      manifest.tsv
      metadata.json
      README.md

  GEO/GSE78509__IGF2BP1-H9ES-eCLIP/
      GSM2071742_IGF2BP1_H9ES_Rep1_eCLIP.InputNormalizedPeaks.bed.gz
      manifest.tsv
      metadata.json
      README.md

  _index/
      biotrax_downloads_index.tsv   <- append-only master log of all downloads
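
The append-only behaviour of the master index can be sketched as follows. The column names and the `append_to_index` helper are assumptions made for illustration; the README does not list the actual columns of `biotrax_downloads_index.tsv`.

```python
import csv
from datetime import date
from pathlib import Path

# Assumed columns for this sketch; the real index may record more fields.
INDEX_COLUMNS = ["retrieved", "source", "dataset_id", "folder"]


def append_to_index(index_path: Path, source: str, dataset_id: str, folder: str) -> None:
    """Append one row; the header is written only when the file is first created."""
    index_path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not index_path.exists()
    # Mode "a" only ever adds to the end, so earlier rows are never rewritten.
    with index_path.open("a", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        if is_new:
            writer.writerow(INDEX_COLUMNS)
        writer.writerow([date.today().isoformat(), source, dataset_id, folder])
```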

Slug rule (deterministic, documented): spaces/slashes replaced with -; consecutive - collapsed; leading/trailing - stripped; truncated at 80 chars. Logical compound names use __ (double underscore) as separator so single underscores in gene names (e.g. QKI_5) survive unchanged.
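
A minimal sketch of the slug rule, under stated assumptions: `slugify` and `compound_name` are hypothetical names, "slashes" is read as both `/` and `\`, and truncation is applied last. BioTrax's own function may differ in those details.

```python
import re


def slugify(text: str, max_len: int = 80) -> str:
    """Apply the documented slug rule to one name component."""
    slug = re.sub(r"[ /\\]", "-", text)   # spaces and slashes become "-"
    slug = re.sub(r"-{2,}", "-", slug)    # collapse runs of "-"
    slug = slug.strip("-")                # drop leading/trailing "-"
    return slug[:max_len]                 # truncate at 80 characters


def compound_name(*parts: str) -> str:
    """Join slugged components with "__" so single underscores survive."""
    return "__".join(slugify(part) for part in parts)
```

With these definitions, `compound_name("QKI", "eCLIP", "K562", "ENCSR366YOG")` yields the ENCODE folder name shown in the layout above, and `slugify("QKI_5")` returns the input unchanged.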

Unified schema

Dataset(
    source,           # "ENCODE" | "ChIP-Atlas" | "ReMap" | "GEO" | "SRA-ENA"
    dataset_id,       # primary accession (ENCSR000AKW, GSE12345, remap2022_CTCF_hg38)
    target,           # protein/antigen (e.g. "QKI", "CTCF") — None for GEO series
    assay,            # assay type (e.g. "eCLIP", "ChIP-seq", "RNA-seq")
    biosample,        # cell line / tissue (e.g. "K562", "HepG2")
    organism,         # defaults "Homo sapiens"
    genome,           # normalized assembly; defaults to "GRCh38"
    n_files,          # count of downloadable files (0 = unknown)
    publication_doi,  # resolved DOI (None if not found — never fabricated)
    pmid,             # PubMed ID
    design_note,      # <=2-sentence human-readable experiment setup
    source_url,       # canonical portal URL
    retrieved,        # ISO date of when BioTrax fetched this record
    extra,            # source-specific dict (never fabricated)
    ignored_filters,  # filter keys this source could not apply (transparency)
)

TrackFile(
    source, dataset_id, file_id,
    target, assay, biosample, genome,
    output_type,      # "peaks" | "signal" | "IDR thresholded peaks" | ...
    file_format,      # "bed" | "narrowPeak" | "bigBed" | "bigWig" | ...
    url,              # direct download URL (HTTPS where the source offers it)
    size,             # bytes (when available)
    md5,              # expected md5 (when available)
    parent,           # back-reference to owning Dataset
)

SeqRun(
    run_accession,    # e.g. SRR12345678
    study,            # SRP / PRJNA accession
    sample,           # SRS / SAMN accession
    library_strategy, # "RNA-Seq" | "ChIP-Seq" | "ATAC-seq" | ...
    library_layout,   # "PAIRED" | "SINGLE"
    instrument,       # e.g. "Illumina NovaSeq 6000"
    read_count,       # total reads
    organism,         # defaults "Homo sapiens"
    fastq_urls,       # ENA HTTPS FASTQ URLs [R1, R2] or [SE]
    fastq_md5,        # parallel md5 list
    parent,           # back-reference to owning Dataset
)
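
The `md5` fields on TrackFile and SeqRun allow a download to be checked after the fact. The sketch below shows one way to do that with the standard library; `md5_of_file` and `verify_md5` are hypothetical helpers, not part of the BioTrax API, and returning `None` when no checksum is available is a choice made for this example.

```python
import hashlib
from pathlib import Path
from typing import Optional


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large FASTQs never sit fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Path, expected: Optional[str]) -> Optional[bool]:
    """Return True/False for a match, or None when no expected md5 is known."""
    if not expected:
        return None
    return md5_of_file(path) == expected.strip().lower()
```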

Pipeline exporters

from biotrax.exporters.nfcore_rnaseq import write_nfcore_samplesheet
from biotrax.exporters.reli_index import write_reli_index, write_reli_index_annotated
from biotrax.exporters.worksheet import write_worksheet
from biotrax.exporters.descriptor import write_dataset_readme, write_set_readme

# nf-core/rnaseq samplesheet from SRA runs
# Columns: sample, fastq_1, fastq_2, strandedness
write_nfcore_samplesheet(runs, "samplesheet.csv", strandedness="reverse")

# Minimal RELI binary-compatible index (2-column, tab-delimited, no header)
write_reli_index(track_files, local_paths, "CLIPseq.my_run.index")

# Annotated RELI index with provenance columns
write_reli_index_annotated(track_files, local_paths, "CLIPseq.my_run.index.annotated.tsv")

# Generic tidy sample worksheet (TSV or CSV)
write_worksheet(datasets, track_files=files, out_path="results.tsv")
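
To make the samplesheet layout concrete, here is a small sketch that writes the four columns named above. It is not the `write_nfcore_samplesheet` implementation: `write_samplesheet` is a hypothetical function, plain dicts stand in for SeqRun objects, and the FASTQ entries are written exactly as given (the real exporter may write local paths instead of URLs).

```python
import csv
from typing import Iterable, Mapping


def write_samplesheet(runs: Iterable[Mapping], out_path: str,
                      strandedness: str = "auto") -> None:
    """Write sample, fastq_1, fastq_2, strandedness rows, one per run."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
        for run in runs:
            fastqs = list(run["fastq_urls"])
            fastq_1 = fastqs[0]
            # Single-end runs leave fastq_2 empty.
            fastq_2 = fastqs[1] if len(fastqs) > 1 else ""
            writer.writerow([run["run_accession"], fastq_1, fastq_2, strandedness])
```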

Flexible filtering

Every source exposes a filters dict for source-specific constraints. Unrecognized keys are reported in Dataset.ignored_filters — never silently dropped.

# ENCODE-specific filters
datasets = adapter.search(
    target="QKI",
    filters={
        "assay_title": "eCLIP",          # exact ENCODE assay_title
        "lab": "Gene Yeo, UCSD",         # lab filter
        "status": "released",            # experiment status
        "date_released": "2020-01-01",   # released on or after
    }
)

# GEO-specific filters
datasets = geo_adapter.search(
    filters={
        "gse": "GSE78509",               # exact accession lookup
        "supplementary_file_type": "BW", # require BigWig supplementary
        "date_from": "2018/01/01",
        "min_samples": 4,
    }
)

# ChIP-Atlas-specific filters
datasets = chipatlas_adapter.search(
    target="CTCF",
    filters={
        "antigen_class": "TFs and others",
        "cell_type_class": "Blood",
        "threshold": "10",               # MACS2 q-value threshold (05/10/20)
    }
)

# SRA/ENA-specific filters
datasets = sra_adapter.search(
    filters={
        "library_strategy": "RNA-Seq",
        "library_layout": "PAIRED",
        "min_read_count": 10000000,
    }
)

Use biotrax list-sources (CLI) or the list_sources() MCP tool to see all filter keys per source.
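
The idea behind `ignored_filters` can be sketched in a few lines: split the incoming dict into keys the source understands and keys it does not, then report the second group instead of discarding it. `split_filters` is a hypothetical stand-in written for this example, not the adapter's actual helper.

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_filters(filters: Dict[str, Any],
                  supported: Iterable[str]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (filters the source can apply, sorted list of ignored keys)."""
    supported_set = set(supported)
    applied = {key: value for key, value in filters.items() if key in supported_set}
    ignored = sorted(key for key in filters if key not in supported_set)
    return applied, ignored


applied, ignored = split_filters(
    {"library_strategy": "RNA-Seq", "lab": "Gene Yeo, UCSD"},
    supported=["library_strategy", "library_layout", "min_read_count"],
)
# applied keeps only "library_strategy"; ignored lists "lab" so the caller can see it.
```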

Contributing: add a source adapter

BioTrax is architected so each source is an isolated, independently testable module. Adding a new source takes four steps:

1. Create biotrax/sources/myadapter.py

from biotrax.sources.base import SourceAdapter
from biotrax.core import Dataset, TrackFile

class MyAdapter(SourceAdapter):
    name = "MySource"
    description = "One-line description of the source."
    supported_filters = ["filter_key_1", "filter_key_2"]
    capabilities = {"search", "list_files", "download"}

    def search(self, target=None, assay=None, biosample=None,
               genome="GRCh38", organism="Homo sapiens",
               filters=None, raw_query=None, limit=50):
        filters = filters or {}
        ignored = self._extract_ignored(filters, self.supported_filters)
        # ... call your API, build Dataset objects ...
        datasets = []     # placeholder: fill with the Dataset objects you build
        return datasets   # list[Dataset]

    def list_files(self, dataset_or_id, *, output_type=None,
                   file_format=None, genome="GRCh38"):
        acc = self._dataset_id(dataset_or_id)
        # ... fetch file list, build TrackFile objects ...
        track_files = []    # placeholder: fill with the TrackFile objects you build
        return track_files  # list[TrackFile]

2. Set name, description, supported_filters, capabilities

The supported_filters list tells list_sources() (and AI agents) exactly what filter keys this source understands. Any key in filters not in this list is automatically collected into ignored_filters via self._extract_ignored() — the transparency protocol.

3. Register in biotrax/server.py

from biotrax.sources.myadapter import MyAdapter

_PEAK_ADAPTERS["MySource"] = MyAdapter   # or _ALL_ADAPTERS for seq sources

4. Add live-API tests in tests/test_myadapter.py

Use pytest -k myadapter to run only your tests. Follow the pattern in tests/test_base_adapter.py for the ABC contract and tests/test_exporters.py for exporter tests.

Anti-hallucination rule: verify against the live API before merging. If the API can't deliver something cleanly, document the caveat in the adapter docstring — never fabricate results.

Defaults

Genome: GRCh38 (alias "hg38" accepted everywhere)
Organism: Homo sapiens
Download root: ~/BioTrax (override via BIOTRAX_DOWNLOAD_ROOT env var or out_root= argument)
Output type default: peaks (for peak sources)
Strandedness default: "auto" (for nf-core samplesheets)
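
As an illustration of the genome default and alias handling, the sketch below maps the one documented alias ("hg38") onto "GRCh38". `normalize_genome` and the alias table are assumptions made for this example; BioTrax may recognise other aliases that are not listed here.

```python
from typing import Optional

# Only the alias documented above; any others would be guesses.
GENOME_ALIASES = {"hg38": "GRCh38", "grch38": "GRCh38"}


def normalize_genome(genome: Optional[str] = None) -> str:
    """Return the normalized assembly name, defaulting to GRCh38."""
    if genome is None:
        return "GRCh38"
    # Unknown names pass through unchanged (whitespace trimmed) rather than being guessed at.
    return GENOME_ALIASES.get(genome.strip().lower(), genome.strip())
```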

License

MIT — see LICENSE.

Citation

If BioTrax is useful to your work, please also cite the databases it queries:

ENCODE: ENCODE Project Consortium (2012) Nature 489:57-74
ChIP-Atlas: Oki et al. (2018) EMBO Rep [doi:10.15252/embr.201846255]; Zou et al. (2024) NAR [doi:10.1093/nar/gkae358]
ReMap: Hammal et al. (2022) NAR [doi:10.1093/nar/gkab996]
GEO: Barrett et al. (2013) NAR [doi:10.1093/nar/gks1193]
ENA/SRA: Leinonen et al. (2011) NAR [doi:10.1093/nar/gkq967]
