
ENCODE Toolkit — Genomics Research Infrastructure for Claude

License: CC BY-NC-ND 4.0 | Python 3.10+

Search ENCODE, cross-reference 14 databases, run 7 analysis pipelines, and generate publication-ready methods — all from natural language in Claude Code.

Start from ENCODE but go everywhere: discover histone peaks, cross-reference with GWAS variants, check ClinVar pathogenicity, pull GTEx expression, analyze TF binding motifs from JASPAR, run pipelines, and generate publication-ready methods with full provenance — in one conversation.


Quick Start

Start a new Claude Code session and enter:

/plugin marketplace add ammawla/encode-toolkit

/plugin install encode-toolkit

That's it. All 20 tools, 47 skills, and the MCP connector are now available.

If you only need the 20 MCP tools without the 47 workflow skills:

claude mcp add encode -- uvx encode-toolkit

npx (Node.js)

npx encode-toolkit

Or in MCP client config: { "command": "npx", "args": ["encode-toolkit"] }

pip install

pip install encode-toolkit

Then use encode-toolkit as the command in any MCP client configuration:

{
  "mcpServers": {
    "encode": {
      "command": "encode-toolkit"
    }
  }
}

Add to your claude_desktop_config.json:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "encode": {
      "command": "uvx",
      "args": ["encode-toolkit"]
    }
  }
}

No installation needed when using uvx. Just add the config and restart Claude.
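If you would rather script the config change than edit the file by hand, the following is a minimal sketch. It assumes the config file is plain JSON with a top-level "mcpServers" object, as in the example above; the helper names add_encode_server and update_config_file are hypothetical and not part of the toolkit.

```python
import json
from pathlib import Path


def add_encode_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the "encode" entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as the uvx example above; other servers are left untouched.
    servers["encode"] = {"command": "uvx", "args": ["encode-toolkit"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the config at `path` (if it exists), add the entry, write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_encode_server(existing), indent=2) + "\n")
```

Call update_config_file with the path for your platform, then restart Claude. Merging into a copy keeps any servers you already configured.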

Add to .vscode/mcp.json in your workspace:

{
  "mcp": {
    "servers": {
      "encode": {
        "command": "uvx",
        "args": ["encode-toolkit"]
      }
    }
  }
}

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "encode": {
      "command": "uvx",
      "args": ["encode-toolkit"]
    }
  }
}

Add to .windsurf/mcp.json:

{
  "mcpServers": {
    "encode": {
      "command": "uvx",
      "args": ["encode-toolkit"]
    }
  }
}

Connected Databases

ENCODE Toolkit integrates 14 databases through live API tools and guided skills.

| Database | Access Method | Use Case |
| --- | --- | --- |
| ENCODE | 20 MCP tools (live API) | ChIP-seq, ATAC-seq, RNA-seq, Hi-C, WGBS, CUT&RUN data |
| GTEx | REST API (skill) | Tissue-specific gene expression across 54 tissues |
| ClinVar | E-utilities (skill) | Variant clinical significance and pathogenicity |
| GWAS Catalog | REST API (skill) | Trait-variant associations from genome-wide studies |
| JASPAR | REST API (skill) | Transcription factor binding motif profiles |
| CellxGene | Census API (skill) | Single-cell expression atlas across tissues |
| gnomAD | GraphQL (skill) | Population allele frequencies and gene constraint |
| Ensembl | REST API (skill) | VEP annotation, Regulatory Build, coordinate liftover |
| UCSC Genome Browser | REST API (skill) | cCRE tracks, TF clusters, sequence retrieval |
| GEO | E-utilities (skill) | Complementary expression/epigenomic datasets |
| PubMed | MCP server | Literature search and citation |
| bioRxiv | MCP server | Preprint discovery |
| ClinicalTrials.gov | MCP server | Clinical trial cross-reference |
| Open Targets | MCP server | Drug target identification |


What You Can Ask Claude

Search and explore

  • "Find all histone ChIP-seq experiments for human pancreas tissue"

  • "What ATAC-seq data is available for mouse brain?"

  • "Search for RNA-seq on GM12878 cell line"

  • "What histone marks have ChIP-seq data for pancreas?"

Download and track

  • "Download all BED files from ENCSR133RZO to ~/data/encode"

  • "Track experiment ENCSR133RZO with its publications"

  • "Export citations for my tracked experiments as BibTeX"

Cross-reference databases

  • "What GWAS variants overlap islet enhancers?"

  • "Check ClinVar pathogenicity for rs7903146"

  • "Pull GTEx expression for TCF7L2 across tissues"

  • "Find JASPAR motifs for HNF4A binding sites"

Run pipelines

  • "Set up a ChIP-seq pipeline for my H3K27ac experiments"

  • "Run ATAC-seq analysis with ENCODE-standard QC thresholds"

Generate methods and provenance

  • "Log that I created filtered_peaks.bed from ENCSR133RZO using bedtools"

  • "Generate a methods section for my analysis with citations"

Experiment details

  • "Show me the full details for experiment ENCSR133RZO"

  • "What files are available for ENCSR133RZO?"

  • "List only the BED files from ENCSR133RZO"

Bulk downloads

  • "Download all FASTQs from human pancreas ChIP-seq to /data/fastqs"

  • "Get the IDR thresholded peaks from these experiments"

  • "Download the bigWig signal tracks for H3K27me3 in GRCh38"

Compatibility analysis

  • "Are experiments ENCSR133RZO and ENCSR000AKS compatible for combined analysis?"

  • "Compare these two ChIP-seq experiments"

Provenance chains

  • "Show me the provenance chain for my derived files"

  • "What files have I derived from ENCSR133RZO?"


The Problem

Using genomics databases today means:

  1. Navigating web portals and clicking through dozens of filters

  2. Manually finding the right experiments and files across multiple databases

  3. Writing custom scripts to batch download

  4. Losing track of which files came from where

With ENCODE Toolkit, just tell Claude what you need:

"Find all histone ChIP-seq data for human pancreas tissue"

Claude searches ENCODE and returns a structured table of 66 experiments with targets, replicates, and file counts. Downloads are organized by experiment, verified by MD5 checksum, and recorded with full provenance tracking.
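For context, a request like the one above corresponds roughly to a query against the public ENCODE portal search endpoint. The sketch below only builds the URL and makes no network call. The field names (biosample_ontology.organ_slims and the organism path) are assumptions about the portal's search schema, and how the toolkit maps its own parameters onto them is not documented here; build_search_url is a hypothetical helper.

```python
from urllib.parse import urlencode

ENCODE_SEARCH = "https://www.encodeproject.org/search/"


def build_search_url(assay_title: str, organ: str,
                     organism: str = "Homo sapiens", limit: int = 25) -> str:
    """Build an ENCODE portal search URL that requests JSON results."""
    params = {
        "type": "Experiment",
        "assay_title": assay_title,
        # Assumed portal field names; verify against the ENCODE schema.
        "biosample_ontology.organ_slims": organ,
        "replicates.library.biosample.donor.organism.scientific_name": organism,
        "status": "released",
        "limit": limit,
        "format": "json",
    }
    return f"{ENCODE_SEARCH}?{urlencode(params)}"


print(build_search_url("Histone ChIP-seq", "pancreas"))
```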


Available Tools (20)

All 20 tools are documented below, starting with the five core tools.

encode_search_experiments

Search ENCODE experiments with 20+ filters.

| Parameter | Type | Description |
| --- | --- | --- |
| assay_title | string | Assay type: "Histone ChIP-seq", "ATAC-seq", "RNA-seq", "Hi-C", etc. |
| organism | string | Species (default: "Homo sapiens") |
| organ | string | Organ: "pancreas", "brain", "liver", "heart", "kidney", etc. |
| biosample_type | string | "tissue", "cell line", "primary cell", "organoid" |
| target | string | ChIP target: "H3K27me3", "H3K4me3", "CTCF", etc. |
| biosample_term_name | string | Specific biosample: "GM12878", "HepG2", etc. |
| limit | int | Max results (default: 25) |
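MCP clients invoke tools with a JSON-RPC "tools/call" request. As a rough illustration of what a client sends when Claude uses this tool, the sketch below builds such a request from the parameters in the table. The helper tool_call_request is hypothetical, and dropping unset filters is a design choice of this sketch, not documented toolkit behavior.

```python
import json


def tool_call_request(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    # Drop filters left as None so only explicit values are sent.
    args = {key: value for key, value in arguments.items() if value is not None}
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": args},
    })


print(tool_call_request(1, "encode_search_experiments", {
    "assay_title": "Histone ChIP-seq",
    "organ": "pancreas",
    "biosample_type": "tissue",
    "target": None,
    "limit": 25,
}))
```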

encode_get_experiment

Get full details for a single experiment including all files, quality metrics, and audit info.

| Parameter | Type | Description |
| --- | --- | --- |
| accession | string | Experiment ID (e.g., "ENCSR133RZO") |

encode_download_files

Download specific files by accession to a local directory.

| Parameter | Type | Description |
| --- | --- | --- |
| file_accessions | list[str] | File IDs to download (e.g., ["ENCFF635JIA"]) |
| download_dir | string | Local path to save files |
| organize_by | string | "flat", "experiment", "format", "experiment_format" |
| verify_md5 | bool | Verify file integrity (default: true) |
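The verify_md5 option amounts to comparing the checksum of the downloaded file with the one published in the file's metadata. A minimal sketch of that check with the standard library follows. The function names are hypothetical, and the toolkit's actual implementation may differ.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Chunked reads keep memory use flat for multi-gigabyte FASTQ/BAM files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Path, expected: str) -> bool:
    """Return True when the file's digest matches the expected checksum."""
    return md5_of(path) == expected.strip().lower()
```

A mismatch usually means a truncated or corrupted download, so the safe response is to delete the file and fetch it again.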

encode_batch_download

Search + download in one step. Runs in preview mode by default.

| Parameter | Type | Description |
| --- | --- | --- |
| download_dir | string | Local path to save files |
| file_format | string | File format to download |
| assay_title | string | Assay type filter |
| organ | string | Organ filter |
| dry_run | bool | Preview only (default: true). Set false to download. |

encode_track_experiment

Track an experiment locally with its publications, methods, and pipeline info.

| Parameter | Type | Description |
| --- | --- | --- |
| accession | string | Experiment ID to track |
| fetch_publications | bool | Fetch associated publications (default: true) |
| fetch_pipelines | bool | Fetch pipeline/analysis info (default: true) |
| notes | string | Optional notes to attach |

encode_list_files

List files for a specific experiment with format/type filters.

| Parameter | Type | Description |
| --- | --- | --- |
| experiment_accession | string | Experiment ID |
| file_format | string | "fastq", "bam", "bed", "bigWig", "bigBed", etc. |
| output_type | string | "reads", "peaks", "signal", "alignments", etc. |
| assembly | string | "GRCh38", "mm10", etc. |
| preferred_default | bool | Only return recommended files |

encode_search_files

Search files across all experiments with combined experiment + file filters.

| Parameter | Type | Description |
| --- | --- | --- |
| file_format | string | File format filter |
| assay_title | string | Assay type of parent experiment |
| organ | string | Organ of parent experiment |
| target | string | ChIP/CUT&RUN target |
| output_type | string | Output type filter |
| assembly | string | Genome assembly |

encode_get_metadata

List valid filter values for any parameter.

| Parameter | Type | Description |
| --- | --- | --- |
| metadata_type | string | "assays", "organisms", "organs", "biosample_types", "file_formats", "output_types", "assemblies" |

encode_get_facets

Get live counts from ENCODE showing what data exists for given filters.

| Parameter | Type | Description |
| --- | --- | --- |
| assay_title | string | Pre-filter by assay |
| organism | string | Pre-filter by organism |
| organ | string | Pre-filter by organ |

encode_get_file_info

Get detailed metadata for a single file.

| Parameter | Type | Description |
| --- | --- | --- |
| accession | string | File ID (e.g., "ENCFF635JIA") |

encode_manage_credentials

Store, check, or clear ENCODE credentials for restricted data access.

| Parameter | Type | Description |
| --- | --- | --- |
| action | string | "store", "check", or "clear" |
| access_key | string | ENCODE access key (for "store") |
| secret_key | string | ENCODE secret key (for "store") |

encode_list_tracked

List all experiments in your local tracker with metadata, publication counts, and derived file counts.

| Parameter | Type | Description |
| --- | --- | --- |
| assay_title | string | Filter by assay type |
| organism | string | Filter by organism |
| organ | string | Filter by organ |

encode_get_citations

Get publications for tracked experiments. Export as BibTeX or RIS for reference managers.

| Parameter | Type | Description |
| --- | --- | --- |
| accession | string | Specific experiment (or all if omitted) |
| export_format | string | "json" (default), "bibtex", or "ris" |

encode_compare_experiments

Analyze whether two experiments are compatible for combined analysis.

| Parameter | Type | Description |
| --- | --- | --- |
| accession1 | string | First experiment ID |
| accession2 | string | Second experiment ID |

encode_summarize_collection

Get grouped statistics of your tracked experiment collection.

| Parameter | Type | Description |
| --- | --- | --- |
| assay_title | string | Filter by assay type |
| organism | string | Filter by organism |
| organ | string | Filter by organ |

encode_log_derived_file

Log a file you created from ENCODE data for provenance tracking.

| Parameter | Type | Description |
| --- | --- | --- |
| file_path | string | Path to your derived file |
| source_accessions | list[str] | ENCODE accessions this was derived from |
| description | string | What the file contains |
| tool_used | string | Tool/software used |
| parameters | string | Command or parameters used |

encode_get_provenance

View provenance chains from derived files back to source ENCODE data.

| Parameter | Type | Description |
| --- | --- | --- |
| file_path | string | Get provenance for a specific file |
| source_accession | string | List all files derived from an accession |
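To make the provenance model concrete, here is a small sketch of a record with the fields that encode_log_derived_file accepts, plus a lookup by source accession like the one encode_get_provenance describes. The class and function names are hypothetical, the logged_at timestamp is an added assumption, and the toolkit's real storage format is not shown in this document.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class DerivedFileRecord:
    """One provenance entry linking a local file to its ENCODE sources."""
    file_path: str
    source_accessions: list[str]
    description: str = ""
    tool_used: str = ""
    parameters: str = ""
    # Assumed extra field: when the record was logged (UTC, ISO 8601).
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def derived_from(records: list[DerivedFileRecord], accession: str) -> list[str]:
    """List paths of all files derived from the given ENCODE accession."""
    return [r.file_path for r in records if accession in r.source_accessions]


record = DerivedFileRecord(
    file_path="filtered_peaks.bed",
    source_accessions=["ENCSR133RZO"],
    description="Blacklist-filtered peaks",
    tool_used="bedtools",
)
print(asdict(record))
print(derived_from([record], "ENCSR133RZO"))
```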

encode_export_data

Export tracked experiments as a table (CSV, TSV, or JSON) for Excel, R, pandas.

| Parameter | Type | Description |
| --- | --- | --- |
| format | string | "csv" (default), "tsv", or "json" |
| assay_title | string | Filter by assay type |

Link external references (PubMed, bioRxiv, ClinicalTrials, GEO) to tracked experiments.

| Parameter | Type | Description |
| --- | --- | --- |
| experiment_accession | string | ENCODE experiment accession |
| reference_type | string | "pmid", "doi", "nct_id", "preprint_doi", "geo_accession", "other" |
| reference_id | string | The identifier value |

encode_get_references

Get external references linked to tracked experiments for cross-server workflows.

| Parameter | Type | Description |
| --- | --- | --- |
| experiment_accession | string | Filter by experiment (optional) |
| reference_type | string | Filter by type (optional) |


Authentication

Most ENCODE data is public and requires no authentication. Just install and use.

For restricted/unreleased data, ask Claude: "Store my ENCODE credentials"

Credentials are encrypted using your OS keyring (macOS Keychain, Linux Secret Service, Windows Credential Locker) and never stored in plaintext. Get your access keys from your ENCODE profile.


Plugin Skills (47)

When installed as a Claude Code plugin, ENCODE Toolkit includes 47 literature-backed workflow skills that guide Claude through complex genomics tasks. Each analysis skill includes evidence-based quality thresholds, assay-specific metrics, and citations to primary literature.

Core Skills

| Skill | Description |
| --- | --- |
| setup | Install and configure the ENCODE Toolkit server |
| search-encode | Search and explore ENCODE experiments and files |
| download-encode | Download files with organization and verification |
| track-experiments | Track experiments, citations, and provenance locally |
| cross-reference | Connect ENCODE data to PubMed, bioRxiv, ClinicalTrials.gov |

| Skill | Description |
| --- | --- |
| quality-assessment | Evaluate experiment quality using ENCODE metrics: assay-specific thresholds for ChIP-seq (FRiP, NSC, RSC, NRF, IDR), ATAC-seq (TSS enrichment, NFR ratio), RNA-seq (mapping rate, gene body coverage), WGBS (bisulfite conversion, CpG coverage), Hi-C (cis/trans ratio), and CUT&RUN/CUT&Tag. Backed by Landt 2012, Buenrostro 2013, ENCODE Phase 3 (2020), Li 2011 |
| integrative-analysis | Combine multiple experiments with batch effect awareness: integration strategies (peak overlap, signal correlation, DiffBind, DESeq2, ChromHMM, ABC model). Backed by Ernst & Kellis 2012, Ross-Innes 2012, Love 2014, Fulco 2019 |
| regulatory-elements | Discover enhancers, promoters, insulators from combinatorial histone marks: ENCODE cCRE classification (926,535 elements), ChromHMM state interpretation. Backed by ENCODE Phase 3 (2020), Roadmap Epigenomics (2015), Whyte 2013 |
| epigenome-profiling | Build comprehensive chromatin state profiles: three-tiered histone panels, ChromHMM 15-state model, bivalent chromatin analysis. References the chromatin biology catalog |
| compare-biosamples | Compare experiments across tissues and cell types: biosample hierarchy, tissue-specific regulation, batch effect detection. Backed by Roadmap Epigenomics (2015), Leek 2010 |
| visualization-workflow | Generate publication-quality visualizations: genome browser tracks, heatmaps, and signal profiles |
| motif-analysis | Discover and analyze TF binding motifs in regulatory regions using HOMER, MEME, and JASPAR |
| peak-annotation | Annotate genomic peaks with features (promoter/enhancer/intergenic), nearest genes, and functional categories |
| batch-analysis | Batch processing and QC screening across multiple ENCODE experiments with systematic quality filtering |

| Skill | Description |
| --- | --- |
| functional-screen-analysis | Analyze CRISPR screens, MPRA, and STARR-seq data from ENCODE: MAGeCK, BAGEL2, MPRAflow integration |

| Skill | Description |
| --- | --- |
| histone-aggregation | Union merge of histone ChIP-seq peaks across studies: signalValue-based noise filtering, sample-of-origin tagging, ENCODE blacklist removal. Backed by ChIP-Atlas (Oki 2018), Amemiya 2019, Perna 2024 |
| accessibility-aggregation | Union merge of ATAC-seq and DNase-seq peaks: cross-platform integration, peak summit preservation. Backed by Corces 2017, Amemiya 2019, Zhao 2020 |
| hic-aggregation | Union catalog of Hi-C chromatin loops (BEDPE): resolution-aware anchor matching, loop caller concordance tracking. Backed by Loop Catalog (Reyna 2025), Mustache (Roayaei Ardakany 2020) |
| methylation-aggregation | Aggregate WGBS methylation profiles: per-CpG weighted averaging, HMR/UMR/PMD identification. Backed by Schultz 2015, DMRcate (Peters 2021), Zhou 2020 |

| Skill | Description |
| --- | --- |
| scrna-meta-analysis | Cross-study meta-analysis of scRNA-seq data: reproducibility assessment, TIN-based quality filtering, ambient RNA quantification. Backed by Tran 2020, Luecken & Theis 2019, Stuart 2019, Korsunsky 2019 |
| multi-omics-integration | Integrate RNA-seq, ATAC-seq, Histone ChIP-seq, and TF ChIP-seq: ABC model regulatory predictions, signal correlation. Backed by Fulco 2019, Corces 2018, ENCODE Phase 3 (2020) |

| Skill | Description |
| --- | --- |
| data-provenance | Full reproducibility tracking: tool versions, reference files, scripts, exact commands, timestamps, source-to-derived provenance chains |
| cite-encode | Generate proper citations, BibTeX/RIS export, data availability statements |
| variant-annotation | Annotate GWAS/disease variants with ENCODE functional data: variant-to-gene mapping via cCREs. Backed by Finucane 2015, Maurano 2012 |
| pipeline-guide | Understand ENCODE uniform analysis pipelines and output types: pipeline specifications, Nextflow integration |
| single-cell-encode | Work with scRNA-seq and scATAC-seq data: platform comparison, cross-study integration, WNN multimodal analysis. Backed by Hao 2021, Stuart 2019 |
| disease-research | Disease-focused workflows: GWAS variant interpretation, disease-tissue mapping, heritability enrichment, drug target identification via Open Targets. Backed by Buniello 2019, Finucane 2015 |
| publication-trust | Publication integrity assessment: 5-level trust scoring, retraction/erratum detection, citation analysis. Integrates with PubMed, bioRxiv, and Consensus |
| bioinformatics-installer | Install all bioinformatics tools for ENCODE analyses: 7 conda environment YAMLs, 3 install scripts, 134+ tools across ChIP-seq, ATAC-seq, RNA-seq, WGBS, Hi-C, DNase-seq, CUT&RUN |
| scientific-writing | Generate publication-ready methods sections, figure legends, supplementary tables, and data availability statements with full tool citations |
| liftover-coordinates | Convert genomic coordinates between assembly versions (hg19/hg38, mm9/mm10) using UCSC liftOver, CrossMap, Ensembl REST API, and rtracklayer |

| Skill | Description |
| --- | --- |
| gtex-expression | Query GTEx tissue expression data via REST API for gene expression context across 54 tissues |
| clinvar-annotation | Annotate variants with ClinVar clinical significance, pathogenicity, and review status |
| cellxgene-context | Query CellxGene single-cell atlas for cell type expression context across tissues |
| gwas-catalog | Search NHGRI-EBI GWAS Catalog for trait associations, risk alleles, and study metadata |
| jaspar-motifs | Query JASPAR database for transcription factor binding motifs and matrix profiles |
| ensembl-annotation | Ensembl VEP variant annotation, Regulatory Build, coordinate liftover, gene lookup via REST API |
| geo-connector | Search NCBI GEO for complementary datasets, cross-reference with ENCODE, FTP downloads |
| gnomad-variants | gnomAD population allele frequencies, gene constraint (LOEUF/pLI), structural variants via GraphQL |
| ucsc-browser | UCSC Genome Browser REST API for cCRE tracks, TF binding clusters, and sequence retrieval |

| Pipeline | Assay | Aligner | Caller |
| --- | --- | --- | --- |
| pipeline-chipseq | ChIP-seq | BWA-MEM | MACS2 + IDR |
| pipeline-atacseq | ATAC-seq | Bowtie2 | MACS2 (Tn5-adjusted) |
| pipeline-rnaseq | RNA-seq | STAR | RSEM + Kallisto |
| pipeline-wgbs | WGBS | Bismark | MethylDackel |
| pipeline-hic | Hi-C | BWA | Juicer + HiCCUPS |
| pipeline-dnaseseq | DNase-seq | BWA | Hotspot2 |
| pipeline-cutandrun | CUT&RUN | Bowtie2 | SEACR |

Each pipeline includes a SKILL.md overview, 5-stage reference files (preprocessing through QC), a complete Nextflow DSL2 pipeline, a Dockerfile, and deployment configurations for local, SLURM, GCP, and AWS.

| File | Description |
| --- | --- |
| skills/histone-aggregation/references/histone-marks-reference.md | Comprehensive chromatin biology catalog (1,442 lines): 21 histone marks with writers/erasers/readers, 5 novel acylation marks, ChromHMM state models (5 to 51 states), TF co-binding patterns, chromatin remodeling complexes, DNA methylation-chromatin interplay, nucleosome dynamics, 3D genome organization, chromatin in disease. 74 primary references |
| skills/*/references/literature.md | 33 per-skill literature reference documents: ~250 papers cataloged with DOI, PMID, citation counts, and skill-relevant key findings |


Why ENCODE Toolkit

Most genomics tools give you one thing. ENCODE Toolkit gives you the full research loop:

| Capability | ENCODE Toolkit | Typical MCP servers |
| --- | --- | --- |
| Live database access | 20 tools across 14 databases | Single database, read-only |
| Executable pipelines | 7 Nextflow DSL2 pipelines with Docker and cloud configs | None |
| Provenance tracking | Full audit trail from source data to derived files | None |
| Publication output | BibTeX/RIS citations, auto-generated methods sections | None |
| Literature backing | 100+ primary references with assay-specific QC thresholds | None |
| Workflow skills | 47 guided skills covering search to publication | Static documentation |


Supported Assay Types

| Category | Assays |
| --- | --- |
| Histone/Chromatin | Histone ChIP-seq, TF ChIP-seq, ATAC-seq, DNase-seq, CUT&RUN, CUT&Tag, MNase-seq |
| Transcription | RNA-seq, total RNA-seq, small RNA-seq, long read RNA-seq, CAGE, RAMPAGE, PRO-seq, GRO-seq |
| 3D Genome | Hi-C, intact Hi-C, Micro-C, ChIA-PET, HiChIP, PLAC-seq, 5C |
| DNA Methylation | WGBS, RRBS, MeDIP-seq, MRE-seq |
| Functional | STARR-seq, MPRA, CRISPR screen, eCLIP, iCLIP |
| Single Cell | scRNA-seq, snATAC-seq, 10x multiome, SHARE-seq, Parse SPLiT-seq |
| Perturbation | CRISPRi + RNA-seq, shRNA + RNA-seq, siRNA + RNA-seq |

Supported file formats: fastq, bam, bed, bigWig, bigBed, tsv, csv, hic, tagAlign, bedpe, pairs, fasta, vcf, tar


Security and Privacy

  • 100% local execution — no telemetry, no analytics, no tracking

  • Credentials encrypted at rest via OS keyring with Fernet fallback

  • Certificate verification enforced — no verify=False

  • Rate limited to respect ENCODE's 10 req/sec policy

  • MD5 verification on all downloads by default

  • No data leaves your machine except queries to public APIs over HTTPS
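The rate-limiting point can be illustrated with a simple interval-based limiter that spaces requests at least 1/10 of a second apart. This is a sketch of one common approach, not the toolkit's actual implementation; the RateLimiter class and its injectable clock and sleep arguments are hypothetical.

```python
import time


class RateLimiter:
    """Space calls at least 1 / rate_per_sec seconds apart."""

    def __init__(self, rate_per_sec: float = 10.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_sec
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may proceed

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Reserve the following slot relative to when this call proceeds.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Call limiter.wait() immediately before each HTTP request. The clock and sleep parameters exist only so the limiter can be tested without real delays.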


Vignettes

Step-by-step walkthroughs showing real Claude sessions, including actual API output and scientific interpretation.

| Vignette | Skills Demonstrated |
| --- | --- |
| 01: Discovery & Search | Facets, search, metadata, quality-aware selection |
| 02: Download & Track | File listing, download, tracking, citations, provenance |
| 03: Epigenomics Workflow | Histone marks, ATAC-seq, aggregation skills |
| 04: Variant & Disease Research | GWAS catalog, ClinVar, GTEx, JASPAR, gnomAD |
| 05: Expression & Single-Cell | RNA-seq, scRNA-seq, GTEx, CellxGene, meta-analysis |
| 06: Motif & Regulatory Analysis | TF ChIP-seq, chromatin states, HOMER/MEME |
| 07: 3D Genome & Methylation | Hi-C loops, WGBS methylation, integrative analysis |
| 08: Pipeline Execution | ChIP-seq/ATAC-seq/RNA-seq pipelines, Nextflow |
| 09: Cross-Reference & Integration | GEO, PubMed, Ensembl, UCSC, multi-omics |

Every skill has a dedicated vignette in docs/skill-vignettes/ with a complete example session. Highlights:

| Skill | Vignette Scenario |
| --- | --- |
| data-provenance | Download, blacklist-filter, liftover, auto-generate methods section |
| histone-aggregation | Union merge of H3K27ac across 5 pancreas experiments |
| variant-annotation | rs7903146 in TCF7L2 with islet enhancer evidence scoring |
| pipeline-chipseq | Full Nextflow pipeline execution with ENCODE QC thresholds |
| gwas-catalog | T2D GWAS variants overlaid on islet H3K27ac enhancers |
| publication-trust | Trust assessment of artemisinin transdifferentiation claim |
| scrna-meta-analysis | 3-study islet integration following Mawla et al. 2019 framework |

See the full showcase for 15 detailed examples.


Development

git clone https://github.com/ammawla/encode-toolkit.git
cd encode-toolkit
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Run the server locally:

encode-toolkit

Run tests:

pytest

Troubleshooting

  • Make sure you restarted Claude Desktop after adding the config

  • Verify uvx is installed. If it is not, install uv with pip install uv or curl -LsSf https://astral.sh/uv/install.sh | sh

  • Check your internet connection

  • ENCODE API rate limit is 10 requests/sec — the server handles this automatically

  • The file may require authentication. Ask Claude: "Store my ENCODE credentials"

  • Or check if the file status is "released" on encodeproject.org

  • Try broader filters (remove biosample_type or organ)

  • Use encode_get_facets to see what data actually exists for your filters

  • Use encode_get_metadata to check valid filter values


Author

Dr. Alex M. Mawla, PhD

License

Restrictive Non-Commercial License. Free for personal, educational, and academic research. No derivative works without written permission. Commercial use requires a separate license. See LICENSE for full terms.

For commercial licensing inquiries: ammawla@ucdavis.edu
