Narasimhan-Lab

tacc-mcp-bio

Give Claude Code a live connection to your TACC (or any SLURM) HPC cluster — check jobs, read logs, browse results, run analyses, and submit new jobs, all from natural language without opening a terminal.

Built for bioinformatics pipelines: RNA-seq, GWAS, variant calling, genome assembly, single-cell, metagenomics — anything running on a SLURM cluster.


Table of Contents

  1. What this does

  2. How it works

  3. Prerequisites

  4. Installation

  5. Daily usage

  6. Tool reference

  7. Analysis co-pilot loop

  8. Example conversations

  9. Configuration reference

  10. Troubleshooting

  11. Security model

  12. Adapting to other HPC clusters

  13. Adding new tools


1. What this does

The problem

When you run bioinformatics pipelines on a TACC cluster, you constantly context-switch:

ssh tacc → authenticate (password + 2FA) → squeue → find log file →
tail the error → look up what it means → fix script → push from Mac →
pull on TACC → resubmit → wait → repeat

Every step is a context-switch away from the science.

The solution

This MCP server runs locally on your Mac and gives Claude Code 25 tools that talk directly to your HPC cluster. Claude can:

  • Check the SLURM queue in real time

  • Read log files and diagnose errors autonomously

  • Browse results directories and verify output files

  • Run quick R/Python/bash snippets on the cluster without a SLURM job

  • Extract top GWAS hits and interpret them inline

  • Fetch result files to your Mac so Claude can read and plot them locally

  • Write and submit new analysis jobs based on what it just read

The result is conversations like:

You:    "Did the RNA-seq alignment finish? Any samples with low alignment rate?"

Claude: [job_history()]           → STAR_align 3210174 COMPLETED
        [check_outputs(path=".../results/star/")]  → 48/48 bam files present
        [run_remote_script(flagstat_summary.py)]   → 2 samples below 70%
        → "Samples ERR123456 and ERR123457 have alignment rates of 61% and 58%.
           The others are 85-95%. Want me to check their FastQC reports?"

No terminal. No copy-paste. No context switch.


2. How it works

Architecture

Claude Code (MCP client)
    │
    │  JSON-RPC over stdio
    ▼
server.py  (MCP server, runs locally on your Mac)
    │
    │  ssh -o BatchMode=yes <alias> "command"
    │  (reuses SSH ControlMaster — no re-authentication)
    ▼
HPC login node  (TACC Lonestar6, Frontera, or any SLURM cluster)
    │
    │  squeue, sbatch, sacct, cat, find, python3, Rscript...
    ▼
Your project files, SLURM queue, results

SSH ControlMaster

The server never stores your password or 2FA tokens. It relies on SSH's ControlMaster: when you run ssh tacc in your terminal, SSH opens an authenticated connection and holds it open as a background socket. Every subsequent ssh tacc "command" reuses that socket silently — no re-authentication, near-instant connection (typically <1 second per tool call).

ControlPersist 4h keeps the socket alive for 4 hours after your last use, even if you close the terminal.
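The socket reuse described above can be sketched as follows. This is a minimal illustration, not the code in server.py: the helper names `build_ssh_argv` and `control_master_alive` are hypothetical, and the alias `tacc` is assumed to match your ~/.ssh/config. The `-O check` and `BatchMode=yes` options are standard OpenSSH.

```python
import subprocess


def build_ssh_argv(alias: str, command: str) -> list[str]:
    """Build the argv for one non-interactive remote command.

    BatchMode=yes makes ssh fail immediately instead of prompting for a
    password when no authenticated ControlMaster socket is available.
    """
    return ["ssh", "-o", "BatchMode=yes", alias, command]


def control_master_alive(alias: str, runner=subprocess.run) -> bool:
    """Return True if `ssh -O check <alias>` exits with status 0."""
    proc = runner(["ssh", "-O", "check", alias], capture_output=True, text=True)
    return proc.returncode == 0


# Hypothetical usage (needs an `ssh tacc` session opened beforehand):
#   if control_master_alive("tacc"):
#       out = subprocess.run(build_ssh_argv("tacc", "hostname"),
#                            capture_output=True, text=True, timeout=30)
#       print(out.stdout.strip())
```

The `runner` parameter exists only so the check can be exercised without a real SSH connection.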

Lazy config loading

~/.tacc_mcp.json is read on the first tool call, not at import time. This means the server starts successfully even when the config is missing, SSH is down, or the cluster is unreachable. Without lazy loading, any of these conditions would crash the server before it registered any tools — Claude Code would silently show no tools with no error message.
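A minimal sketch of the lazy-loading pattern, assuming a loader that returns a `(cfg, err)` pair in the same style as the `_require_cfg()` call shown in the tool template later in this README. The function name `require_cfg`, the cache variable, and the error wording are illustrative, not copied from server.py.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".tacc_mcp.json"
_cfg_cache = None  # filled on the first successful load


def require_cfg(path: Path = CONFIG_PATH):
    """Return (cfg, err); exactly one of the two is None.

    Nothing is read at import time, so a missing or broken config yields
    an error string from the tool call instead of crashing the server.
    """
    global _cfg_cache
    if _cfg_cache is not None:
        return _cfg_cache, None
    if not path.exists():
        return None, f"No config found at {path}. Run: python3 setup.py"
    try:
        cfg = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return None, f"Config at {path} is not valid JSON: {exc}"
    _cfg_cache = cfg
    return cfg, None
```

Each tool would call the loader first and return the error string as its result, so the problem shows up in the conversation rather than as a silently empty tool list.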


3. Prerequisites

| Requirement | How to check |
|---|---|
| Mac or Linux (local machine) | - |
| Python 3.10+ | python3 --version |
| TACC or other SLURM HPC account | - |
| SSH access to the cluster | ssh <your-cluster> |
| Claude Code CLI | claude --version |

Supported clusters

  • TACC Lonestar6 ✓ (primary target)

  • TACC Frontera ✓ (change hostname in setup)

  • TACC Stampede3

  • Any SLURM cluster with squeue / sbatch / sacct

  • PBS/Torque ✗ (would need qstat/qsub variants)


4. Installation

Step 1: Clone this repo

git clone https://github.com/Devanshpandey/tacc-mcp-bio.git
cd tacc-mcp-bio

Clone it somewhere permanent on your Mac — the MCP registration points to this path and will break if you move the folder.

Good locations: ~/tools/tacc-mcp-bio/ or ~/code/tacc-mcp-bio/

Step 2: Run the setup wizard

python3 setup.py

The wizard prompts for:

| Prompt | Example |
|---|---|
| HPC hostname | ls6.tacc.utexas.edu |
| Your HPC username | jsmith |
| SSH alias | tacc |
| Project path on cluster | /work/12345/jsmith/my-project |
| Logs subfolder | logs/runs |

It writes two things:

~/.tacc_mcp.json — the server config:

{
  "hpc_host":    "tacc",
  "hpc_user":    "jsmith",
  "project_dir": "/work/12345/jsmith/my-project",
  "logs_subdir": "logs/runs"
}

~/.ssh/config — SSH ControlMaster block:

Host tacc
    HostName ls6.tacc.utexas.edu
    User jsmith
    ControlMaster auto
    ControlPath ~/.ssh/cm_%r@%h:%p
    ControlPersist 4h

The wizard also creates .venv/ inside the repo with Python 3.12 and the mcp package.

Step 3: Register with Claude Code

# Run from your analysis project directory
cd /path/to/your/analysis-project

claude mcp add tacc-hpc \
  /path/to/tacc-mcp-bio/.venv/bin/python \
  /path/to/tacc-mcp-bio/server.py

Verify:

claude mcp list
# tacc-hpc: /path/to/... — ✓ Connected

Step 4: Open a daily SSH session

ssh tacc
# TACC password: ••••••••
# TACC Token:    123456
# (Done. You can close this terminal.)

The ControlMaster socket stays alive for 4 hours. You don't need to keep the terminal open.

Step 5: Verify in Claude Code

Open Claude Code in your project directory and ask:

"Call get_started() on the TACC MCP"

Expected:

✓  Config found
✓  SSH ControlMaster is live

5. Daily usage

Your daily routine

# Once per morning (30 seconds):
ssh tacc

# Then everything else from Claude Code

What changes

| Before | After |
|---|---|
| ssh tacc, then squeue → copy job ID | "Check my jobs" |
| ssh tacc, then tail -100 logs/step3.err | "Show me the last 100 lines of the alignment log" |
| scp tacc:/work/.../multiqc.html ~/Desktop/ | "Fetch the MultiQC report and summarise it" |
| Write script → scp → sbatch → wait → scp plot | "Plot coverage depth across samples" |
| Grep 48 log files for errors | "Any errors in the alignment logs?" |

When the SSH session expires

Tools return:

SSH ControlMaster is not active.
Fix: open a terminal and run:  ssh tacc

Just run ssh tacc again — 30 seconds — and continue.


6. Tool reference

Connection & Setup

get_started()

Interactive tutorial. Checks your config and SSH status, prints a complete tool reference and workflow guide. Call this first if anything seems wrong.

check_connection()

Verifies the SSH ControlMaster is live. Returns hostname, username, and login node load average.


SLURM Job Management

job_status()

Shows all pending and running jobs for your user (squeue), with state, elapsed time, and wait reason.

Common wait reasons:

  • Priority — behind higher-priority jobs in queue

  • Dependency — waiting for another job to finish

  • Resources — waiting for free nodes

  • QOSMaxJobsPerUserLimit — hit your concurrent job limit
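As a rough sketch of how queue output can be turned into the summary above: the `squeue` format codes used here (`%i`, `%j`, `%T`, `%M`, `%r`) are standard, but the function names and the choice of `|` as delimiter are assumptions for illustration, not the server's actual implementation.

```python
WAIT_REASONS = {
    "Priority": "behind higher-priority jobs in queue",
    "Dependency": "waiting for another job to finish",
    "Resources": "waiting for free nodes",
    "QOSMaxJobsPerUserLimit": "hit your concurrent job limit",
}

# %i job ID, %j job name, %T state, %M elapsed time, %r wait reason
SQUEUE_FORMAT = "%i|%j|%T|%M|%r"


def squeue_command(user: str) -> str:
    """Remote command that lists one user's jobs without a header line."""
    return f'squeue -u {user} -h -o "{SQUEUE_FORMAT}"'


def parse_squeue(output: str) -> list[dict]:
    """Parse pipe-delimited squeue output into one dict per job."""
    jobs = []
    for line in output.splitlines():
        if not line.strip():
            continue
        job_id, name, state, elapsed, reason = line.strip().split("|", 4)
        jobs.append({
            "job_id": job_id,
            "name": name,
            "state": state,
            "elapsed": elapsed,
            "reason": reason,
            # Empty string for reasons not in the table (e.g. running jobs).
            "explanation": WAIT_REASONS.get(reason, ""),
        })
    return jobs
```

A job name containing `|` would break this simple split; a real implementation might pick a rarer delimiter.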

job_history(days=3)

Shows recently completed, failed, or cancelled jobs (sacct). Use this when a job disappears from the queue and you need to know what happened.
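A hedged sketch of the kind of `sacct` query behind a history lookup. The flags (`-u`, `-S`, `-X`, `-n`, `-P`, `--format`) are standard SLURM options; the field selection and the helper names `sacct_command`, `parse_sacct`, and `failed_jobs` are assumptions made for this example.

```python
def sacct_command(user: str, days: int = 3) -> str:
    """Build a sacct query for the last `days` days.

    -X: one line per job allocation (no job steps), -n: no header,
    -P: pipe-delimited output, -S: start of the reporting window.
    """
    return (
        f"sacct -u {user} -S now-{days}days -X -n -P "
        "--format=JobID,JobName,State,ExitCode,Elapsed"
    )


def parse_sacct(output: str) -> list[dict]:
    """Parse the pipe-delimited rows produced by sacct_command()."""
    rows = []
    for line in output.splitlines():
        parts = line.strip().split("|")
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        job_id, name, state, exit_code, elapsed = parts
        rows.append({"job_id": job_id, "name": name, "state": state,
                     "exit_code": exit_code, "elapsed": elapsed})
    return rows


def failed_jobs(rows: list[dict]) -> list[dict]:
    """Keep jobs that ended in anything other than a normal state."""
    ok_states = ("COMPLETED", "RUNNING", "PENDING")
    return [r for r in rows if not r["state"].startswith(ok_states)]
```

The prefix check matters because SLURM reports cancelled jobs as `CANCELLED by <uid>` rather than a single word.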

job_details(job_id)

Full SLURM accounting for one job: working directory, log paths, allocated CPUs, peak memory (MaxRSS), timing.

cancel_jobs(job_ids)

Runs scancel on one or more job IDs (space-separated). Works on individual jobs and array jobs.


Pipeline Management

pipeline_status(run_id="")

Cross-references a manifest.tsv (written by your pipeline at submission) with the live queue and sacct history. Shows per-step status. Defaults to the most recent run.

manifest.tsv format (write this from your run_all.sh):

step	script	job_id
01_qc	scripts/01_qc.sh	3210174
02_align	scripts/02_align.sh	3210175_[1-48]
03_quant	scripts/03_quant.sh	3210180
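For illustration, a manifest in this format can be read with the standard `csv` module. The helpers `parse_manifest` and `base_job_id` are hypothetical names for this sketch; how the server actually parses the file is not shown in this README.

```python
import csv
import io
import re


def parse_manifest(text: str) -> list[dict]:
    """Parse tab-separated manifest text into one dict per pipeline step."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


def base_job_id(job_id: str) -> str:
    """Strip an array suffix: '3210175_[1-48]' becomes '3210175'."""
    match = re.match(r"\d+", job_id.strip())
    if match is None:
        raise ValueError(f"not a SLURM job ID: {job_id!r}")
    return match.group(0)
```

The base ID is what you would pass to `squeue -j` or `sacct -j` to cover every task of an array job.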

list_runs()

Lists all pipeline run directories under logs/runs/, newest first, with file count and summary line.

run_pipeline(script="run_all.sh", args="")

Submits a pipeline script as a background nohup process on the login node. Returns PID and log path.

Use for lightweight orchestration scripts only. Heavy compute goes through submit_analysis().

check_outputs(path="", pattern="", paths="")

Checks whether expected output files or directories exist. Three modes:

# Check a single path
check_outputs(path="/work/.../results/sample1.bam")

# List files matching a pattern under a directory
check_outputs(path="/work/.../results/star/", pattern="*.bam")

# Check a list of specific paths
check_outputs(paths="/work/.../s1.bam, /work/.../s1.bai, /work/.../s2.bam")
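One way the three modes could map onto remote shell commands is sketched below. This is an assumption about the approach, not the server's code: the function name `check_outputs_command`, the OK/MISSING labels, and the non-recursive `-maxdepth 1` search are all illustrative choices.

```python
import shlex


def check_outputs_command(path: str = "", pattern: str = "", paths: str = "") -> str:
    """Return a shell command for whichever mode the arguments select."""
    if paths:
        # Mode 3: comma-separated list of specific paths.
        items = [p.strip() for p in paths.split(",") if p.strip()]
        quoted = " ".join(shlex.quote(p) for p in items)
        return (f'for p in {quoted}; do '
                '[ -e "$p" ] && echo "OK $p" || echo "MISSING $p"; done')
    if path and pattern:
        # Mode 2: files matching a glob directly under a directory.
        return (f"find {shlex.quote(path)} -maxdepth 1 "
                f"-name {shlex.quote(pattern)} | sort")
    if path:
        # Mode 1: a single file or directory.
        q = shlex.quote(path)
        return f"[ -e {q} ] && echo OK {q} || echo MISSING {q}"
    raise ValueError("give path, path + pattern, or paths")
```

Quoting with `shlex.quote` keeps the glob from being expanded locally and protects paths that contain spaces.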

Files & Logs

list_dir(path="")

ls -lhp on any directory. Defaults to project root. Results capped at 100 lines.

read_file(path, head_lines=0, tail_lines=0, max_lines=200)

Read any text file (logs, TSVs, scripts, configs). Supports head, tail, or full read with line cap.

list_logs(run_id="")

Lists log files for a pipeline run, sorted by modification time. Empty log files (size 0) indicate jobs that didn't run.

read_log(step="", path="", run_id="", tail_lines=100, stream="both")

Tails a SLURM log file. Specify by step name prefix (e.g. "align", "quant") or full path. For array jobs, returns the last 5 matching log files automatically.

read_log(step="align")
read_log(step="variant_call", stream="err")
read_log(path="/work/.../logs/runs/20260606/qc_3210178.out")
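The step-prefix lookup can be pictured with the sketch below. It assumes log files are named `<step>_<jobid>.out` and `<step>_<jobid>.err` as in the example path above, and that "last 5" means the last five names in sorted order. Both are assumptions, and `select_logs` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def select_logs(filenames, step, stream="both", limit=5):
    """Pick log files for a step prefix, filtered by output stream.

    stream: "out", "err", or "both". Returns at most `limit` names,
    taken from the end of the sorted list.
    """
    suffixes = {"out": (".out",), "err": (".err",),
                "both": (".out", ".err")}[stream]
    matches = [
        name for name in filenames
        if PurePosixPath(name).name.startswith(step) and name.endswith(suffixes)
    ]
    return sorted(matches)[-limit:]
```

Sorting by name is only a stand-in here; ordering by modification time, as list_logs does, would need file metadata from the cluster.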

grep_file(pattern, path, context_lines=2)

grep -n -C N on a remote file or glob. Useful for finding errors across multiple log files.


System & Git

disk_usage()

Shows quota and filesystem usage for $HOME, $WORK, $SCRATCH, and the project's results/ and data/ directories. Run before starting a large analysis.

node_load()

Current login node CPU and memory (uptime, free -h, top). Check this before running a login-node script.

git_status()

git status --short and last 5 commits in the project directory on the cluster.

git_pull()

git pull origin <branch> on the cluster. Run this after pushing code from your Mac.

run_command(command, timeout=120)

Run any shell command on the HPC login node. Escape hatch for anything not covered by the other tools.

run_command("module avail 2>&1 | grep -i samtools")
run_command("conda env list")
run_command("find /work/.../data -name '*.fastq.gz' | wc -l")

Analysis Co-Pilot

fetch_file(remote_path, local_path="")

Copies a single file from the cluster to ~/Downloads/hpc_results/ (or a specified path) via scp. After fetching, Claude can read the file directly.

fetch_results(remote_dir="", pattern="*.csv,*.tsv,*.txt,*.html,*.png,*.pdf,*.json", local_dir="", dry_run=False, max_size_mb=50.0)

Batch-fetches analysis result files from a remote directory. Pulls only files matching the pattern and under the size limit. Skips large raw data files automatically.

# Fetch all CSVs and HTML reports from results/
fetch_results(remote_dir="/work/.../results/")

# Preview what would be fetched
fetch_results(remote_dir="/work/.../results/qc/", dry_run=True)

# Custom pattern
fetch_results(remote_dir="/work/.../results/", pattern="*.html,*.png")
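The pattern and size filtering can be sketched like this, assuming the tool first obtains a listing of `(remote_path, size_in_bytes)` pairs from the cluster. The helper `select_files` and that listing shape are illustrative assumptions; only the default pattern and the 50 MB limit come from the signature above.

```python
from fnmatch import fnmatchcase

DEFAULT_PATTERN = "*.csv,*.tsv,*.txt,*.html,*.png,*.pdf,*.json"


def select_files(listing, pattern=DEFAULT_PATTERN, max_size_mb=50.0):
    """Return the remote paths worth fetching.

    listing: iterable of (remote_path, size_in_bytes) pairs.
    A file is kept if its name matches any comma-separated glob and its
    size does not exceed max_size_mb.
    """
    globs = [g.strip() for g in pattern.split(",") if g.strip()]
    limit_bytes = max_size_mb * 1024 * 1024
    keep = []
    for path, size in listing:
        name = path.rsplit("/", 1)[-1]
        if size <= limit_bytes and any(fnmatchcase(name, g) for g in globs):
            keep.append(path)
    return keep
```

With the default pattern, BAM and FASTQ files never match, which is how large raw data stays on the cluster.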

submit_analysis(script, job_name, time, memory, cpus, partition, account, email)

Writes a script to the cluster and submits it via sbatch. Claude composes the script; this tool handles SLURM headers and log paths. Returns job ID and log paths.

submit_analysis(
    script="""#!/bin/bash
module load samtools
samtools sort -@ 8 /work/.../raw/sample1.bam -o /work/.../results/sample1.sorted.bam
samtools index /work/.../results/sample1.sorted.bam
""",
    job_name="sort_bam",
    time="1:00:00",
    memory="16G",
    cpus=8,
)
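A sketch of what "handles SLURM headers and log paths" could look like in practice. The `#SBATCH` options are standard sbatch directives, and the `logs/claude_jobs/<job_name>_<jobid>` log naming follows the example conversation later in this README; the function `add_sbatch_headers` and its defaults are assumptions, not the server's code.

```python
def add_sbatch_headers(script, job_name, time="1:00:00", memory="", cpus=1,
                       partition="", account="", email="",
                       log_dir="logs/claude_jobs"):
    """Insert #SBATCH directives directly after the shebang line."""
    headers = [
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time}",
        f"#SBATCH --cpus-per-task={cpus}",
        # %j is replaced by the job ID when SLURM creates the files.
        f"#SBATCH --output={log_dir}/{job_name}_%j.out",
        f"#SBATCH --error={log_dir}/{job_name}_%j.err",
    ]
    if memory:
        headers.append(f"#SBATCH --mem={memory}")
    if partition:
        headers.append(f"#SBATCH --partition={partition}")
    if account:
        headers.append(f"#SBATCH --account={account}")
    if email:
        headers.append(f"#SBATCH --mail-user={email}")
        headers.append("#SBATCH --mail-type=END,FAIL")

    lines = script.splitlines()
    if lines and lines[0].startswith("#!"):
        shebang, body = lines[0], lines[1:]
    else:
        shebang, body = "#!/bin/bash", lines
    return "\n".join([shebang, *headers, *body]) + "\n"
```

The directives have to sit between the shebang and the first command, because sbatch stops reading them at the first executable line.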

run_remote_script(script, interpreter="bash", timeout=120)

Runs a short script on the login node and returns output inline. No SLURM job needed. Use for data summaries, format checks, quick R/Python snippets.

Supported interpreters: bash, python3, Rscript, awk, perl.

run_remote_script("""
import pandas as pd
df = pd.read_csv('/work/.../results/deseq2/results.tsv', sep='\\t')
print(df[df.padj < 0.05].shape[0], 'significant genes')
print(df.sort_values('padj').head(10).to_string())
""", interpreter="python3")

gwas_top_hits(sumstats_file, format="auto", n=50, pval_threshold=5e-8, pval_col="")

Extracts top GWAS hits from a summary statistics file. Auto-detects file format from the header. Supports REGENIE, PLINK, SAIGE, METAL, and generic TSV formats.

# Auto-detect format
gwas_top_hits("/work/.../results/gwas/trait1.regenie.gz")

# Specify format
gwas_top_hits("/work/.../results/plink/trait2.assoc.logistic", format="plink")

# Custom p-value column
gwas_top_hits("/work/.../results/custom.tsv", pval_col="P_BOLT_LMM")
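Header-based format detection might work along these lines. The p-value column names are the usual ones for each tool (REGENIE writes `LOG10P`, SAIGE `p.value`, METAL `P-value`, PLINK `P`), but the lookup order, the return shape, and the names `detect_format` and `is_significant` are assumptions for this sketch.

```python
import math

# (format name, p-value column, True if the column holds -log10(p))
KNOWN_FORMATS = [
    ("regenie", "LOG10P", True),
    ("saige", "p.value", False),
    ("metal", "P-value", False),
    ("plink", "P", False),
]


def detect_format(header_line: str, pval_col: str = ""):
    """Return (format, p-value column, is_log10) from a header line."""
    columns = header_line.strip().split()
    if pval_col:
        if pval_col not in columns:
            raise ValueError(f"column {pval_col!r} not in header")
        return "generic", pval_col, False
    for name, col, is_log10 in KNOWN_FORMATS:
        if col in columns:
            return name, col, is_log10
    raise ValueError("no known p-value column found; pass pval_col")


def is_significant(value: float, is_log10: bool,
                   pval_threshold: float = 5e-8) -> bool:
    """Compare a raw p-value or a -log10(p) value against the threshold."""
    if is_log10:
        return value >= -math.log10(pval_threshold)
    return value <= pval_threshold
```

Handling `LOG10P` separately matters: a REGENIE value of 8.0 is a hit at the default threshold, while a raw p-value of 8.0 would be meaningless.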

7. Analysis co-pilot loop

The five co-pilot tools enable a self-contained research loop:

1. MONITOR     job_status() / check_outputs()
      │
      ▼
2. EXTRACT     run_remote_script()  — lightweight on login node
      │        gwas_top_hits()
      ▼
3. FETCH       fetch_results() / fetch_file()
      │        (pulls files to Mac so Claude can read them)
      ▼
4. INTERPRET   Claude reads CSVs/tables/HTML locally
      │        → biological / statistical interpretation
      │        → identifies what to do next
      ▼
5. ACT         submit_analysis(Claude-written script)
      │        → new job on HPC
      │
      └──────────────── loop back to 1

Example: RNA-seq QC → Filtering → Alignment

Step 1  check_outputs(path="/work/.../results/fastqc/", pattern="*_fastqc.html")
        → 48/48 FastQC reports present

Step 2  run_remote_script("multiqc /work/.../results/fastqc/ -o /work/.../results/multiqc/")
        → MultiQC report generated

Step 3  fetch_file("/work/.../results/multiqc/multiqc_report.html")
        → Claude reads the HTML report

Step 4  Claude: "3 samples have >30% adapter content and 2 have low Q30 bases.
                 I recommend trimming all samples. Should I submit a Trimmomatic job
                 for the 3 outliers first as a test?"

Step 5  submit_analysis(trimmomatic_script, job_name="trim_test", time="2:00:00")
        → job 3210512 submitted

8. Example conversations

Status check

You:    "Quick status — jobs, outputs, disk"

Claude: [job_status()]              → 12 jobs running (STAR align array)
        [check_outputs(path=".../results/star/", pattern="*.bam")]
                                    → 31/48 BAMs present so far
        [disk_usage()]              → WORK 62% used, results/ 180GB
        → "STAR alignment is running, 31/48 samples complete. At this rate
           it will finish in ~2 hours. You have 38% of WORK quota remaining
           — should be fine for this batch."

Debugging a failed job

You:    "The variant calling failed — what went wrong?"

Claude: [job_history(days=2)]
        → GATK_HaplotypeCaller 3210190_[1-24] FAILED (exit code 1)
        [job_details("3210190")]
        → log path: /work/.../logs/runs/20260607/gatk_3210190_3.err
        [read_log(step="gatk", stream="err")]
        → "ERROR: Input file /work/.../results/bam/sample3.sorted.bam
           does not have a sequence dictionary"
        → "The BAM for sample3 has no sequence dictionary, meaning its
           header contains no @SQ lines. The header was probably lost or
           never written when the file was produced. Want me to inspect it
           with samtools view -H and regenerate just sample3?"

Interpreting results

You:    "DESeq2 finished — fetch the results and tell me the top hits"

Claude: [fetch_results(remote_dir="/work/.../results/deseq2/")]
        → pulls results.tsv, MA_plot.png, heatmap.png
        [reads results.tsv locally]
        → "1,847 genes are significant (padj < 0.05). Top upregulated:
           IFITM3, ISG15, MX1 — classic interferon-stimulated genes.
           Top downregulated: HMGCS1, SQLE, FDFT1 — cholesterol synthesis.
           This pattern is consistent with an antiviral immune response
           suppressing lipid metabolism. Want me to run pathway enrichment?"

Submitting a follow-up analysis

You:    "Run GSEA on those DESeq2 results using the Hallmark gene sets"

Claude: → writes a complete fgsea R script using the DESeq2 ranked gene list
        [submit_analysis(gsea_script, job_name="gsea_hallmarks", time="1:00:00", memory="16G")]
        → "Job 3210545 submitted. It will run fgsea with the MSigDB Hallmark
           gene sets against your DESeq2 ranked list. Log at logs/claude_jobs/
           gsea_hallmarks_3210545.out. Should take ~10 minutes."

9. Configuration reference

~/.tacc_mcp.json

{
  "hpc_host":    "tacc",
  "hpc_user":    "jsmith",
  "project_dir": "/work/12345/jsmith/my-project",
  "logs_subdir": "logs/runs"
}

| Key | Required | Default | Description |
|---|---|---|---|
| hpc_host | yes | - | SSH alias from ~/.ssh/config. Used in every ssh <host> "cmd" call. |
| hpc_user | yes | - | Your HPC username. Used in squeue -u and sacct -u. |
| project_dir | yes | - | Absolute path to your project root on the cluster. |
| logs_subdir | no | logs/runs | Subfolder under project_dir where pipeline run log directories live. |

Backwards compatibility: tacc_host and tacc_user are also accepted as aliases for hpc_host and hpc_user.
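The alias handling and the logs_subdir default could be applied in one normalisation step, sketched below. Treating hpc_host, hpc_user, and project_dir as mandatory is an assumption based on their descriptions, and `normalise_config` is a hypothetical helper rather than a function from server.py.

```python
LEGACY_KEYS = {"tacc_host": "hpc_host", "tacc_user": "hpc_user"}
REQUIRED_KEYS = ("hpc_host", "hpc_user", "project_dir")  # assumed mandatory
DEFAULTS = {"logs_subdir": "logs/runs"}


def normalise_config(raw: dict) -> dict:
    """Map legacy keys to current names, fill defaults, check required keys."""
    cfg = dict(DEFAULTS)
    # Legacy names first, so a current key overrides its old alias.
    for old, new in LEGACY_KEYS.items():
        if old in raw:
            cfg[new] = raw[old]
    cfg.update({k: v for k, v in raw.items() if k not in LEGACY_KEYS})
    missing = [k for k in REQUIRED_KEYS if not cfg.get(k)]
    if missing:
        raise ValueError("missing config keys: " + ", ".join(missing))
    return cfg
```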

~/.ssh/config block

Host tacc
    HostName ls6.tacc.utexas.edu
    User jsmith
    ControlMaster auto
    ControlPath ~/.ssh/cm_%r@%h:%p
    ControlPersist 4h

To extend the session window, change ControlPersist 4h to 8h or 12h.


10. Troubleshooting

Tools don't appear in Claude Code

Cause A: Wrong config file. Claude Code reads MCP servers from ~/.claude.json (registered per-project via claude mcp add). Manual edits to settings.json files may not work in all environments.

Fix:

cd /your/project
claude mcp add tacc-hpc /path/to/.venv/bin/python /path/to/server.py
claude mcp list

Cause B: Server crashes at startup. Test directly:

/path/to/tacc-mcp-bio/.venv/bin/python /path/to/tacc-mcp-bio/server.py
# Should start silently, waiting on stdin. Ctrl+C to stop.

Cause C: Wrong Python path. The command must point to .venv/bin/python inside the tacc-mcp-bio directory.


"SSH ControlMaster is not active"

Your 4-hour session expired.

ssh tacc   # re-authenticate

# Check if ControlMaster is active:
ssh -O check tacc
# "Master running (pid=XXXXX)" → active

"No config found at ~/.tacc_mcp.json"

python3 /path/to/tacc-mcp-bio/setup.py

Or create it manually:

cat > ~/.tacc_mcp.json << 'EOF'
{
  "hpc_host":    "tacc",
  "hpc_user":    "YOUR_USERNAME",
  "project_dir": "/work/XXXXX/YOUR_USERNAME/your-project",
  "logs_subdir": "logs/runs"
}
EOF

run_remote_script fails with Python/module errors

The system Python on login nodes may lack your packages. Specify your conda environment's interpreter:

# Find your conda environments on the cluster:
run_command("conda env list")

# Then use the full path:
run_remote_script(script, interpreter="/work/.../anaconda3/envs/myenv/bin/python3")

claude mcp list shows ✗ Error

Run the full MCP handshake manually to see the actual error:

printf '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1"}}}\n{"jsonrpc":"2.0","method":"notifications/initialized","params":{}}\n{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}\n' | \
  /path/to/tacc-mcp-bio/.venv/bin/python \
  /path/to/tacc-mcp-bio/server.py 2>&1

Should return JSON with "tools": [...].
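If the one-line printf is awkward to edit, the same three messages can be generated from Python. This sketch only builds the payload and extracts tool names from a captured reply; the helper names are hypothetical, and the message contents mirror the printf command above.

```python
import json


def handshake_messages() -> list[dict]:
    """The initialize, initialized, and tools/list messages, in order."""
    return [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize",
         "params": {"protocolVersion": "2024-11-05", "capabilities": {},
                    "clientInfo": {"name": "test", "version": "1"}}},
        {"jsonrpc": "2.0", "method": "notifications/initialized", "params": {}},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {}},
    ]


def handshake_payload() -> str:
    """One JSON document per line, as expected on the server's stdin."""
    return "".join(json.dumps(m) + "\n" for m in handshake_messages())


def tool_names(server_stdout: str) -> list[str]:
    """Extract tool names from the reply to request id 2, if present."""
    for line in server_stdout.splitlines():
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore log lines mixed into the output
        if isinstance(msg, dict) and msg.get("id") == 2 and "result" in msg:
            return [t["name"] for t in msg["result"].get("tools", [])]
    return []


# Hypothetical usage:
#   import subprocess
#   proc = subprocess.run(["/path/to/tacc-mcp-bio/.venv/bin/python",
#                          "/path/to/tacc-mcp-bio/server.py"],
#                         input=handshake_payload(), capture_output=True,
#                         text=True, timeout=30)
#   print(tool_names(proc.stdout) or proc.stderr)
```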


fetch_file / fetch_results times out

scp uses the same ControlMaster socket. Verify it's active:

ssh -O check tacc

For large files (>500 MB), use run_command to check the file size first, then decide whether to fetch or analyse directly on the cluster.


11. Security model

| The server CAN | The server CANNOT |
|---|---|
| Run any command you ask on HPC login nodes | Access other users' files |
| Read/write files in your HPC project | Submit jobs to queues you lack access to |
| Submit SLURM jobs under your account | Authenticate to HPC (you do this manually) |
| Copy files to/from your local machine | Store your password or MFA tokens anywhere |

Credential handling

  • Your password and MFA token never touch this software. You enter them directly into ssh.

  • The ControlMaster socket (~/.ssh/cm_<user>@<host>:<port>) is owned by your user, mode 600.

  • ~/.tacc_mcp.json contains only your username and project path — no secrets.

The run_command and submit_analysis tools

These can execute arbitrary code on the HPC cluster under your account. Claude Code shows you the tool call before executing — you see exactly what will run and can decline. The tools run with your standard HPC permissions and cannot escalate privileges.

Data governance

fetch_file and fetch_results copy files to your local machine (~/Downloads/hpc_results/ by default). Files are subject to your data access agreements and institutional data governance policies. If you are working with controlled-access data (e.g. dbGaP, UK Biobank), ensure local storage complies with your data use agreement before fetching.


12. Adapting to other HPC clusters

The server depends only on standard command-line tools: ssh, scp, and rsync for transport, plus the SLURM commands squeue, sbatch, sacct, and scancel. It should therefore work on any SLURM cluster you can reach over SSH.

Minimal changes for a different cluster

  1. Update ~/.tacc_mcp.json — point hpc_host to your cluster's SSH alias

  2. Update setup.py — change the default hostname

  3. That's it for basic functionality

Per-cluster notes

| Cluster | Notes |
|---|---|
| TACC Lonestar6 | Default target. normal partition, $WORK storage |
| TACC Frontera | Change hostname to frontera.tacc.utexas.edu. Use development/normal partitions |
| TACC Stampede3 | Change hostname to stampede3.tacc.utexas.edu |
| Non-TACC SLURM | Update hostname, username, and partition in submit_analysis() calls |


13. Adding new tools

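import shlex

@mcp.tool()
def my_tool(arg1: str, arg2: int = 10) -> str:
    """
    One-sentence summary for Claude to know when to call this tool.

    arg1: description of this argument
    arg2: description with default (default: 10)
    """
    # Return the config error as the tool result instead of raising.
    cfg, err = _require_cfg()
    if err:
        return err

    # Quote user-supplied values so spaces or shell metacharacters in arg1
    # cannot change the remote command.
    r = _ssh(f"some_command {shlex.quote(arg1)} | head -n {int(arg2)}")
    return _fmt(r)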

Restart Claude Code after adding tools — no re-registration needed.


Contributing

PRs welcome. The server is a single self-contained file (~600 lines) — easy to extend.

Useful additions not yet implemented:

  • watch_job(job_id) — poll until a job reaches a terminal state

  • transfer_progress() — rsync with live progress for large directories

  • environment_check() — verify conda/module environment on the cluster

  • PBS/Torque support — qstat/qsub variants of the SLURM tools

  • Multi-cluster support — multiple HPC systems in one config


License

MIT
