by ehsansyh

SciAgentKit


What is SciAgentKit?

SciAgentKit is a local-first toolkit that gives AI coding agents real scientific tools instead of letting them invent molecular properties with the confidence of a reviewer who did not read the supplement.

It combines:

  • Deterministic scientific skills for molecule-library audit, descriptor profiling, scaffold diversity, protein/PDB ranking, ligand preparation, docking planning, MD job generation, trajectory analysis, and report writing.

  • MCP tool server so Claude Code, Codex, Gemini CLI, Cursor, and other agent runtimes can call the same local scientific tools.

  • Agent skills that teach the model when to call each tool, what to ask the user, what to refuse to invent, and where to stop when scientific inputs are missing.

  • Reproducibility layer with run_manifest.json, input/output hashes, structured CSV files, reports, and methods text.
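As a rough illustration of the reproducibility layer, the sketch below records a command, its parameters, and SHA-256 hashes of input and output files. The function name `write_manifest` and the field names are assumptions made for this example; the actual run_manifest.json schema may differ.

```python
import hashlib
import json
import platform
import sys
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large libraries are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(run_dir: Path, command: str, params: dict,
                   inputs: list, outputs: list) -> Path:
    """Write a hypothetical run_manifest.json into run_dir and return its path."""
    manifest = {
        "command": command,
        "parameters": params,
        "inputs": {str(p): sha256_of(p) for p in inputs},
        "outputs": {str(p): sha256_of(p) for p in outputs},
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
    }
    target = run_dir / "run_manifest.json"
    target.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return target
```

Hashing both sides of a run lets a later reader confirm that a report was produced from the exact input library that is on disk.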

The philosophy is simple:

Skills decide the workflow. MCP runs the science. Reports preserve the evidence.



Why this exists

AI agents are getting very good at editing files and calling tools. They are still terrible at knowing when a docking score is not biology, when scaffold novelty matters more than SMILES novelty, or when a missing cofactor turns a structure workflow into decorative nonsense.

SciAgentKit is built for researchers working on:

  • computational biology

  • AI drug discovery

  • molecule generation

  • QSAR / DTI workflows

  • docking and MD pipelines

  • protein target selection

  • scientific figure and report generation

It is not another chatbot. It is a scientific skill layer for agents.


Workflow overview

| Stage | What it does | Main outputs |
| --- | --- | --- |
| Literature search | Search literature around target, mutation, binding site, assay context | literature_results.csv |
| Protein selection | Fetch UniProt sequence, apply mutation if requested, rank PDB chains by mutation match, coverage/span, resolution | target_sequence.fasta, ranked_structures.csv |
| Ligand preparation | Canonicalize SMILES, remove duplicates, generate 3D SDF at the requested pH, recommend or accept force field | ligands_3d.sdf, cleaned_ligands.csv |
| Pocket detection | Use crystal ligand center, known residues, or external pocket tools | pocket.json |
| Docking | Generate Vina configs, run docking when receptor/ligand PDBQT inputs exist, rank top ligands | docking_scores.csv, top_ligands.csv |
| MD planning | Build OpenMM job folders with neutralization, 0.15 M salt, NVT/NPT/equilibration, seeded replicas | md_jobs.csv, run_openmm_md.py |
| Trajectory analysis | RMSD, RMSF, ProLIF residue interaction frequencies when trajectories are available | rmsd.csv, rmsf.csv, prolif_interactions.csv |
| Reporting | Human-readable and machine-readable outputs | report.md, final_report.docx, final_report.pdf, run_manifest.json |
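
For the trajectory-analysis stage, a residue interaction frequency can be read as the fraction of frames in which that residue contacts the ligand. The sketch below assumes the per-frame contact sets have already been computed by a tool such as ProLIF; the function name and residue labels are placeholders, and the real prolif_interactions.csv layout may differ.

```python
from collections import Counter


def interaction_frequencies(frames: list) -> dict:
    """Fraction of frames in which each residue label appears.

    `frames` is a list of sets, one per trajectory frame, holding the
    labels of residues that interact with the ligand in that frame.
    """
    if not frames:
        return {}
    counts = Counter(residue for frame in frames for residue in frame)
    return {residue: n / len(frames) for residue, n in sorted(counts.items())}
```

A residue that contacts the ligand in three of four frames gets a frequency of 0.75, which is the kind of value an agent should read from a tool output rather than estimate.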


What it can analyze

Molecule-library audit

For generated molecules, RL molecules, screening libraries, or known actives:

  • SMILES validation

  • canonicalization

  • internal duplicate removal

  • cross-duplicate detection against reference/training libraries

  • novelty fraction

  • scaffold novelty

  • Bemis-Murcko scaffold diversity

  • QED, MW, logP, TPSA, HBD, HBA, rotatable bonds

  • property-distribution figures

  • report + methods section + reproducibility manifest
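
The core of such an audit can be sketched in a few lines. This example assumes novelty fraction means the share of valid, unique query molecules that are absent from the reference set, which may not match the toolkit's exact definition. `novelty_audit` and its return keys are hypothetical names; the RDKit calls (`Chem.MolFromSmiles`, `Chem.MolToSmiles`) are the standard ones.

```python
from typing import Callable, Iterable, Optional


def rdkit_canonical(smiles: str) -> Optional[str]:
    """Return RDKit canonical SMILES, or None when the input does not parse."""
    from rdkit import Chem  # lazy import: only needed when this default is used

    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def novelty_audit(
    query: Iterable[str],
    reference: Iterable[str],
    canonicalize: Callable[[str], Optional[str]] = rdkit_canonical,
) -> dict:
    """Validate, deduplicate, and compare a query library to a reference."""
    ref = {c for c in map(canonicalize, reference) if c is not None}
    seen, invalid, duplicates = set(), 0, 0
    for smi in query:
        canon = canonicalize(smi)
        if canon is None:
            invalid += 1          # failed SMILES validation
        elif canon in seen:
            duplicates += 1       # internal duplicate after canonicalization
        else:
            seen.add(canon)
    novel = seen - ref
    return {
        "n_valid_unique": len(seen),
        "n_invalid": invalid,
        "n_internal_duplicates": duplicates,
        "n_cross_duplicates": len(seen & ref),
        "novelty_fraction": len(novel) / len(seen) if seen else 0.0,
    }
```

With RDKit installed, `novelty_audit(["OCC", "CCO"], ["CCO"])` treats both query strings as the same molecule, which is why canonicalization has to come before any duplicate or novelty count.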

Protein/PDB selection

For target-aware workflows:

  • UniProt target sequence retrieval

  • mutation application to target sequence

  • PDB cross-reference discovery

  • chain-sequence alignment

  • mutation-aware structure filtering

  • ranking by residue coverage/span and resolution

  • selected PDB/chain summary with limitations
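
Two of these steps are easy to picture in code. The sketch below applies a point mutation written like L858R to a sequence and then orders candidate chains by mutation match, coverage, and resolution. The strict lexicographic ordering, the `Candidate` fields, and the labels are assumptions for illustration; the toolkit may weight these criteria differently.

```python
import re
from dataclasses import dataclass


def apply_mutation(sequence: str, mutation: str) -> str:
    """Apply a point mutation such as 'L858R' (1-based position)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild, pos, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[pos - 1] != wild:
        raise ValueError(f"expected {wild} at {pos}, found {sequence[pos - 1]}")
    return sequence[: pos - 1] + mutant + sequence[pos:]


@dataclass
class Candidate:
    label: str          # placeholder identifier, not a real PDB entry
    has_mutation: bool  # chain carries the requested mutant residue
    coverage: float     # fraction of target residues resolved, 0..1
    resolution: float   # angstroms; lower is better


def rank(candidates: list) -> list:
    """Mutation match first, then higher coverage, then lower resolution."""
    return sorted(
        candidates,
        key=lambda c: (not c.has_mutation, -c.coverage, c.resolution),
    )
```

Checking the wild-type residue before substituting catches numbering mismatches between the UniProt sequence and the mutation label, a common source of silently wrong structure choices.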

Docking and MD workflow support

For structure-based screening:

  • ligand 3D SDF generation

  • pH-aware preparation when Open Babel is available

  • force-field recommendation or user-specified force-field path

  • pocket center extraction from crystal ligand or known residues

  • AutoDock Vina wrapper and score parsing

  • OpenMM job skeletons for top ligands

  • replica planning with different seeds

  • MDAnalysis RMSD/RMSF

  • ProLIF interaction frequency tables
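
To make the pocket and docking steps concrete, here is a minimal sketch that derives a box center from crystal-ligand coordinates, renders an AutoDock Vina config, and reads affinities back from Vina's PDBQT output. The config keys and the `REMARK VINA RESULT:` line are standard Vina conventions; the function names, the 20 Å box size, and the file names are assumptions, not the SciAgentKit wrapper.

```python
from typing import Optional


def centroid(coords: list) -> tuple:
    """Geometric center of ligand atom coordinates (angstroms)."""
    if not coords:
        raise ValueError("no coordinates given")
    n = len(coords)
    return tuple(sum(axis) / n for axis in zip(*coords))


def vina_config(receptor: str, ligand: str, center: tuple,
                size: tuple = (20.0, 20.0, 20.0),
                exhaustiveness: int = 8) -> str:
    """Render an AutoDock Vina config file as text."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, c, s in zip("xyz", center, size):
        lines.append(f"center_{axis} = {c:.3f}")
        lines.append(f"size_{axis} = {s:.1f}")
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines) + "\n"


def best_vina_score(pdbqt_text: str) -> Optional[float]:
    """Lowest affinity (kcal/mol) among 'REMARK VINA RESULT:' lines, if any."""
    scores = [
        float(line.split()[3])
        for line in pdbqt_text.splitlines()
        if line.startswith("REMARK VINA RESULT:")
    ]
    return min(scores) if scores else None
```

Returning `None` when no result line exists, instead of a default number, mirrors the rule that a missing docking run must not turn into a score.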


Install

git clone https://github.com/ehsansyh/sciagentkit.git
cd sciagentkit
bash scripts/install.sh --all

This installs the full Python stack, attempts external scientific tools through conda-forge, generates agent configuration files, registers the Claude Code plugin, runs sciagent doctor, runs tests, and executes the bundled demo. External CLI tools are attempted one by one; if AutoDock Vina or fpocket cannot be installed automatically, see docs/EXTERNAL_TOOLS.md for direct installation links.

Core-only install

For molecule audit and descriptor/scaffold analysis only:

bash scripts/install.sh --core --agents

Docker

docker compose up --build sciagentkit

For MCP server mode:

docker compose up --build sciagentkit-mcp

Quick demo

source scripts/activate.sh
sciagent doctor
sciagent demo

Expected demo output includes:

runs/install_demo/
├── query/
├── reference/
├── novelty_report.csv
├── descriptor_comparison.csv
├── scaffold_comparison.csv
├── figures/
├── report.md
└── run_manifest.json

Use from the CLI

Analyze one SMILES library

sciagent analyze-smiles examples/egfr_demo/ligands.smi \
  --out runs/egfr_profile

Audit generated molecules against known actives

sciagent audit-generated \
  examples/hiv_demo/generated_molecules.smi \
  examples/hiv_demo/known_actives.smi \
  --out runs/hiv_generated_audit

Select best PDB structure for a mutated target

sciagent protein-select EGFR \
  --mutation L858R \
  --out runs/egfr_protein

Prepare ligands at pH 7.4

sciagent prepare-ligands examples/egfr_demo/ligands.smi \
  --ph 7.4 \
  --target-family kinase \
  --out runs/egfr_ligands

Create a target-screening project skeleton

sciagent target-screen-project EGFR examples/egfr_demo/ligands.smi \
  --mutation L858R \
  --ph 7.4 \
  --target-family kinase \
  --out runs/egfr_project

Use with AI agents

SciAgentKit is designed to be used as:

Agent skill / workflow instruction  +  local MCP scientific tool server

The agent selects the stage and the MCP server computes the values, so descriptors, docking scores, protein coverage, RMSD, and interaction frequencies come from deterministic tool outputs rather than model estimates.

One-command setup for any agent

Fresh GitHub checkout:

git clone https://github.com/ehsansyh/sciagentkit.git
cd sciagentkit
bash scripts/install.sh --all

For an existing checkout, point SciAgentKit at your agent of choice:

sciagent init-agent claude     # CLAUDE.md + project .mcp.json
sciagent init-agent codex      # AGENTS.md + codex_config.fragment.toml + .codex/skills/
sciagent init-agent gemini     # GEMINI.md + .gemini/settings.json + .gemini/skills/
sciagent init-agent cursor     # .cursor rule + .mcp/sciagentkit.json
sciagent init-agent all        # everything above

Every agent uses the same cross-platform launcher (scripts/start_mcp.py, no bash required) and the same MCP server, so behavior is consistent across Claude Code, Codex, Gemini CLI, and Cursor.

Skill source of truth. All skill definitions live in .claude/skills/. The plugin copy and the generated .codex/skills/ and .gemini/skills/ folders are derived from it, so never edit them directly. After changing a skill, run sciagent sync-skills to propagate the change everywhere.


Claude Code

Claude Code has the best experience because SciAgentKit ships as a plugin with namespaced skills and a bundled MCP launcher.

Install:

bash scripts/install.sh --full --agents --external --claude-plugin

Open Claude Code from the repository root:

claude

Inside Claude Code:

/reload-plugins
/plugin list
/mcp

Use the stage skills:

/sciagentkit:smiles-analysis Analyze examples/egfr_demo/ligands.smi and save outputs to runs/claude_egfr_profile.
/sciagentkit:generated-molecule-audit Audit examples/hiv_demo/generated_molecules.smi against examples/hiv_demo/known_actives.smi and save outputs to runs/claude_hiv_audit.
/sciagentkit:protein-structure-selection Select the best PDB structure for EGFR L858R using UniProt, mutation-aware alignment, residue coverage, and resolution ranking. Save outputs to runs/egfr_protein.
/sciagentkit:full-target-screening Run a full target-screening project for EGFR L858R using examples/egfr_demo/ligands.smi at pH 7.4. Stop before docking or MD if required external inputs are missing. Save outputs to runs/egfr_full.

Available Claude Code skills:

/sciagentkit:smiles-analysis
/sciagentkit:generated-molecule-audit
/sciagentkit:literature-search
/sciagentkit:protein-structure-selection
/sciagentkit:ligand-preparation
/sciagentkit:pocket-and-docking
/sciagentkit:md-workflow
/sciagentkit:trajectory-analysis
/sciagentkit:report
/sciagentkit:full-target-screening

OpenAI Codex

Codex can use SciAgentKit through MCP + Agent Skills + AGENTS.md.

Option A: ask Codex to install it

Open Codex in any directory and ask:

Clone https://github.com/ehsansyh/sciagentkit, run the installer, configure SciAgentKit MCP and skills for Codex, verify with codex mcp list, run sciagent doctor, and run the demo.

Option B: generate the config with sciagent init-agent

After installing SciAgentKit (pip install -e . or bash scripts/install.sh), generate the Codex config from a single source:

sciagent init-agent codex

This writes AGENTS.md (operating manual), .codex/skills/, and codex_config.fragment.toml with a project-local cross-platform MCP entry. Add the fragment to your Codex config if your Codex surface does not auto-read project MCP fragments.

Option C: configure manually

git clone https://github.com/ehsansyh/sciagentkit.git
cd sciagentkit
bash scripts/install.sh --full --agents --external

Add the MCP server:

codex mcp add sciagentkit \
  --env PYTHONPATH=$(pwd)/src \
  -- python scripts/start_mcp.py

The launcher scripts/start_mcp.py is cross-platform (Windows/macOS/Linux) and needs no bash. On Windows, use python scripts\start_mcp.py.

Verify inside Codex:

/mcp

Use natural prompts:

Use the SciAgentKit generated-molecule-audit skill to audit examples/hiv_demo/generated_molecules.smi against examples/hiv_demo/known_actives.smi. Save outputs to runs/codex_hiv_audit.
Use SciAgentKit to select the best PDB structure for EGFR L858R. Explain mutation match, sequence coverage, residue span, resolution, and limitations.

Codex should treat AGENTS.md as the project-level operating manual and the MCP server as the only source for scientific calculations.


Gemini CLI

Gemini CLI can use SciAgentKit through MCP + Agent Skills + GEMINI.md / skill folders.

After installing SciAgentKit, run:

sciagent init-agent gemini

This writes GEMINI.md, adds the sciagentkit MCP server to .gemini/settings.json (merging into any existing file), and copies all skills into .gemini/skills/ from the canonical source. Then start Gemini CLI by running gemini.

Manual setup

Install SciAgentKit:

git clone https://github.com/ehsansyh/sciagentkit.git
cd sciagentkit
bash scripts/install.sh --full --agents --external

Add SciAgentKit to ~/.gemini/settings.json or project-level .gemini/settings.json:

{
  "mcpServers": {
    "sciagentkit": {
      "command": "python",
      "args": ["/ABSOLUTE/PATH/TO/sciagentkit/scripts/start_mcp.py"],
      "cwd": "/ABSOLUTE/PATH/TO/sciagentkit",
      "timeout": 30000,
      "trust": true
    }
  }
}
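
If you would rather script this edit than paste JSON by hand, the sketch below adds the entry while leaving every other setting in the file untouched. `add_mcp_server` is a hypothetical helper written for this example, not part of the sciagent CLI; the entry fields are the ones shown above.

```python
import json
from pathlib import Path


def add_mcp_server(settings_path: Path, name: str, entry: dict) -> dict:
    """Add or update one MCP server entry, preserving all other settings."""
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text())
    settings.setdefault("mcpServers", {})[name] = entry
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings


# Replace the placeholder with the real absolute path to your checkout.
repo = Path("/ABSOLUTE/PATH/TO/sciagentkit")
entry = {
    "command": "python",
    "args": [str(repo / "scripts" / "start_mcp.py")],
    "cwd": str(repo),
    "timeout": 30000,
    "trust": True,
}
# Uncomment to apply to the project-level settings file:
# add_mcp_server(Path(".gemini/settings.json"), "sciagentkit", entry)
```

Merging into the existing dictionary, instead of overwriting the file, keeps any other MCP servers and preferences you already configured.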

Copy skills from the canonical source (.claude/skills/ holds the real SKILL.md files):

mkdir -p .gemini/skills
cp -r .claude/skills/* .gemini/skills/

Open Gemini CLI:

gemini

Verify:

/mcp list
/skills

Use prompts:

Use SciAgentKit generated-molecule-audit to audit examples/hiv_demo/generated_molecules.smi against examples/hiv_demo/known_actives.smi. Save outputs to runs/gemini_hiv_audit.
Use SciAgentKit full-target-screening for EGFR L858R using examples/egfr_demo/ligands.smi at pH 7.4. Ask whether to use the recommended force field or a user-specified force field before ligand preparation.

MCP tools exposed by SciAgentKit

When the MCP server is active, agents can call:

canonicalize
descriptors
bemis_murcko_scaffold
analyze_library
compare_libraries
audit_generated_library
literature_search
protein_select
prepare_ligands
detect_pocket
run_docking
md_plan
analyze_trajectory
target_screen_project
write_report

Start manually:

python -m sciagentkit.servers.rdkit_server

or, cross-platform (Windows/macOS/Linux, no bash required):

python scripts/start_mcp.py
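
Once the server is running, an agent invokes a tool with an MCP tools/call request, which is a JSON-RPC 2.0 message. The sketch below only builds that message; a real client must complete the MCP initialize handshake first, and the `smiles` argument name is a guess, since the tools' input schemas are not listed here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP 'tools/call' request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# The "smiles" argument name is an assumption; check the schema returned
# by a "tools/list" request before relying on it.
request = tool_call("canonicalize", {"smiles": "OCC"})
```

In practice the agent runtime builds these messages for you; the point is that every scientific value in a report should trace back to one such call.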

Security model

SciAgentKit is local-first, but local-first still requires explicit boundaries for filesystem writes, external tools, and high-cost workflows.

Safety principles

  • No invented science: agents must call MCP tools for molecular descriptors, UniProt/PDB ranking, docking results, RMSD/RMSF, and interaction frequencies.

  • No blind shell execution: scientific workflows should go through SciAgentKit CLI/MCP tools, not arbitrary model-written shell commands.

  • Runs stay under runs/: outputs are expected to be written into controlled run directories.

  • Manifest required: each workflow writes run_manifest.json with command, parameters, hashes, environment information, and output files.

  • External binaries are explicit: OpenBabel, AutoDock Vina, fpocket, OpenMM, MDAnalysis, and ProLIF are checked and reported by sciagent doctor.

  • No secret harvesting: skills must not ask for API keys unless the user explicitly configures a relevant optional service.

  • Human approval for high-impact actions: docking/MD execution, large compute jobs, and external downloads should remain user-visible and interruptible.
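
The "runs stay under runs/" principle can be enforced with a path check before any file is written. This is a minimal sketch of one way to do it, not the check SciAgentKit itself performs; `resolve_run_dir` is a hypothetical name.

```python
from pathlib import Path


def resolve_run_dir(out: str, runs_root: Path = Path("runs")) -> Path:
    """Return the absolute output directory, refusing paths outside runs_root."""
    root = runs_root.resolve()
    target = Path(out).resolve()  # collapses ".." and follows symlinks
    if target != root and root not in target.parents:
        raise ValueError(f"{out!r} is outside {root}")
    return target
```

Resolving both paths first means an argument such as runs/../../elsewhere is rejected, because it no longer sits under the runs root once the ".." segments are collapsed.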

| Workflow | Suggested approval level |
| --- | --- |
| SMILES audit / descriptors / figures | Low to medium |
| Literature search | Medium, because network calls and citations matter |
| Protein/PDB selection | Medium |
| Ligand preparation | Medium |
| Docking execution | High |
| MD execution | High |
| Report generation from existing outputs | Low |

Trust checklist before enabling a third-party skill

Read SKILL.md
Check scripts/ for shell commands
Check MCP config for external servers
Run sciagent doctor
Use a disposable test folder first
Do not expose private compound libraries to untrusted remote tools
Keep Claude/Codex/Gemini approval prompts enabled for heavy workflows
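
Part of this checklist can be given a first pass automatically. The sketch below lists lines in a skill folder that match a few shell-execution or download patterns. The pattern list is an illustrative assumption and far from complete, so a clean result does not prove a skill is safe; it only tells you where to start reading.

```python
import re
from pathlib import Path

# Heuristic patterns only; extend them for your own threat model.
SUSPICIOUS = re.compile(
    r"curl\s|wget\s|\brm\s+-rf\b|os\.system|subprocess|\beval\b|base64"
)


def flag_lines(skill_dir: Path) -> list:
    """List (relative file, line number, text) for lines matching a pattern."""
    hits = []
    for path in sorted(skill_dir.rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if SUSPICIOUS.search(line):
                hits.append((str(path.relative_to(skill_dir)), number, line.strip()))
    return hits
```

Treat every hit as a prompt to read the surrounding file, not as a verdict.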

Scientific limitations

SciAgentKit is a workflow scaffold and tool layer. It does not replace expert review for protein preparation, docking validation, force-field parameterization, or MD interpretation.

Current limitations:

  • receptor protonation needs expert review

  • missing loops, cofactors, metals, and crystal waters may require manual handling

  • docking scores are not biological activity

  • ligand parameterization must be validated before serious MD

  • 1–5 ns MD is a screening-level sanity check, not production evidence

  • MM/GBSA and free-energy workflows are not yet production-grade in this repository

  • membrane proteins, covalent inhibitors, metalloenzymes, and unusual cofactors need custom preparation


Example outputs

A generated-molecule audit produces:

runs/hiv_generated_audit/
├── query/
│   ├── canonicalization_report.csv
│   ├── cleaned_molecules.csv
│   ├── descriptors.csv
│   ├── descriptor_summary.csv
│   ├── scaffolds.csv
│   └── scaffold_summary.csv
├── reference/
├── novelty_report.csv
├── cross_duplicates.csv
├── novel_query_molecules.csv
├── descriptor_comparison.csv
├── scaffold_comparison.csv
├── figures/
│   ├── qed_comparison.png
│   ├── mw_comparison.png
│   ├── logp_comparison.png
│   └── tpsa_comparison.png
├── report.md
└── run_manifest.json

A protein-selection run produces:

runs/egfr_protein/
├── uniprot_target.csv
├── target_sequence.fasta
├── pdb_candidates.csv
├── ranked_structures.csv
└── run_manifest.json

A docking/MD project can produce:

runs/egfr_project/
├── literature/
├── protein/
├── ligands/
├── pocket/
├── docking/
├── md/
├── analysis/
├── final_report.md
├── final_report.docx
├── final_report.pdf
└── run_manifest.json

Example figures generated by SciAgentKit

These examples are produced by the bundled demo workflows and show the kind of plots the toolkit writes automatically under each run directory.


Repository layout

sciagentkit/
├── src/sciagentkit/
│   ├── core/              # deterministic scientific utilities
│   ├── skills/            # workflow functions
│   ├── servers/           # MCP server
│   ├── agents/            # reference agent prompts/plans
│   └── plotting/          # figures
├── plugins/
│   └── claude-code-sciagentkit/
├── skills/
│   ├── claude/
│   ├── codex/
│   ├── cursor/
│   └── openclaw/
├── examples/
│   ├── hiv_demo/
│   └── egfr_demo/
├── scripts/
│   ├── install.sh
│   ├── start_mcp.sh
│   └── setup_agents.py
├── docs/
├── tests/
└── runs/

Roadmap

v1.1

  • stronger Codex plugin packaging

  • Gemini CLI skill installer

  • better receptor preparation checks

  • PAINS/Brenk filters

  • SA score integration

  • richer report templates

v1.2

  • Meeko / PDBQT preparation helpers

  • improved force-field recommendation database

  • docking pose clustering

  • interaction heatmaps

  • publication-ready figure themes

v2.0

  • hosted/private SciAgentKit Cloud

  • team workspaces

  • secure MCP gateway

  • private compound-library audit

  • managed docking/MD compute queues

  • pharma-style audit logs and report templates


Citation

If you use SciAgentKit in a project, cite the repository:

@software{sciagentkit,
  title = {SciAgentKit: MCP-native scientific skills for reproducible computational biology and drug-discovery agents},
  author = {Ehsan Sayyah},
  year = {2026},
  url = {https://github.com/ehsansyh/sciagentkit}
}

License

Apache-2.0.


Final warning from the tiny guardian of reproducibility

If an agent claims it docked a ligand, ran MD, calculated RMSD, and found a drug without producing structured outputs and a manifest, it did not do science. It performed theater with a GPU costume.
