Schema | BioPython MCP Server

Server Configuration

Environment variables that configure the server. Neither is required.

Name	Required	Description	Default
`NCBI_EMAIL`	No	Email address for NCBI Entrez queries (recommended)
`NCBI_API_KEY`	No	API key for higher NCBI rate limits (optional)
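
As a rough illustration of how these two variables might be consumed, the sketch below reads them and picks the request ceiling mentioned in the tool notes further down (3 requests per second, or 10 with an API key). The function name `load_ncbi_settings` and the keys it returns are hypothetical; this page does not show how the server reads its configuration internally.

```python
import os


def load_ncbi_settings(env=None):
    """Collect NCBI settings from environment variables.

    Hypothetical helper, not part of the server. Both variables are
    optional, so missing or empty values become None.
    """
    env = os.environ if env is None else env
    email = env.get("NCBI_EMAIL") or None
    api_key = env.get("NCBI_API_KEY") or None
    return {
        "email": email,
        "api_key": api_key,
        # Limits quoted in the tool notes: 3 req/sec, or 10 with an API key.
        "max_requests_per_second": 10 if api_key else 3,
    }


if __name__ == "__main__":
    print(load_ncbi_settings())
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to exercise without touching the real environment.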

Capabilities

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": true }
`prompts`	{ "listChanged": false }
`resources`	{ "subscribe": false, "listChanged": false }
`experimental`	{ "tasks": { "list": {}, "cancel": {}, "requests": { "tools": { "call": {} }, "prompts": { "get": {} }, "resources": { "read": {} } } } }
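
To make the table concrete: in MCP, a `listChanged` value of true means the server may notify clients when the corresponding list changes, so here only the tool list is expected to change at runtime. The sketch below shows how a client might act on that. The helper names are hypothetical, and the `experimental` entry is left out because its semantics are not described on this page.

```python
import json

# Capability values copied from the table above (experimental entry omitted).
CAPABILITIES = json.loads("""
{
  "tools": {"listChanged": true},
  "prompts": {"listChanged": false},
  "resources": {"subscribe": false, "listChanged": false}
}
""")


def announces_list_changes(capabilities, feature):
    """Return True if the server declares listChanged for the given feature."""
    return bool(capabilities.get(feature, {}).get("listChanged", False))


def needs_tool_refresh(capabilities, notification):
    """Decide whether a client should request the tool list again."""
    return (
        announces_list_changes(capabilities, "tools")
        and notification.get("method") == "notifications/tools/list_changed"
    )


if __name__ == "__main__":
    for feature in ("tools", "prompts", "resources"):
        print(feature, announces_list_changes(CAPABILITIES, feature))
```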

Tools

Functions exposed to the LLM to take actions

Name	Description
translate_sequence	Translate a DNA or RNA sequence to protein. Args: sequence: DNA or RNA sequence string table: Genetic code table to use (default: 1 for standard code) to_stop: Stop translation at first stop codon (default: False) Returns: Dictionary containing the translated protein sequence and metadata
reverse_complement	Get the reverse complement of a DNA sequence. Args: sequence: DNA sequence string Returns: Dictionary containing the reverse complement and metadata
transcribe_dna	Transcribe DNA to RNA (or reverse transcribe RNA to DNA). Args: sequence: DNA or RNA sequence string reverse: If True, reverse transcribe RNA to DNA (default: False) Returns: Dictionary containing the transcribed sequence and metadata
calculate_gc_content	Calculate the GC content of a DNA or RNA sequence. Args: sequence: DNA or RNA sequence string Returns: Dictionary containing GC content percentage and counts
find_motif	Find all occurrences of a motif in a sequence. Args: sequence: DNA, RNA, or protein sequence to search motif: Motif pattern to find overlapping: Allow overlapping matches (default: True) Returns: Dictionary containing motif positions and count
pairwise_align	Perform pairwise sequence alignment. Args: seq1: First sequence seq2: Second sequence mode: Alignment mode - 'global' or 'local' (default: 'global') match_score: Score for matching residues (default: 2.0) mismatch_score: Score for mismatching residues (default: -1.0) gap_open: Gap opening penalty (default: -2.0) gap_extend: Gap extension penalty (default: -0.5) Returns: Dictionary containing alignment results and statistics
multiple_sequence_alignment	Perform multiple sequence alignment. Args: sequences: List of sequences to align algorithm: Alignment algorithm to use (default: 'clustalw') Returns: Dictionary containing alignment results Note: This is a placeholder that demonstrates the structure. Full implementation would require external tools like MUSCLE or Clustal Omega.
calculate_alignment_score	Calculate the score of a given alignment using a substitution matrix. Args: alignment_str: Aligned sequences (with gaps) as a formatted string matrix_name: Name of substitution matrix to use (default: 'BLOSUM62') Returns: Dictionary containing alignment score and statistics
fetch_genbank	Fetch a sequence from GenBank by accession number. Args: accession: GenBank accession number email: Email address for Entrez (required by NCBI) rettype: Return type - 'gb' for GenBank, 'fasta' for FASTA (default: 'gb') Returns: Dictionary containing the sequence record and metadata
fetch_uniprot	Fetch a protein sequence from UniProt. Args: uniprot_id: UniProt accession or ID format: Output format - 'fasta', 'txt', 'xml' (default: 'fasta') Returns: Dictionary containing the UniProt record
search_pubmed	Search PubMed for scientific articles. Args: query: Search query string max_results: Maximum number of results to return (default: 10) email: Email address for Entrez (required by NCBI) Returns: Dictionary containing search results with PMIDs and article information
fetch_sequence_by_id	Fetch a sequence from NCBI database by ID. Args: db: Database name ('nucleotide', 'protein', etc.) seq_id: Sequence identifier email: Email address for Entrez (required by NCBI) Returns: Dictionary containing sequence information
entrez_info	Get information about NCBI Entrez databases. Args: database: Specific database name (empty string for list of all databases) Returns: Dictionary containing database information: - If database="": List of all available databases with count - If database specified: Detailed info including description, record count, searchable fields, and available links Examples: >>> entrez_info() # List all databases >>> entrez_info("pubmed") # Get PubMed database details >>> entrez_info("gene") # Get Gene database details
entrez_search	Search any NCBI Entrez database using query syntax. Args: database: Database to search (e.g., 'pubmed', 'nucleotide', 'gene', 'clinvar') query: Search query using Entrez syntax (see module docstring for examples) max_results: Maximum number of results to return (default: 20, max: 10000) sort: Sort order - 'relevance', 'pub_date', 'Author', etc. (default: 'relevance') use_cache: Whether to use cached results (default: True, TTL: 1 hour) Returns: Dictionary containing: - ids: List of matching record IDs - count: Number of IDs returned - total_found: Total number of matches in database - query: Original query string - database: Database searched - cached: Whether result was from cache (if use_cache=True) Examples: >>> entrez_search("pubmed", "BRCA1 AND breast cancer", max_results=10) >>> entrez_search("gene", "BRCA1[Gene Name] AND Homo sapiens[Organism]") >>> entrez_search("nucleotide", "Homo sapiens[Organism]", max_results=5) >>> entrez_search("clinvar", "BRCA1[Gene] AND Pathogenic[Clinical Significance]") Notes: - Uses NCBI Entrez query syntax with field tags and Boolean operators - Rate limited to 3 req/sec (or 10 req/sec with API key) - See module docstring for comprehensive query syntax examples - Cached results have 1 hour TTL to balance freshness and API usage
entrez_fetch	Fetch full records from NCBI Entrez by UID. Args: database: Database name (e.g., 'pubmed', 'nucleotide', 'gene', 'protein') ids: Single ID, comma-separated string, or list of IDs rettype: Return type - 'xml', 'gb', 'fasta', 'abstract', etc. (default: 'xml') retmode: Return mode - 'xml', 'text', 'json' (default: 'xml') use_cache: Whether to use cached results (default: True, TTL: 7 days) Returns: Dictionary containing: - data: Raw data in requested format (parsed if XML, raw text otherwise) - ids: List of IDs fetched - count: Number of records retrieved - format: Return type/mode used - database: Database queried - cached: Whether result was from cache (if use_cache=True) Examples: >>> entrez_fetch("pubmed", "12345678", rettype="abstract", retmode="xml") >>> entrez_fetch("nucleotide", ["NM_000207", "NM_001127"], rettype="fasta", retmode="text") >>> entrez_fetch("gene", "672", rettype="xml") >>> entrez_fetch("protein", "NP_000198.1", rettype="fasta", retmode="text") Notes: - For >100 IDs, consider batching to avoid timeouts - Valid rettype/retmode combinations depend on database - XML mode returns parsed Python dict/list structure - Text mode returns raw string data - Rate limited to 3 req/sec (or 10 req/sec with API key) - Cached results have 7 day TTL since record data is relatively static
entrez_summary	Get document summaries (DocSums) from NCBI Entrez. Document summaries are lightweight alternatives to full records, containing key metadata without the full content. Much faster for metadata-only queries. Args: database: Database name (e.g., 'pubmed', 'gene', 'clinvar', 'nucleotide') ids: Single ID, comma-separated string, or list of IDs use_cache: Whether to use cached results (default: True, TTL: 7 days) Returns: Dictionary containing: - summaries: List of document summary dictionaries - ids: List of IDs requested - count: Number of summaries returned - database: Database queried - cached: Whether result was from cache (if use_cache=True) Examples: >>> entrez_summary("pubmed", "12345678") >>> entrez_summary("gene", ["672", "7157"]) # BRCA1, TP53 >>> entrez_summary("clinvar", "12345") >>> entrez_summary("nucleotide", "NM_000207,NM_001127") Notes: - Much faster than entrez_fetch for metadata-only queries - Fields returned vary by database type - Rate limited to 3 req/sec (or 10 req/sec with API key) - Use this instead of fetch when you don't need full sequence/text - Cached results have 7 day TTL since summary data is relatively static
clinvar_variant_lookup	Search ClinVar for genetic variants and their clinical interpretations. This specialized wrapper combines entrez_search and entrez_summary for convenient ClinVar queries. Args: variant: Variant notation (e.g., "rs80357906", "NM_000059.3:c.1521_1523del") gene: Gene symbol (e.g., "BRCA1", "TP53") condition: Condition/phenotype (e.g., "breast cancer", "Lynch syndrome") significance: Clinical significance filter: - "pathogenic" - "likely_pathogenic" - "benign" - "likely_benign" - "uncertain" max_results: Maximum results to return (default: 20) use_cache: Whether to use cached results (default: True) Returns: Dictionary containing: - variants: List of variant dictionaries with clinical information - count: Number of variants returned - total_found: Total matches in ClinVar - query_terms: Dictionary of search terms used - cached: Whether result was from cache (if use_cache=True) Examples: >>> clinvar_variant_lookup(gene="BRCA1", significance="pathogenic", max_results=5) >>> clinvar_variant_lookup(variant="rs80357906") >>> clinvar_variant_lookup(gene="TP53", condition="cancer", max_results=10) Notes: - At least one search parameter must be provided - Multiple parameters are combined with AND logic - Rate limited (3 req/sec or 10 req/sec with API key) - Cached results inherit TTL from underlying entrez_search and entrez_summary calls
gene_info_fetch	Fetch comprehensive gene information from NCBI Gene database. This specialized wrapper provides easy access to gene records with structured output. Args: gene_symbol: Gene symbol (e.g., "BRCA1", "TP53") gene_id: NCBI Gene ID (e.g., "672" for BRCA1) organism: Organism name (default: "Homo sapiens") use_cache: Whether to use cached results (default: True) Returns: Dictionary containing: - gene_id: NCBI Gene ID - symbol: Official gene symbol - name: Full gene name - summary: Gene summary/description - organism: Organism name - chromosome: Chromosomal location - aliases: List of gene aliases - type: Gene type (protein-coding, ncRNA, etc.) - cached: Whether result was from cache (if use_cache=True) Examples: >>> gene_info_fetch(gene_symbol="BRCA1") >>> gene_info_fetch(gene_id="672") >>> gene_info_fetch(gene_symbol="Brca1", organism="Mus musculus") Notes: - Provide either gene_symbol or gene_id (gene_id takes precedence) - Organism filter helps disambiguate gene symbols - Rate limited (3 req/sec or 10 req/sec with API key) - Cached results inherit TTL from underlying entrez_search and entrez_summary calls
pubmed_search	Search PubMed with enhanced metadata extraction. This specialized wrapper provides enriched PubMed search results with structured article metadata. Args: query: PubMed search query (supports all Entrez query syntax) max_results: Maximum results to return (default: 10) sort: Sort order - "relevance", "pub_date", "first_author" (default: "relevance") year_start: Filter by publication year start (e.g., 2020) year_end: Filter by publication year end (e.g., 2024) use_cache: Whether to use cached results (default: True, TTL: 1 hour) Returns: Dictionary containing: - articles: List of article dictionaries with: - pmid: PubMed ID - title: Article title - abstract: Full abstract text - authors: List of author names - journal: Journal name - year: Publication year - date: Publication date - doi: DOI (if available) - pmc_id: PMC ID (if available) - count: Number of articles returned - total_found: Total matches in PubMed - cached: Whether result was from cache (if use_cache=True) Examples: >>> pubmed_search("BRCA1 AND breast cancer", max_results=5) >>> pubmed_search("Smith J[Author]", sort="pub_date") >>> pubmed_search("diabetes", year_start=2020, year_end=2024, max_results=20) Notes: - Uses comprehensive Entrez query syntax - Returns full abstracts when available - Rate limited (3 req/sec or 10 req/sec with API key) - Cached results have 1 hour TTL to balance freshness and API usage
variant_literature_link	Find literature (PubMed) articles linked to a specific variant. Uses Entrez ELink to find cross-database relationships between variant databases and PubMed. Args: variant_id: Variant ID (ClinVar ID or dbSNP rs number) source_db: Source database - "clinvar" or "snp" (default: "clinvar") max_results: Maximum articles to return (default: 10) Returns: Dictionary containing: - variant_id: Input variant ID - source_db: Source database used - linked_pmids: List of linked PubMed IDs - articles: List of article summaries - count: Number of articles found Examples: >>> variant_literature_link("12345", source_db="clinvar") >>> variant_literature_link("80357906", source_db="snp", max_results=5) Notes: - Not all variants have linked literature - Uses Entrez ELink for database cross-referencing - Rate limited (3 req/sec or 10 req/sec with API key)
entrez_link	Find related records across NCBI databases using ELink. This tool discovers relationships between records in different databases, such as finding PubMed articles related to genes, or nucleotide sequences related to proteins. Args: source_db: Source database (e.g., 'gene', 'protein', 'clinvar') target_db: Target database to link to (e.g., 'pubmed', 'nucleotide') ids: Single ID, comma-separated string, or list of IDs from source_db link_name: Specific link type (optional, empty = all available links) Returns: Dictionary containing: - source_db: Source database name - target_db: Target database name - source_ids: List of source IDs queried - linked_ids: Dict mapping source IDs to lists of linked target IDs - total_links: Total number of links found - link_name: Link type used (if specified) Examples: >>> entrez_link("gene", "pubmed", "672") # BRCA1 gene to PubMed >>> entrez_link("protein", "nucleotide", ["NP_000198.1", "NP_001121"]) >>> entrez_link("clinvar", "pubmed", "12345", link_name="clinvar_pubmed") Notes: - Discovers cross-database relationships automatically - Use entrez_info() to see available link names for databases - Rate limited (3 req/sec or 10 req/sec with API key) - Different databases support different link types
clear_entrez_cache	Clear cached Entrez results. The caching system stores Entrez query results to reduce API calls and improve response times. Use this tool to clear stale cache data. Args: database: Database name to clear (empty string clears all databases) Returns: Dictionary containing: - success: Whether operation succeeded - cleared: Number of cache files removed - database: Database cleared (or "all" if empty string) - cache_location: Path to cache directory Examples: >>> clear_entrez_cache() # Clear all caches >>> clear_entrez_cache("pubmed") # Clear only PubMed cache >>> clear_entrez_cache("gene") # Clear only Gene cache Notes: - Caching is optional and controlled via use_cache parameter - Default TTL: 1 hour for searches, 7 days for fetches - Cache stored in ~/.biopython-mcp/cache/ - Cached data includes search results and summaries
pubmed_fetch	Fetch full-text article from PubMed Central (PMC). This function retrieves open access full-text articles from PMC using the PMC OAI service. Only works for open access articles that have a PMC ID. Args: pmc_id: PMC identifier (with or without 'PMC' prefix, e.g., "PMC123456" or "123456") format: Output format - "xml" for structured XML or "text" for plain text (default: "xml") timeout: Request timeout in seconds (default: 30) Returns: Dictionary containing the full-text article and metadata: - success (bool): Whether fetch was successful - pmc_id (str): The PMC identifier - format (str): Format of returned content - content (str): Full-text article content - content_length (int): Length of content in characters - error (str): Error message if unsuccessful Examples: >>> result = pubmed_fetch("PMC3539452") >>> if result["success"]: ... print(result["content"][:100]) >>> result = pubmed_fetch("3539452", format="text") >>> print(result["content"]) Note: - Only works for open access articles - Articles without PMC IDs cannot be fetched - Rate limiting applies (use with entrez_rate_limit context manager) - XML format preserves structure (sections, figures, tables, references) - Text format provides simplified plain text extraction
get_pmc_url	Get the URL for a PubMed Central article. Args: pmc_id: PMC identifier (with or without 'PMC' prefix) Returns: Full URL to PMC article page Examples: >>> get_pmc_url("PMC3539452") 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3539452/' >>> get_pmc_url("3539452") 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3539452/'
get_doi_url	Get the URL for a DOI. Args: doi: Digital Object Identifier Returns: Full URL to DOI resolver Examples: >>> get_doi_url("10.1371/journal.pone.0012345") 'https://doi.org/10.1371/journal.pone.0012345'
pubmed_review	Create a formatted literature review from PubMed search results and write to MD file. This function searches PubMed, fetches article metadata, formats it as markdown with complete abstracts, and writes the content directly to a file. The LLM determines both the storage location and filename. Args: query: PubMed search query (supports full Entrez syntax including year filters) Example: "BRCA1 AND breast cancer AND 2020:2024[PDAT]" path: Relative directory path within vault (e.g., "research/cancer" or "genetics/reviews") The LLM decides the directory structure. filename: Name of the markdown file (e.g., "brca1_review_2024.md" or "alport_syndrome.md") The LLM decides the filename. Should include .md extension. obsidian_vault: Path to Obsidian vault root (optional, defaults to OBSIDIAN_VAULT_PATH env variable) Example: "/Users/user/Documents/MyVault" max_results: Maximum number of articles to include (default: 25, max: 1000) sort: Sort order - "pub_date", "relevance", etc. (default: "pub_date") Returns: Dictionary with review results and metadata: - status: "success" or "error" - filepath: Full path where file was written - articles_found: Total number of articles found - articles_written: Number of articles successfully processed - articles_with_pmc: Count of articles with PMC IDs - articles_with_doi: Count of articles with DOIs - query: Original search query - file_size_kb: File size in kilobytes - year_range: {"min": int, "max": int} - top_journals: List of top 5 journals by article count - execution_time_seconds: Time taken to generate review Examples: >>> # Using environment variable for vault path >>> result = pubmed_review( ... query="COL4A3[Gene] AND Alport syndrome", ... path="genetics/reviews", ... filename="alport_syndrome_2024.md" ... ) >>> # Overriding vault path >>> result = pubmed_review( ... query="BRCA1 AND breast cancer AND 2020:2024[PDAT]", ... path="oncology/brca1", ... filename="literature_review_jan2024.md", ... obsidian_vault="/Users/user/Vault", ... max_results=50 ... ) Notes: - Vault path comes from OBSIDIAN_VAULT_PATH environment variable or obsidian_vault parameter - LLM controls both directory structure (path) and filename separately - Writes markdown file directly to disk with complete abstracts - Fetches articles in batches of 20 (NCBI limit) - Respects NCBI rate limits (3/sec or 10/sec with API key) - For very large reviews (>500 articles), consider splitting into multiple calls - Includes Obsidian-compatible YAML frontmatter
fetch_pdb_structure	Fetch a protein structure from the PDB database. Args: pdb_id: PDB identifier (e.g., '1ABC') file_format: File format - 'pdb' or 'cif' (default: 'pdb') Returns: Dictionary containing structure information and file location
calculate_structure_stats	Calculate statistics for a PDB structure file. Args: pdb_file: Path to PDB file Returns: Dictionary containing structure statistics
find_active_site	Extract information about specific residues (e.g., active site). Args: pdb_file: Path to PDB file residue_numbers: List of residue numbers to analyze chain_id: Chain identifier (default: 'A') Returns: Dictionary containing active site residue information
build_phylogenetic_tree	Build a phylogenetic tree from sequences. Args: sequences: List of aligned sequences method: Tree building method - 'nj' (neighbor-joining) or 'upgma' (default: 'nj') labels: Optional labels for sequences Returns: Dictionary containing tree information
calculate_distance_matrix	Calculate pairwise distance matrix for sequences. Args: sequences: List of aligned sequences model: Distance model to use (default: 'identity') labels: Optional labels for sequences Returns: Dictionary containing distance matrix
draw_tree	Draw a phylogenetic tree from Newick format. Args: tree_newick: Tree in Newick format output_format: Output format - 'ascii' for text representation (default: 'ascii') Returns: Dictionary containing tree visualization
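
The tools above are invoked by an MCP client through a `tools/call` request. As a minimal sketch, the snippet below builds such a JSON-RPC message for `pairwise_align`, spelling out the default values listed in its description. The helper `build_tool_call`, the example sequences, and the request ID scheme are illustrative assumptions; transport and response handling are not shown.

```python
import json
from itertools import count

# Simple incrementing request IDs; a real client may use its own scheme.
_request_ids = count(1)


def build_tool_call(name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names and defaults are taken from the pairwise_align description;
# the two sequences are made-up inputs.
request = build_tool_call(
    "pairwise_align",
    {
        "seq1": "ACGTACGT",
        "seq2": "ACGTTCGT",
        "mode": "global",
        "match_score": 2.0,
        "mismatch_score": -1.0,
        "gap_open": -2.0,
        "gap_extend": -0.5,
    },
)

print(json.dumps(request, indent=2))
```

Arguments that have defaults could presumably be omitted from the request; they are written out here only to mirror the parameter list.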

Prompts

Interactive templates invoked by user choice

Name	Description
No prompts

Resources

Contextual data attached and managed by the client

Name	Description
No resources
