
get_data

Retrieve biological data from Biomart by specifying attributes and filters, returning results in CSV format. Use this tool to query datasets efficiently and apply custom filters for targeted data extraction.

Instructions

Queries Biomart for data using specified attributes and filters.

This function performs the main data retrieval from Biomart, allowing you to
query biological data by specifying which attributes to return and which filters
to apply. Includes automatic retry logic for resilience.

Args:
    mart (str): The mart identifier (e.g., "ENSEMBL_MART_ENSEMBL")
    dataset (str): The dataset identifier (e.g., "hsapiens_gene_ensembl")
    attributes (list[str]): List of attributes to retrieve (e.g., ["ensembl_gene_id", "external_gene_name"])
    filters (dict[str, str]): Dictionary of filters to apply (e.g., {"chromosome_name": "1"})

Returns:
    str: CSV-formatted results of the query.

Example:
    get_data(
        "ENSEMBL_MART_ENSEMBL",
        "hsapiens_gene_ensembl",
        ["ensembl_gene_id", "external_gene_name", "chromosome_name"],
        {"chromosome_name": "X", "biotype": "protein_coding"}
    )
    >>> "ensembl_gene_id,external_gene_name,chromosome_name
         ENSG00000000003,TSPAN6,X
         ENSG00000000005,TNMD,X
         ..."
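Because the tool returns a single CSV string, a caller usually has to parse it before use. The following is a minimal sketch of one way to do that with Python's standard `csv` module. The helper name `parse_get_data_result` is hypothetical and not part of the server. The `Error:` prefix check assumes the handler's behavior of returning an `Error: <message>` string after its last failed attempt.

```python
import csv
import io


def parse_get_data_result(result: str) -> list[dict[str, str]]:
    """Turn the CSV string returned by get_data into a list of row dicts.

    Hypothetical helper: the server itself only returns the raw string.
    """
    # After its final failed attempt the handler returns "Error: <message>"
    # instead of CSV, so surface that as an exception here.
    if result.startswith("Error:"):
        raise RuntimeError(result)
    return list(csv.DictReader(io.StringIO(result)))


# Sample text shaped like the example output above (not a live query).
sample = (
    "ensembl_gene_id,external_gene_name,chromosome_name\n"
    "ENSG00000000003,TSPAN6,X\n"
    "ENSG00000000005,TNMD,X\n"
)

rows = parse_get_data_result(sample)
print(rows[0]["external_gene_name"])  # TSPAN6
```

Keeping the values as strings avoids guessing column types. A caller that needs numeric columns can convert them after parsing.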

Input Schema

Name        Required  Description  Default
attributes  Yes
dataset     Yes
filters     Yes
mart        Yes
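The table lists four required parameters with no schema-level descriptions. As a rough illustration, the sketch below writes out a JSON Schema that matches the handler's type hints and checks a candidate argument set against it by hand. The `INPUT_SCHEMA` dict is a hand-written approximation, not the schema the server actually publishes, and `check_arguments` is a hypothetical helper.

```python
from typing import Any

# Hand-written approximation of the schema implied by the type hints
# (mart: str, dataset: str, attributes: list[str], filters: dict[str, str]).
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "mart": {"type": "string"},
        "dataset": {"type": "string"},
        "attributes": {"type": "array", "items": {"type": "string"}},
        "filters": {"type": "object", "additionalProperties": {"type": "string"}},
    },
    "required": ["attributes", "dataset", "filters", "mart"],
}


def check_arguments(args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    for name in INPUT_SCHEMA["required"]:
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name in ("mart", "dataset"):
        if name in args and not isinstance(args[name], str):
            problems.append(f"{name} must be a string")
    attrs = args.get("attributes")
    if attrs is not None and not (
        isinstance(attrs, list) and all(isinstance(a, str) for a in attrs)
    ):
        problems.append("attributes must be a list of strings")
    filters = args.get("filters")
    if filters is not None and not (
        isinstance(filters, dict)
        and all(isinstance(k, str) and isinstance(v, str) for k, v in filters.items())
    ):
        problems.append("filters must map string names to string values")
    return problems


good = {
    "mart": "ENSEMBL_MART_ENSEMBL",
    "dataset": "hsapiens_gene_ensembl",
    "attributes": ["ensembl_gene_id", "external_gene_name"],
    "filters": {"chromosome_name": "1"},
}
print(check_arguments(good))  # []
```

Note that `filters` is required even when no filtering is wanted. Under this reading of the schema, an empty dictionary would satisfy the check.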

Implementation Reference

  • The get_data tool handler: decorated with @mcp.tool() for MCP registration. Executes Biomart query with retry logic using pybiomart, returns CSV data.
    @mcp.tool()
    def get_data(mart: str, dataset: str, attributes: list[str], filters: dict[str, str]):
        """
        Queries Biomart for data using specified attributes and filters.
    
        This function performs the main data retrieval from Biomart, allowing you to
        query biological data by specifying which attributes to return and which filters
        to apply. Includes automatic retry logic for resilience.
    
        Args:
            mart (str): The mart identifier (e.g., "ENSEMBL_MART_ENSEMBL")
            dataset (str): The dataset identifier (e.g., "hsapiens_gene_ensembl")
            attributes (list[str]): List of attributes to retrieve (e.g., ["ensembl_gene_id", "external_gene_name"])
            filters (dict[str, str]): Dictionary of filters to apply (e.g., {"chromosome_name": "1"})
    
        Returns:
            str: CSV-formatted results of the query.
    
        Example:
            get_data(
                "ENSEMBL_MART_ENSEMBL",
                "hsapiens_gene_ensembl",
                ["ensembl_gene_id", "external_gene_name", "chromosome_name"],
                {"chromosome_name": "X", "biotype": "protein_coding"}
            )
            >>> "ensembl_gene_id,external_gene_name,chromosome_name
                 ENSG00000000003,TSPAN6,X
                 ENSG00000000005,TNMD,X
                 ..."
        """
        for attempt in range(MAX_RETRIES):
            try:
                server = get_server()
                return (
                    server[mart][dataset]
                    .query(attributes=attributes, filters=filters)
                    .to_csv(index=False)
                    .replace("\r", "")
                )
            except Exception as e:
                print(
                    f"Error getting data (attempt {attempt+1}/{MAX_RETRIES}): {str(e)}",
                    file=sys.stderr,
                )
                if attempt < MAX_RETRIES - 1:
                    print(f"Retrying in {RETRY_DELAY} seconds...", file=sys.stderr)
                    time.sleep(RETRY_DELAY)
                else:
                    return f"Error: {str(e)}"
  • Cached helper function to create and retrieve the pybiomart Server connection, used by get_data.
    @lru_cache()
    def get_server():
        """Create and cache a server connection with error handling"""
        try:
            return pybiomart.Server(host=DEFAULT_HOST)
        except Exception as e:
            print(f"Error connecting to Biomart server: {str(e)}", file=sys.stderr)
            raise
  • Type hints and docstring defining input schema (mart:str, dataset:str, attributes:list[str], filters:dict[str,str]) and output (str: CSV). Note: schema embedded in handler.
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of disclosing behavior, and it adds useful context beyond basic functionality. It discloses 'automatic retry logic for resilience', which is important operational behavior, and specifies the return format ('CSV-formatted results'). However, it does not mention rate limits, error conditions, or performance characteristics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
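The retry behavior noted in this assessment can be sketched in isolation. The example below is a simplified illustration of the same fixed-delay pattern the handler uses, not the server's code: `call_with_retries` is a hypothetical helper, and the default values for `max_retries` and `retry_delay` are assumptions, since the page does not show the values of `MAX_RETRIES` and `RETRY_DELAY`.

```python
import sys
import time
from typing import Callable


def call_with_retries(
    fn: Callable[[], str],
    max_retries: int = 3,      # assumed value; the real MAX_RETRIES is not shown
    retry_delay: float = 5.0,  # assumed value; the real RETRY_DELAY is not shown
) -> str:
    """Call fn, retrying after a fixed delay; return an error string on final failure."""
    last_error = "no attempts were made"
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            last_error = str(e)
            print(f"Error (attempt {attempt + 1}/{max_retries}): {e}", file=sys.stderr)
            if attempt < max_retries - 1:
                print(f"Retrying in {retry_delay} seconds...", file=sys.stderr)
                time.sleep(retry_delay)
    # Like the handler, report the failure as a string instead of raising.
    return f"Error: {last_error}"


# A stand-in for a flaky query: fails twice, then returns CSV text.
calls = {"n": 0}


def flaky_query() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "a,b\n1,2\n"


result = call_with_retries(flaky_query, max_retries=3, retry_delay=0)
print(result == "a,b\n1,2\n")  # True
```

One consequence of this design is that a failed query and a successful one both come back as strings, so a caller has to inspect the `Error:` prefix to tell them apart. Unlike the handler, this sketch also returns an error string when `max_retries` is zero.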

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with purpose statement, functional details, parameter documentation, return specification, and example. While slightly longer than minimal, every section adds value. The front-loaded purpose statement is clear, and the example is particularly helpful. Minor deduction for some redundancy between the opening statement and the detailed explanation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters with 0% schema coverage and no output schema, the description provides substantial context: clear purpose, parameter semantics with examples, return format specification, and behavioral details. The main gap is lack of explicit error handling guidance or performance expectations, but for a query tool with good parameter documentation, this is reasonably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing detailed parameter semantics. Each of the 4 parameters is clearly explained with examples: mart ('mart identifier'), dataset ('dataset identifier'), attributes ('list of attributes to retrieve'), filters ('dictionary of filters to apply'). The examples make the parameter usage concrete and understandable.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('queries', 'performs data retrieval') and resources ('Biomart', 'biological data'). It distinguishes this tool from siblings by specifying it's for main data retrieval with attributes and filters, unlike list_* tools that only list metadata or translation tools that convert data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('main data retrieval from Biomart'), but doesn't explicitly state when NOT to use it or name specific alternatives. It implies usage for querying actual data rather than listing metadata (like list_datasets), but doesn't provide explicit exclusions or comparison to siblings like get_translation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jzinno/biomart-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.