Glama

marker_gene_overlap

Calculate overlap between data-derived marker genes and reference markers to identify shared biological signatures in single-cell RNA sequencing analysis.

Instructions

Calculate overlap between data-derived marker genes and reference markers

Input Schema

Name | Required | Description | Default
key | No | The key in adata.uns where the rank_genes_groups output is stored. | rank_genes_groups
method | No | Method to calculate marker gene overlap: 'overlap_count', 'overlap_coef', or 'jaccard'. | overlap_count
normalize | No | Normalization option for the marker gene overlap output. Only applicable when method is 'overlap_count'. | None
top_n_markers | No | The number of top data-derived marker genes to use. By default the top 100 marker genes are used. | None
adj_pval_threshold | No | A significance threshold on the adjusted p-values to select marker genes. | None
key_added | No | Name of the .uns field that will contain the marker overlap scores. | marker_gene_overlap
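
The three `method` values name standard set-overlap measures: a raw count of shared genes, the overlap coefficient (shared genes divided by the size of the smaller set), and the Jaccard index (shared genes divided by the size of the union). The following is a minimal sketch in plain Python, assuming marker lists are treated as sets. The helper `overlap_score` and the gene lists are hypothetical and are not part of scanpy or this server.

```python
def overlap_score(data_markers, ref_markers, method="overlap_count"):
    """Score the overlap between two marker gene collections.

    Hypothetical helper for illustration; it mirrors the three method
    names from the schema but is not the library implementation.
    """
    data, ref = set(data_markers), set(ref_markers)
    shared = len(data & ref)
    if method == "overlap_count":
        return shared
    if method == "overlap_coef":
        # Normalize by the smaller of the two sets.
        smaller = min(len(data), len(ref))
        return shared / smaller if smaller else 0.0
    if method == "jaccard":
        # Normalize by the size of the union.
        union = len(data | ref)
        return shared / union if union else 0.0
    raise ValueError(f"Unsupported method: {method}")


# Illustrative inputs only, not real analysis results.
cluster_markers = ["IL7R", "CD3D", "CCR7", "LTB"]
t_cell_reference = ["IL7R", "CD3D", "CD2"]

count = overlap_score(cluster_markers, t_cell_reference, "overlap_count")
coef = overlap_score(cluster_markers, t_cell_reference, "overlap_coef")
jaccard = overlap_score(cluster_markers, t_cell_reference, "jaccard")
```

The raw count depends on how many markers are compared, which is presumably why the schema offers `normalize` only for 'overlap_count': the other two methods are already normalized.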

Implementation Reference

  • Handler function that executes the marker_gene_overlap tool (and other tl tools) by retrieving the corresponding scanpy function from tl_func mapping and calling it on the active AnnData object with validated arguments.
    import inspect

    def run_tl_func(ads, func, arguments):
        # Operate on the currently active AnnData object.
        adata = ads.adata_dic[ads.active]
        if func not in tl_func:
            raise ValueError(f"Unsupported function: {func}")
        run_func = tl_func[func]
        # Forward only the arguments that the scanpy function accepts.
        parameters = inspect.signature(run_func).parameters
        kwargs = {k: arguments.get(k) for k in parameters if k in arguments}
        try:
            res = run_func(adata, **kwargs)
            add_op_log(adata, run_func, kwargs)
        except Exception as e:
            logger.error(f"Error running function {func}: {e}")
            raise
        return res
  • Pydantic model defining the input schema and validators for the marker_gene_overlap tool parameters.
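    from typing import Literal, Optional

    from pydantic import Field, ValidationInfo, field_validator

    # JSONParsingModel is a base model defined elsewhere in the project.
    class MarkerGeneOverlapModel(JSONParsingModel):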
        """Input schema for the marker gene overlap tool."""
        
        key: str = Field(
            default='rank_genes_groups',
            description="The key in adata.uns where the rank_genes_groups output is stored."
        )
        
        method: str = Field(
            default='overlap_count',
            description="Method to calculate marker gene overlap: 'overlap_count', 'overlap_coef', or 'jaccard'."
        )
        
        normalize: Optional[Literal['reference', 'data']] = Field(
            default=None,
            description="Normalization option for the marker gene overlap output. Only applicable when method is 'overlap_count'."
        )
        
        top_n_markers: Optional[int] = Field(
            default=None,
            description="The number of top data-derived marker genes to use. By default the top 100 marker genes are used.",
            gt=0
        )
        
        adj_pval_threshold: Optional[float] = Field(
            default=None,
            description="A significance threshold on the adjusted p-values to select marker genes.",
            gt=0,
            le=1.0
        )
        
        key_added: str = Field(
            default='marker_gene_overlap',
            description="Name of the .uns field that will contain the marker overlap scores."
        )
        
        @field_validator('method')
        def validate_method(cls, v: str) -> str:
            """Validate method is supported"""
            valid_methods = ['overlap_count', 'overlap_coef', 'jaccard']
            if v not in valid_methods:
                raise ValueError(f"method must be one of {valid_methods}")
            return v
        
        @field_validator('normalize')
        def validate_normalize(cls, v: Optional[str], info: ValidationInfo) -> Optional[str]:
            """Validate normalize is only used with overlap_count method"""
            if v is not None:
                if v not in ['reference', 'data']:
                    raise ValueError("normalize must be either 'reference' or 'data'")
                
                values = info.data
                if 'method' in values and values['method'] != 'overlap_count':
                    raise ValueError("normalize can only be used when method is 'overlap_count'")
            return v
        
        @field_validator('top_n_markers')
        def validate_top_n_markers(cls, v: Optional[int]) -> Optional[int]:
            """Validate top_n_markers is positive"""
            if v is not None and v <= 0:
                raise ValueError("top_n_markers must be a positive integer")
            return v
        
        @field_validator('adj_pval_threshold')
        def validate_adj_pval_threshold(cls, v: Optional[float]) -> Optional[float]:
            """Validate adj_pval_threshold is between 0 and 1"""
            if v is not None and (v <= 0 or v > 1):
                raise ValueError("adj_pval_threshold must be between 0 and 1")
            return v
  • Registers the marker_gene_overlap tool as an MCP Tool object with name, description, and input schema from MarkerGeneOverlapModel.
    # Add marker_gene_overlap tool
    marker_gene_overlap_tool = types.Tool(
        name="marker_gene_overlap",
        description="Calculate overlap between data-derived marker genes and reference markers",
        inputSchema=MarkerGeneOverlapModel.model_json_schema(),
    )
  • Maps the tool name 'marker_gene_overlap' to the underlying scanpy.tl.marker_gene_overlap function for execution.
    # Dictionary mapping tool names to scanpy functions
    tl_func = {
        "tsne": sc.tl.tsne,
        "umap": sc.tl.umap,
        "draw_graph": sc.tl.draw_graph,
        "diffmap": sc.tl.diffmap,
        "embedding_density": sc.tl.embedding_density,
        "leiden": sc.tl.leiden,
        "louvain": sc.tl.louvain,
        "dendrogram": sc.tl.dendrogram,
        "dpt": sc.tl.dpt,
        "paga": sc.tl.paga,
        "ingest": sc.tl.ingest,
        "rank_genes_groups": sc.tl.rank_genes_groups,
        "filter_rank_genes_groups": sc.tl.filter_rank_genes_groups,
        "marker_gene_overlap": sc.tl.marker_gene_overlap,
        "score_genes": sc.tl.score_genes,
        "score_genes_cell_cycle": sc.tl.score_genes_cell_cycle,
    }
  • Adds the marker_gene_overlap_tool to the dictionary of tl tools, which is used by the server for listing and dispatching.
    # Dictionary mapping tool names to tool objects
    tl_tools = {
        "tsne": tsne_tool,
        "umap": umap_tool,
        "draw_graph": draw_graph_tool,
        "diffmap": diffmap_tool,
        "embedding_density": embedding_density_tool,
        "leiden": leiden_tool,
        "louvain": louvain_tool,
        "dendrogram": dendrogram_tool,
        "dpt": dpt_tool,
        "paga": paga_tool,
        "ingest": ingest_tool,
        "rank_genes_groups": rank_genes_groups_tool,
        "filter_rank_genes_groups": filter_rank_genes_groups_tool,
        "marker_gene_overlap": marker_gene_overlap_tool,
        "score_genes": score_genes_tool,
        "score_genes_cell_cycle": score_genes_cell_cycle_tool,
    }
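
The handler and the two dictionaries above form a small name-based dispatch layer: one registry is used for listing tools, the other for executing them, and the handler passes along only the arguments that the target function's signature declares. The following is a self-contained sketch of that pattern. The names `dispatch`, `unmatched_names`, `fake_overlap`, and the demo registries are hypothetical stand-ins, not code from the server.

```python
import inspect


def dispatch(name, funcs, target, arguments):
    """Call funcs[name](target, ...) with only the keyword arguments it accepts."""
    if name not in funcs:
        raise ValueError(f"Unsupported function: {name}")
    func = funcs[name]
    parameters = inspect.signature(func).parameters
    # Drop any argument whose name is not a parameter of the function.
    kwargs = {k: arguments[k] for k in parameters if k in arguments}
    return func(target, **kwargs)


def unmatched_names(tools, funcs):
    """Return names registered in only one of the two dictionaries."""
    return sorted(set(tools) ^ set(funcs))


# Hypothetical stand-in for a scanpy tool function.
def fake_overlap(adata, key="rank_genes_groups", method="overlap_count"):
    return f"{adata}:{key}:{method}"


# Placeholder registries standing in for tl_func and tl_tools.
demo_funcs = {"marker_gene_overlap": fake_overlap}
demo_tools = {"marker_gene_overlap": "tool object", "umap": "tool object"}

result = dispatch(
    "marker_gene_overlap",
    demo_funcs,
    "adata",
    {"method": "jaccard", "unknown_option": 1},
)
unmatched = unmatched_names(demo_tools, demo_funcs)
```

With this filtering, unknown keys are dropped silently rather than rejected, and a required parameter that is absent from the arguments still raises a `TypeError` when the function is called. A check like `unmatched_names` is one way to catch a tool that is listed but cannot be executed.
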
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It states what the tool does but lacks critical behavioral context: it doesn't mention that this is a read-only analysis tool (implied by 'calculate'), doesn't describe output format or where results are stored (though parameters hint at '.uns'), and omits any error conditions or performance characteristics. The description is functional but insufficient for a tool with 6 parameters.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, focused sentence that states the core purpose without unnecessary elaboration. Every word earns its place, and the structure is front-loaded with the essential action. There's zero waste or redundancy in this concise formulation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 6 parameters, no annotations, and no output schema, the description is incomplete. It doesn't explain what the tool returns (only mentioning calculation, not results), doesn't provide context about typical workflows, and offers no guidance on parameter interactions. The agent would need to rely heavily on the parameter schema to understand this tool's full behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the input schema already documents all 6 parameters thoroughly. The description adds no parameter-specific information beyond what's in the schema - it doesn't explain relationships between parameters (e.g., how 'method' interacts with 'normalize') or provide usage examples. The baseline score of 3 reflects adequate but minimal value addition.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
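
The parameter relationships that the description leaves out can be read from the validators in MarkerGeneOverlapModel: `normalize` is accepted only together with method 'overlap_count', `top_n_markers` must be positive, and `adj_pval_threshold` must lie in (0, 1]. The following is a sketch that restates those rules in plain Python without Pydantic. The helper `check_overlap_args` is hypothetical and collects messages instead of raising.

```python
VALID_METHODS = ("overlap_count", "overlap_coef", "jaccard")
VALID_NORMALIZE = ("reference", "data")


def check_overlap_args(args):
    """Return a list of rule violations for a marker_gene_overlap request.

    Hypothetical helper: restates the model's rules without Pydantic.
    """
    errors = []
    method = args.get("method", "overlap_count")
    normalize = args.get("normalize")
    top_n = args.get("top_n_markers")
    pval = args.get("adj_pval_threshold")

    if method not in VALID_METHODS:
        errors.append("method must be one of " + ", ".join(VALID_METHODS))
    if normalize is not None:
        if normalize not in VALID_NORMALIZE:
            errors.append("normalize must be either 'reference' or 'data'")
        elif method != "overlap_count":
            errors.append("normalize can only be used when method is 'overlap_count'")
    if top_n is not None and top_n <= 0:
        errors.append("top_n_markers must be a positive integer")
    if pval is not None and not (0 < pval <= 1):
        errors.append("adj_pval_threshold must be between 0 and 1")
    return errors


# A valid combination, then one that breaks two rules at once.
ok = check_overlap_args({"method": "overlap_count", "normalize": "data"})
bad = check_overlap_args(
    {"method": "jaccard", "normalize": "data", "top_n_markers": 0}
)
```

Here `ok` is empty, while `bad` reports both the `normalize` conflict and the non-positive `top_n_markers`.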

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('calculate overlap') and the resources involved ('data-derived marker genes and reference markers'), providing a specific verb+resource combination. However, it doesn't distinguish this tool from sibling tools like 'rank_genes_groups' or 'score_genes', which also work with marker genes but perform different functions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. There's no mention of prerequisites (e.g., needing 'rank_genes_groups' output first), nor does it differentiate from sibling tools that might handle similar data. The agent must infer usage from the tool name and parameters alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/huang-sh/scmcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.