Glama

highly_variable_genes

Identify and annotate highly variable genes in single-cell RNA-seq data to focus analysis on biologically relevant features using dispersion and expression cutoffs.

Instructions

Annotate highly variable genes

Input Schema

Name | Required | Description | Default
layer | No | If provided, use adata.layers[layer] for expression values. |
n_top_genes | No | Number of highly-variable genes to keep. Mandatory if `flavor='seurat_v3'`. |
min_disp | No | Minimum dispersion cutoff for gene selection. |
max_disp | No | Maximum dispersion cutoff for gene selection. |
min_mean | No | Minimum mean expression cutoff for gene selection. |
max_mean | No | Maximum mean expression cutoff for gene selection. |
span | No | Fraction of data used for loess model fit in seurat_v3. |
n_bins | No | Number of bins for mean expression binning. |
flavor | No | Method for identifying highly variable genes. | seurat
subset | No | Inplace subset to highly-variable genes if True. |
batch_key | No | Key in adata.obs for batch information. |
check_values | No | Check if counts are integers for seurat_v3 flavor. |
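
As an illustration of how these parameters might be passed, the sketch below builds an MCP `tools/call` request for this tool. The helper name `build_hvg_request`, the request `id`, and the argument values (including the `batch` column name) are assumptions chosen for the example. How the request reaches the server depends on the client.

```python
import json


def build_hvg_request(arguments: dict, request_id: int = 1) -> dict:
    """Wrap tool arguments in a JSON-RPC 2.0 `tools/call` request (hypothetical helper)."""
    # The schema describes n_top_genes as mandatory for flavor='seurat_v3'.
    if arguments.get("flavor") == "seurat_v3" and arguments.get("n_top_genes") is None:
        raise ValueError("n_top_genes is required when flavor='seurat_v3'")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "highly_variable_genes", "arguments": arguments},
    }


# Example values only; "batch" is an assumed column name in adata.obs.
request = build_hvg_request(
    {"flavor": "seurat_v3", "n_top_genes": 2000, "batch_key": "batch"}
)
print(json.dumps(request, indent=2))
```

Parameters left out of `arguments` fall back to the defaults defined by the tool's input schema.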

Implementation Reference

  • Handler function that executes the preprocessing tools, including highly_variable_genes, by dispatching to the corresponding scanpy.pp function (sc.pp.highly_variable_genes).
    def run_pp_func(ads, func, arguments):
        """Run the scanpy preprocessing function registered under `func`."""
        adata = ads.adata_dic[ads.active]
        if func not in pp_func:
            raise ValueError(f"Unsupported function: {func}")

        run_func = pp_func[func]
        parameters = inspect.signature(run_func).parameters
        # Always request in-place operation; the key is dropped below when
        # the target function has no `inplace` parameter.
        arguments["inplace"] = True
        # Forward only the arguments that the target function accepts.
        kwargs = {k: arguments.get(k) for k in parameters if k in arguments}
        try:
            res = run_func(adata, **kwargs)
            add_op_log(adata, run_func, kwargs)
        except KeyError as e:
            raise KeyError(f"Cannot find {e} column in adata.obs or adata.var") from e
        return res
  • Registration of the 'highly_variable_genes' tool using MCP types.Tool, linking to its schema.
    highly_variable_genes = types.Tool(
        name="highly_variable_genes",
        description="Annotate highly variable genes",
        inputSchema=HighlyVariableGenesModel.model_json_schema(),
    )
  • Pydantic model defining the input schema and validation for the highly_variable_genes tool.
    class HighlyVariableGenesModel(JSONParsingModel):
        """Input schema for the highly_variable_genes preprocessing tool."""
        
        layer: Optional[str] = Field(
            default=None,
            description="If provided, use adata.layers[layer] for expression values."
        )
        
        n_top_genes: Optional[int] = Field(
            default=None,
            description="Number of highly-variable genes to keep. Mandatory if `flavor='seurat_v3'`.",
        )
        
        min_disp: float = Field(
            default=0.5,
            description="Minimum dispersion cutoff for gene selection."
        )
        
        max_disp: float = Field(
            default=float('inf'),
            description="Maximum dispersion cutoff for gene selection."
        )
        min_mean: float = Field(
            default=0.0125,
            description="Minimum mean expression cutoff for gene selection."
        )
        max_mean: float = Field(
            default=3,
            description="Maximum mean expression cutoff for gene selection."
        )
        span: float = Field(
            default=0.3,
            description="Fraction of data used for loess model fit in seurat_v3.",
            gt=0,
            lt=1
        )
        n_bins: int = Field(
            default=20,
            description="Number of bins for mean expression binning.",
            gt=0
        )
        flavor: Literal['seurat', 'cell_ranger', 'seurat_v3', 'seurat_v3_paper'] = Field(
            default='seurat',
            description="Method for identifying highly variable genes."
        )
        subset: bool = Field(
            default=False,
            description="Inplace subset to highly-variable genes if True."
        )
        batch_key: Optional[str] = Field(
            default=None,
            description="Key in adata.obs for batch information."
        )
        
        check_values: bool = Field(
            default=True,
            description="Check if counts are integers for seurat_v3 flavor."
        )
        
        @field_validator('n_top_genes', 'n_bins')
        @classmethod
        def validate_positive_integers(cls, v: Optional[int]) -> Optional[int]:
            """Reject zero or negative values; None is allowed."""
            if v is not None and v <= 0:
                raise ValueError("must be a positive integer")
            return v

        @field_validator('span')
        @classmethod
        def validate_span(cls, v: float) -> float:
            """Require span to lie strictly between 0 and 1."""
            if v <= 0 or v >= 1:
                raise ValueError("span must be between 0 and 1")
            return v
  • Mapping dictionary that associates the 'highly_variable_genes' tool name to the scanpy implementation sc.pp.highly_variable_genes, used by the handler.
    pp_func = {
        "filter_genes": sc.pp.filter_genes,
        "filter_cells": sc.pp.filter_cells,
        "calculate_qc_metrics": partial(sc.pp.calculate_qc_metrics, inplace=True),
        "log1p": sc.pp.log1p,
        "normalize_total": sc.pp.normalize_total,
        "pca": sc.pp.pca,
        "highly_variable_genes": sc.pp.highly_variable_genes,
        "regress_out": sc.pp.regress_out,
        "scale": sc.pp.scale,
        "combat": sc.pp.combat,
        "scrublet": sc.pp.scrublet,
        "neighbors": sc.pp.neighbors,
    }
  • Dictionary registering the tool objects by name, including highly_variable_genes.
    pp_tools = {
        "filter_genes": filter_genes,
        "filter_cells": filter_cells,
        "calculate_qc_metrics": calculate_qc_metrics,
        "log1p": log1p,
        "normalize_total": normalize_total,
        "pca": pca,
        "highly_variable_genes": highly_variable_genes,
        "regress_out": regress_out,
        "scale": scale,
        "combat": combat,
        "scrublet": scrublet,
        "neighbors": neighbors,
    }
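
The handler's argument filtering can be illustrated in isolation. The sketch below is not the server's code: `fake_hvg` is a stand-in for `sc.pp.highly_variable_genes`, `registry` plays the role of `pp_func`, and `dispatch` is a hypothetical, simplified version of `run_pp_func` that omits the AnnData lookup and the operation log. It shows that keys absent from the target function's signature are dropped before the call.

```python
import inspect


def fake_hvg(adata, *, n_top_genes=None, flavor="seurat", inplace=True):
    """Stand-in for sc.pp.highly_variable_genes; returns what it received."""
    return {"n_top_genes": n_top_genes, "flavor": flavor, "inplace": inplace}


registry = {"highly_variable_genes": fake_hvg}  # plays the role of pp_func


def dispatch(adata, func, arguments):
    """Simplified, hypothetical version of run_pp_func."""
    if func not in registry:
        raise ValueError(f"Unsupported function: {func}")
    run_func = registry[func]
    parameters = inspect.signature(run_func).parameters
    # Copy first so the caller's dict is not mutated (a choice made for this sketch).
    arguments = {**arguments, "inplace": True}
    # Keep only the keys that appear in the target function's signature.
    kwargs = {k: arguments[k] for k in parameters if k in arguments}
    return run_func(adata, **kwargs)


result = dispatch(
    object(),  # placeholder for an AnnData object
    "highly_variable_genes",
    {"n_top_genes": 2000, "unknown_option": 1},
)
print(result)  # {'n_top_genes': 2000, 'flavor': 'seurat', 'inplace': True}
```

One consequence of filtering by signature is that a misspelled parameter name is ignored rather than reported, so any check on parameter names has to happen before dispatch, for example at the schema level.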

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/huang-sh/scmcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.