highly_variable_genes

Identify and annotate highly variable genes in single-cell RNA-seq data using dispersion and mean-expression cutoffs, so that downstream analysis focuses on biologically relevant features.

Instructions

Annotate highly variable genes

Input Schema

Name	Required	Description	Default
`layer`	No	If provided, use `adata.layers[layer]` for expression values.	None
`n_top_genes`	No	Number of highly variable genes to keep. Mandatory if `flavor='seurat_v3'`.	None
`min_disp`	No	Minimum dispersion cutoff for gene selection.	0.5
`max_disp`	No	Maximum dispersion cutoff for gene selection.	inf
`min_mean`	No	Minimum mean expression cutoff for gene selection.	0.0125
`max_mean`	No	Maximum mean expression cutoff for gene selection.	3
`span`	No	Fraction of data used for the loess model fit in `seurat_v3`. Must be greater than 0 and less than 1.	0.3
`n_bins`	No	Number of bins for mean expression binning. Must be positive.	20
`flavor`	No	Method for identifying highly variable genes: `seurat`, `cell_ranger`, `seurat_v3`, or `seurat_v3_paper`.	seurat
`subset`	No	If True, subset the data in place to the highly variable genes.	False
`batch_key`	No	Key in `adata.obs` for batch information.	None
`check_values`	No	Check whether counts are integers for the `seurat_v3` flavor.	True
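
As a rough illustration of how these parameters might be supplied, the sketch below builds an MCP `tools/call` request for this tool. The argument values (2000 genes, a `batch` column in `adata.obs`) and the request `id` are made-up examples, and the transport used to send the request to the server is deliberately left out.

```python
import json

# Hypothetical argument values; any parameter omitted here falls back to its schema default.
arguments = {
    "flavor": "seurat_v3",
    "n_top_genes": 2000,   # described as mandatory when flavor is 'seurat_v3'
    "batch_key": "batch",  # assumes adata.obs has a 'batch' column
}

# JSON-RPC 2.0 envelope that MCP uses for tool invocation.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "highly_variable_genes", "arguments": arguments},
}

payload = json.dumps(request, indent=2)
print(payload)
```

Only the keys under `arguments` correspond to the table above; the rest of the envelope is the same for every tool.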

Implementation Reference

src/scmcp/tool/pp.py:120-137 (handler)
Handler function that executes the preprocessing tools, including highly_variable_genes, by dispatching to the corresponding scanpy.pp function (sc.pp.highly_variable_genes).
def run_pp_func(ads, func, arguments):
    # Operate on the currently active AnnData object.
    adata = ads.adata_dic[ads.active]
    if func not in pp_func:
        raise ValueError(f"Unsupported function: {func}")
    run_func = pp_func[func]
    # Pass through only the arguments that the scanpy function accepts.
    parameters = inspect.signature(run_func).parameters
    arguments["inplace"] = True
    kwargs = {k: arguments.get(k) for k in parameters if k in arguments}
    try:
        res = run_func(adata, **kwargs)
        add_op_log(adata, run_func, kwargs)
    except KeyError as e:
        raise KeyError(f"Cannot find {e} column in adata.obs or adata.var")
    except Exception as e:
        raise e
    return res
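
The argument filtering in the handler can be tried in isolation. The sketch below is a simplified stand-alone version, not the project's code: `fake_hvg` is a hypothetical stand-in for `sc.pp.highly_variable_genes`, and `filter_kwargs` is a hypothetical helper. Unlike the handler, it copies the incoming dictionary instead of modifying it.

```python
import inspect


def fake_hvg(adata, n_top_genes=None, flavor="seurat", inplace=True):
    """Hypothetical stand-in for sc.pp.highly_variable_genes; it only echoes its inputs."""
    return {"n_top_genes": n_top_genes, "flavor": flavor, "inplace": inplace}


def filter_kwargs(func, arguments):
    """Keep only the arguments that `func` accepts, forcing inplace=True as the handler does."""
    arguments = dict(arguments)  # copy so the caller's dict is not mutated
    arguments["inplace"] = True
    parameters = inspect.signature(func).parameters
    return {k: arguments[k] for k in parameters if k in arguments}


# 'not_a_parameter' is dropped because fake_hvg has no such parameter.
kwargs = filter_kwargs(fake_hvg, {"n_top_genes": 2000, "not_a_parameter": 1})
print(kwargs)
result = fake_hvg(None, **kwargs)
print(result)
```

One consequence of this pattern is that unknown argument names are silently ignored rather than reported, so a misspelled parameter falls back to its default.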
src/scmcp/tool/pp.py:51-55 (registration)
Registration of the 'highly_variable_genes' tool using MCP types.Tool, linking to its schema.
highly_variable_genes = types.Tool(
    name="highly_variable_genes",
    description="Annotate highly variable genes",
    inputSchema=HighlyVariableGenesModel.model_json_schema(),
)
src/scmcp/schema/pp.py:220-292 (schema)
Pydantic model defining the input schema and validation for the highly_variable_genes tool.
class HighlyVariableGenesModel(JSONParsingModel):
    """Input schema for the highly_variable_genes preprocessing tool."""

    layer: Optional[str] = Field(
        default=None,
        description="If provided, use adata.layers[layer] for expression values."
    )
    n_top_genes: Optional[int] = Field(
        default=None,
        description="Number of highly-variable genes to keep. Mandatory if `flavor='seurat_v3'`",
    )
    min_disp: float = Field(
        default=0.5,
        description="Minimum dispersion cutoff for gene selection."
    )
    max_disp: float = Field(
        default=float('inf'),
        description="Maximum dispersion cutoff for gene selection."
    )
    min_mean: float = Field(
        default=0.0125,
        description="Minimum mean expression cutoff for gene selection."
    )
    max_mean: float = Field(
        default=3,
        description="Maximum mean expression cutoff for gene selection."
    )
    span: float = Field(
        default=0.3,
        description="Fraction of data used for loess model fit in seurat_v3.",
        gt=0,
        lt=1
    )
    n_bins: int = Field(
        default=20,
        description="Number of bins for mean expression binning.",
        gt=0
    )
    flavor: Literal['seurat', 'cell_ranger', 'seurat_v3', 'seurat_v3_paper'] = Field(
        default='seurat',
        description="Method for identifying highly variable genes."
    )
    subset: bool = Field(
        default=False,
        description="Inplace subset to highly-variable genes if True."
    )
    batch_key: Optional[str] = Field(
        default=None,
        description="Key in adata.obs for batch information."
    )
    check_values: bool = Field(
        default=True,
        description="Check if counts are integers for seurat_v3 flavor."
    )

    @field_validator('n_top_genes', 'n_bins')
    def validate_positive_integers(cls, v: Optional[int]) -> Optional[int]:
        """Validate positive integers"""
        if v is not None and v <= 0:
            raise ValueError("must be a positive integer")
        return v

    @field_validator('span')
    def validate_span(cls, v: float) -> float:
        """Validate span is between 0 and 1"""
        if v <= 0 or v >= 1:
            raise ValueError("span must be between 0 and 1")
        return v
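
To see which inputs these constraints would reject without installing Pydantic, the checks can be mirrored in plain Python. This is only a sketch under that assumption: `check_hvg_arguments` is a hypothetical helper, it does no type coercion, and it does not reproduce whatever `JSONParsingModel` adds on top of the fields shown here.

```python
import math

# Defaults copied from the HighlyVariableGenesModel fields.
DEFAULTS = {
    "layer": None, "n_top_genes": None,
    "min_disp": 0.5, "max_disp": math.inf,
    "min_mean": 0.0125, "max_mean": 3,
    "span": 0.3, "n_bins": 20, "flavor": "seurat",
    "subset": False, "batch_key": None, "check_values": True,
}
FLAVORS = {"seurat", "cell_ranger", "seurat_v3", "seurat_v3_paper"}


def check_hvg_arguments(overrides):
    """Hypothetical helper: merge overrides into the defaults and apply the model's checks."""
    args = {**DEFAULTS, **overrides}
    # Mirrors validate_positive_integers.
    for name in ("n_top_genes", "n_bins"):
        if args[name] is not None and args[name] <= 0:
            raise ValueError(f"{name} must be a positive integer")
    # Mirrors validate_span (open interval, both ends excluded).
    if not 0 < args["span"] < 1:
        raise ValueError("span must be between 0 and 1")
    # Mirrors the Literal type on `flavor`.
    if args["flavor"] not in FLAVORS:
        raise ValueError(f"unsupported flavor: {args['flavor']}")
    return args


args = check_hvg_arguments({"n_top_genes": 2000, "flavor": "seurat_v3"})
print(args["n_top_genes"], args["flavor"], args["span"])
```

Note that the model shown here describes `n_top_genes` as mandatory for `seurat_v3` but has no validator enforcing it, so the sketch does not enforce it either.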
src/scmcp/tool/pp.py:88-101 (helper)
Mapping dictionary that associates the 'highly_variable_genes' tool name to the scanpy implementation sc.pp.highly_variable_genes, used by the handler.
pp_func = {
    "filter_genes": sc.pp.filter_genes,
    "filter_cells": sc.pp.filter_cells,
    "calculate_qc_metrics": partial(sc.pp.calculate_qc_metrics, inplace=True),
    "log1p": sc.pp.log1p,
    "normalize_total": sc.pp.normalize_total,
    "pca": sc.pp.pca,
    "highly_variable_genes": sc.pp.highly_variable_genes,
    "regress_out": sc.pp.regress_out,
    "scale": sc.pp.scale,
    "combat": sc.pp.combat,
    "scrublet": sc.pp.scrublet,
    "neighbors": sc.pp.neighbors,
}
src/scmcp/tool/pp.py:104-117 (registration)
Dictionary registering the tool objects by name, including highly_variable_genes.
pp_tools = {
    "filter_genes": filter_genes,
    "filter_cells": filter_cells,
    "calculate_qc_metrics": calculate_qc_metrics,
    "log1p": log1p,
    "normalize_total": normalize_total,
    "pca": pca,
    "highly_variable_genes": highly_variable_genes,
    "regress_out": regress_out,
    "scale": scale,
    "combat": combat,
    "scrublet": scrublet,
    "neighbors": neighbors,
}
