mark_var
Identify genes that meet specific conditions (e.g., mitochondrial, ribosomal, or hemoglobin genes) and store the results as boolean values in adata.var for use in quality control metric calculation.
Instructions
Determine whether each gene meets specific conditions and store the results in adata.var as boolean values. For example, mitochondrial gene names start with MT-. Call this tool first when calculating quality control metrics for mitochondrial, ribosomal, or hemoglobin genes, or for other qc_vars.
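
To illustrate why the boolean flags have to exist before any per-gene-class QC metric can be computed, here is a minimal pure-Python sketch. It is not the scmcp implementation: the gene names and counts are invented toy data, and `flag_genes` and `pct_counts_flagged` are hypothetical helpers that only stand in for the marking step and the later metric step.

```python
def flag_genes(gene_names, prefixes):
    """Return one boolean per gene: True if the name starts with any prefix."""
    return [name.startswith(tuple(prefixes)) for name in gene_names]


def pct_counts_flagged(counts, flags):
    """Percentage of a cell's total counts that come from flagged genes."""
    total = sum(counts)
    if total == 0:
        return 0.0
    flagged = sum(c for c, f in zip(counts, flags) if f)
    return 100.0 * flagged / total


# Invented toy data: one cell, four genes.
gene_names = ["MT-CO1", "MT-ND1", "ACTB", "GAPDH"]
counts = [30, 20, 100, 50]

# Step 1: mark the genes (the role mark_var plays with a boolean column).
mt_flags = flag_genes(gene_names, ["MT-"])

# Step 2: only now can the per-cell percentage be computed from the flags.
pct_mt = pct_counts_flagged(counts, mt_flags)
print(mt_flags, pct_mt)
```

The second step has nothing to work with until the first has run, which is the ordering the instructions above ask for.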
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| gene_class | No | Gene class type (mitochondrion/ribosomal/hemoglobin) | |
| pattern_type | No | Pattern matching type (startswith/endswith/contains); should be None when gene_class is set | |
| patterns | No | Gene pattern to match; must be a string, and should be None when gene_class is set | |
| var_name | No | Column name to add to adata.var; do not set unless the user asks for one | |
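
The parameter descriptions imply two alternative ways of calling the tool: by built-in gene class, or by a custom pattern. The sketch below spells that rule out as a plain argument check. It is only an illustration under that reading of the table; `check_mark_var_args` is a hypothetical helper and is not part of scmcp.

```python
GENE_CLASSES = {"mitochondrion", "ribosomal", "hemoglobin"}
PATTERN_TYPES = {"startswith", "endswith", "contains"}


def check_mark_var_args(args):
    """Return a list of problems found in a mark_var argument dict (empty if none)."""
    problems = []
    gene_class = args.get("gene_class")
    pattern_type = args.get("pattern_type")
    patterns = args.get("patterns")

    if gene_class is not None:
        # Built-in class: the pattern fields are expected to stay unset.
        if gene_class not in GENE_CLASSES:
            problems.append(f"unknown gene_class: {gene_class}")
        if pattern_type is not None or patterns is not None:
            problems.append("pattern_type and patterns should be None when gene_class is set")
    else:
        # Custom pattern: both pattern fields are needed.
        if pattern_type not in PATTERN_TYPES:
            problems.append(f"unknown or missing pattern_type: {pattern_type}")
        if not isinstance(patterns, str):
            problems.append("patterns must be a string")
    return problems


# Example argument payloads (values are illustrative).
by_class = {"gene_class": "mitochondrion"}
by_pattern = {"pattern_type": "startswith", "patterns": "MT-", "var_name": "mt"}
mixed = {"gene_class": "ribosomal", "pattern_type": "contains", "patterns": "RP"}

for payload in (by_class, by_pattern, mixed):
    print(payload, "->", check_mark_var_args(payload))
```

The first two payloads pass the check, while the third is reported because it sets both a gene class and a pattern.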
Implementation Reference
- src/scmcp/tool/util.py:47-69 (handler) The core handler function that implements the mark_var tool. It adds a boolean column to adata.var based either on gene_class (mitochondrion, ribosomal, or hemoglobin, stored in the columns mt, ribo, and hb) or on custom pattern matching (startswith, endswith, contains). It returns the column's value counts and a success message.

      def mark_var(adata, var_name: str = None, gene_class: str = None, pattern_type: str = None, patterns: str = None):
          if gene_class is not None:
              if gene_class == "mitochondrion":
                  adata.var["mt"] = adata.var_names.str.startswith(('MT-', 'Mt','mt-'))
                  var_name = "mt"
              elif gene_class == "ribosomal":
                  adata.var["ribo"] = adata.var_names.str.startswith(("RPS", "RPL"))
                  var_name = "ribo"
              elif gene_class == "hemoglobin":
                  adata.var["hb"] = adata.var_names.str.contains("^HB[^(P)]", case=False)
                  var_name = "hb"
          if pattern_type is not None and patterns is not None:
              if pattern_type == "startswith":
                  adata.var[var_name] = adata.var_names.str.startswith(patterns)
              elif pattern_type == "endswith":
                  adata.var[var_name] = adata.var_names.str.endswith(patterns)
              elif pattern_type == "contains":
                  adata.var[var_name] = adata.var_names.str.contains(patterns)
              else:
                  raise ValueError(f"Did not support pattern_type: {pattern_type}")
          return {var_name: adata.var[var_name].value_counts().to_dict(), "msg": f"add '{var_name}' column in adata.var"}
- src/scmcp/schema/util.py:15-35 (schema) Pydantic model (MarkVarModel) that defines the input schema for the mark_var tool, validating parameters like var_name, pattern_type, patterns, and gene_class.

      class MarkVarModel(JSONParsingModel):
          """Determine or mark if each gene meets specific conditions and store results in adata.var as boolean values"""

          var_name: str = Field(
              default=None,
              description="Column name that will be added to adata.var, do not set if user does not ask"
          )
          pattern_type: Optional[Literal["startswith", "endswith", "contains"]] = Field(
              default=None,
              description="Pattern matching type (startswith/endswith/contains), it should be None when gene_class is not None"
          )
          patterns: str = Field(
              default=None,
              description="gene pattern to match, must be a string, it should be None when gene_class is not None"
          )
          gene_class: Optional[Literal["mitochondrion", "ribosomal", "hemoglobin"]] = Field(
              default=None,
              description="Gene class type (Mitochondrion/Ribosomal/Hemoglobin)"
          )
- src/scmcp/tool/util.py:11-19 (registration) Registers the mark_var tool using mcp.types.Tool, providing the name, description, and input schema from MarkVarModel. This Tool object is later included in the util_tools dictionary.

      mark_var_tool = types.Tool(
          name="mark_var",
          description=(
              "Determine if each gene meets specific conditions and store results in adata.var as boolean values."
              "for example: mitochondrion genes startswith MT-."
              "the tool should be call first when calculate quality control metrics for mitochondrion, ribosomal, harhemoglobin genes. or other qc_vars"
          ),
          inputSchema=MarkVarModel.model_json_schema(),
      )
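
The matching rules used by the handler for the three built-in gene classes can be tried out without AnnData or pandas. The following is a rough sketch, not the scmcp code: it copies the prefixes and the regular expression from the handler, but uses `str.startswith` and `re.search` on a plain list in place of the pandas `.str` accessors, and `collections.Counter` in place of `value_counts()`. The function `mark_names` and the gene list are hypothetical.

```python
import re
from collections import Counter


def mark_names(names, gene_class):
    """Return (column_name, flags) for a built-in gene class, mimicking the handler."""
    if gene_class == "mitochondrion":
        return "mt", [n.startswith(("MT-", "Mt", "mt-")) for n in names]
    if gene_class == "ribosomal":
        return "ribo", [n.startswith(("RPS", "RPL")) for n in names]
    if gene_class == "hemoglobin":
        # Same regex as the handler; re.IGNORECASE stands in for case=False.
        rx = re.compile("^HB[^(P)]", re.IGNORECASE)
        return "hb", [rx.search(n) is not None for n in names]
    # The quoted handler has no such branch; added so the sketch fails loudly.
    raise ValueError(f"unsupported gene_class: {gene_class}")


# Hypothetical gene names chosen to exercise each rule.
names = ["MT-CO1", "Mtor", "RPS6", "RPL3", "HBA1", "HBP1", "ACTB"]

col, flags = mark_names(names, "mitochondrion")
# Same shape as the handler's return value: value counts plus a message.
summary = {col: dict(Counter(flags)), "msg": f"add '{col}' column in adata.var"}
print(summary)

for gene_class in ("ribosomal", "hemoglobin"):
    print(gene_class, mark_names(names, gene_class)[1])
```

Two details of the quoted rules show up in this toy run. The `Mt` prefix carries no hyphen, so a name such as `Mtor` is flagged together with `MT-CO1`. The character class `[^(P)]` rejects names whose third character is `P`, so `HBP1` is not flagged while `HBA1` is.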