ingest
Maps labels and embeddings from a reference dataset onto new data, using the specified labeling and embedding methods. The tool is provided by the SCMCP server for single-cell RNA sequencing analysis and wraps Scanpy's sc.tl.ingest.
Instructions
Map labels and embeddings from reference data to new data
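The toy sketch below illustrates the idea behind the knn labeling method. Each new cell receives the most common label among its k nearest reference cells in a shared embedding. It is a brute-force illustration and not Scanpy's implementation. sc.tl.ingest takes its neighbor settings from the reference data (see neighbors_key), so real results can differ. The function name knn_transfer_labels and the toy coordinates are hypothetical.

```python
from collections import Counter
from math import dist
from typing import List, Sequence


def knn_transfer_labels(
    ref_coords: Sequence[Sequence[float]],
    ref_labels: Sequence[str],
    new_coords: Sequence[Sequence[float]],
    k: int = 3,
) -> List[str]:
    """Assign each new point the majority label of its k nearest reference points.

    Hypothetical, brute-force illustration; not Scanpy's implementation.
    """
    if len(ref_coords) != len(ref_labels):
        raise ValueError("ref_coords and ref_labels must have the same length")
    k = min(k, len(ref_coords))
    result = []
    for point in new_coords:
        # Rank reference points by Euclidean distance to this new point.
        nearest = sorted(range(len(ref_coords)), key=lambda i: dist(point, ref_coords[i]))[:k]
        # Majority vote; Counter.most_common breaks ties by first occurrence.
        votes = Counter(ref_labels[i] for i in nearest)
        result.append(votes.most_common(1)[0][0])
    return result


# Toy reference: two well-separated "cell types" in a 2-D embedding.
ref = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels = ["B cell", "B cell", "B cell", "T cell", "T cell", "T cell"]
new = [(0.15, 0.05), (4.9, 5.1)]
print(knn_transfer_labels(ref, labels, new))  # ['B cell', 'T cell']
```

Larger k smooths over noisy neighbors. Smaller k follows local structure more closely.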
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| embedding_method | No | Embedding(s) in adata_ref to map onto adata, given as a single value or a list. The only supported values are 'umap' and 'pca' (case-insensitive). | ['umap', 'pca'] |
| labeling_method | No | The method to map labels in adata_ref.obs to adata.obs. The only supported value is 'knn'. | knn |
| neighbors_key | No | If specified, ingest looks in adata_ref.uns[neighbors_key] for the neighbors settings and uses the corresponding distances. | |
| obs | No | Key or list of keys in adata_ref.obs whose labels are mapped (inferred) onto the observations in adata.obs. | |
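The sketch below shows how a client might invoke this tool. It builds a JSON-RPC 2.0 tools/call request in the shape defined by the Model Context Protocol. Before building the request, it checks the arguments against the allowed values above and lowercases them, as the IngestModel validators do. The helper name build_ingest_call and the label key "cell_type" are hypothetical, and transport (stdio or HTTP) is out of scope.

```python
import json
from typing import Any, Dict

VALID_EMBEDDINGS = {"umap", "pca"}
ALLOWED_ARGS = {"obs", "embedding_method", "labeling_method", "neighbors_key"}


def build_ingest_call(request_id: int, **arguments: Any) -> str:
    """Build a JSON-RPC 2.0 'tools/call' request for the ingest tool.

    Hypothetical client-side helper; omitted arguments are left to the
    server-side defaults.
    """
    unknown = set(arguments) - ALLOWED_ARGS
    if unknown:
        raise ValueError(f"unknown argument(s): {sorted(unknown)}")
    if "embedding_method" in arguments:
        methods = arguments["embedding_method"]
        # Accept a single string or a list, lowercased like the server does.
        normalized = [methods.lower()] if isinstance(methods, str) else [m.lower() for m in methods]
        if not set(normalized) <= VALID_EMBEDDINGS:
            raise ValueError("embedding_method may only contain 'umap' and 'pca'")
        arguments["embedding_method"] = normalized[0] if isinstance(methods, str) else normalized
    if "labeling_method" in arguments:
        if arguments["labeling_method"].lower() != "knn":
            raise ValueError("labeling_method must be 'knn'")
        arguments["labeling_method"] = "knn"
    request: Dict[str, Any] = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "ingest", "arguments": arguments},
    }
    return json.dumps(request)


payload = build_ingest_call(1, obs="cell_type", embedding_method="UMAP")
print(payload)
```

With this approach, an unsupported value such as embedding_method=["pca", "tsne"] fails on the client instead of after a round trip.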
Implementation Reference
- src/scmcp/tool/tl.py:164-177 (handler): Shared handler function for all tl tools, including ingest. It retrieves the Scanpy function (sc.tl.ingest for ingest) and keeps only the input arguments whose names match that function's parameters. It then executes the function on the active AnnData object and logs the operation.

  ```python
  def run_tl_func(ads, func, arguments):
      # Operate on the currently active AnnData object.
      adata = ads.adata_dic[ads.active]
      if func not in tl_func:
          raise ValueError(f"Unsupported function: {func}")
      run_func = tl_func[func]
      # Forward only the arguments whose names match the Scanpy function's parameters.
      parameters = inspect.signature(run_func).parameters
      kwargs = {k: arguments.get(k) for k in parameters if k in arguments}
      try:
          res = run_func(adata, **kwargs)
          # Log the operation (add_op_log is defined elsewhere in scmcp).
          add_op_log(adata, run_func, kwargs)
      except Exception as e:
          logger.error(f"Error running function {func}: {e}")
          raise
      return
  ```
- src/scmcp/schema/tl.py:558-605 (schema): Pydantic model IngestModel, which defines the input parameters and validation for the ingest tool: obs labels, embedding_method (umap/pca), labeling_method (knn), and neighbors_key.

  ```python
  class IngestModel(JSONParsingModel):
      """Input schema for the ingest tool that maps labels and embeddings from reference data to new data."""

      obs: Optional[Union[str, List[str]]] = Field(
          default=None,
          description="Labels' keys in adata_ref.obs which need to be mapped to adata.obs (inferred for observation of adata)."
      )
      embedding_method: Union[str, List[str]] = Field(
          default=['umap', 'pca'],
          description="Embeddings in adata_ref which need to be mapped to adata. The only supported values are 'umap' and 'pca'."
      )
      labeling_method: str = Field(
          default='knn',
          description="The method to map labels in adata_ref.obs to adata.obs. The only supported value is 'knn'."
      )
      neighbors_key: Optional[str] = Field(
          default=None,
          description="If specified, ingest looks adata_ref.uns[neighbors_key] for neighbors settings and uses the corresponding distances."
      )

      @field_validator('embedding_method')
      def validate_embedding_method(cls, v: Union[str, List[str]]) -> Union[str, List[str]]:
          """Validate embedding method is supported"""
          valid_methods = ['umap', 'pca']
          if isinstance(v, str):
              if v.lower() not in valid_methods:
                  raise ValueError(f"embedding_method must be one of {valid_methods}")
              return v.lower()
          elif isinstance(v, list):
              for method in v:
                  if method.lower() not in valid_methods:
                      raise ValueError(f"embedding_method must contain only values from {valid_methods}")
              return [method.lower() for method in v]
          return v

      @field_validator('labeling_method')
      def validate_labeling_method(cls, v: str) -> str:
          """Validate labeling method is supported"""
          if v.lower() != 'knn':
              raise ValueError("labeling_method must be 'knn'")
          return v.lower()
  ```
- src/scmcp/tool/tl.py:82-87 (registration): Definition of the 'ingest' MCP Tool object, specifying its name, description, and input schema (generated from IngestModel).

  ```python
  # Add ingest tool
  ingest_tool = types.Tool(
      name="ingest",
      description="Map labels and embeddings from reference data to new data",
      inputSchema=IngestModel.model_json_schema(),
  )
  ```
- src/scmcp/tool/tl.py:125-142 (registration): Mapping of tl tool names to their corresponding Scanpy functions (sc.tl.*), including 'ingest': sc.tl.ingest. The handler uses it to dispatch calls.

  ```python
  tl_func = {
      "tsne": sc.tl.tsne,
      "umap": sc.tl.umap,
      "draw_graph": sc.tl.draw_graph,
      "diffmap": sc.tl.diffmap,
      "embedding_density": sc.tl.embedding_density,
      "leiden": sc.tl.leiden,
      "louvain": sc.tl.louvain,
      "dendrogram": sc.tl.dendrogram,
      "dpt": sc.tl.dpt,
      "paga": sc.tl.paga,
      "ingest": sc.tl.ingest,
      "rank_genes_groups": sc.tl.rank_genes_groups,
      "filter_rank_genes_groups": sc.tl.filter_rank_genes_groups,
      "marker_gene_overlap": sc.tl.marker_gene_overlap,
      "score_genes": sc.tl.score_genes,
      "score_genes_cell_cycle": sc.tl.score_genes_cell_cycle,
  }
  ```
- src/scmcp/tool/tl.py:145-162 (registration): The tl_tools dictionary registers all tl Tool objects, including 'ingest': ingest_tool. The MCP server aggregates it for tool listing.

  ```python
  tl_tools = {
      "tsne": tsne_tool,
      "umap": umap_tool,
      "draw_graph": draw_graph_tool,
      "diffmap": diffmap_tool,
      "embedding_density": embedding_density_tool,
      "leiden": leiden_tool,
      "louvain": louvain_tool,
      "dendrogram": dendrogram_tool,
      "dpt": dpt_tool,
      "paga": paga_tool,
      "ingest": ingest_tool,
      "rank_genes_groups": rank_genes_groups_tool,
      "filter_rank_genes_groups": filter_rank_genes_groups_tool,
      "marker_gene_overlap": marker_gene_overlap_tool,
      "score_genes": score_genes_tool,
      "score_genes_cell_cycle": score_genes_cell_cycle_tool,
  }
  ```
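The excerpts above form a name-keyed dispatch. The tool name selects a Tool object for listing (tl_tools) and a function for execution (tl_func). The handler then forwards only the arguments whose names appear in that function's signature. The runnable sketch below reproduces that filtering with the standard library's inspect module. fake_ingest is a hypothetical stand-in modeled on the documented signature of scanpy.tl.ingest, where the reference AnnData, adata_ref, is a required second argument. The handler excerpt passes only the active adata positionally, and the input schema has no adata_ref field. The excerpts alone therefore do not show how the reference is supplied, and the stand-in makes that gap visible.

```python
import inspect
from typing import Any, Callable, Dict


def fake_ingest(adata, adata_ref, *, obs=None, embedding_method=("umap", "pca"),
                labeling_method="knn", neighbors_key=None, inplace=True, **kwargs):
    """Hypothetical stand-in shaped like scanpy.tl.ingest; returns what it received."""
    return {"obs": obs, "embedding_method": embedding_method,
            "labeling_method": labeling_method, "neighbors_key": neighbors_key}


# Name-keyed dispatch table, analogous to tl_func (one entry here).
DISPATCH: Dict[str, Callable[..., Any]] = {"ingest": fake_ingest}


def filter_kwargs(func: Callable[..., Any], arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only argument names that appear in func's signature, as run_tl_func does."""
    parameters = inspect.signature(func).parameters
    return {k: arguments.get(k) for k in parameters if k in arguments}


def dispatch(name: str, adata: Any, arguments: Dict[str, Any]) -> Any:
    """Look up the function by name and call it with the filtered arguments."""
    if name not in DISPATCH:
        raise ValueError(f"Unsupported function: {name}")
    func = DISPATCH[name]
    return func(adata, **filter_kwargs(func, arguments))


args = {"obs": "cell_type", "embedding_method": "umap", "not_a_param": 1}
# "not_a_param" is dropped even though fake_ingest accepts **kwargs,
# because filtering matches parameter names only.
print(filter_kwargs(fake_ingest, args))  # {'obs': 'cell_type', 'embedding_method': 'umap'}

try:
    dispatch("ingest", adata=object(), arguments=args)
except TypeError as err:
    # adata_ref is required, but nothing in the filtered arguments supplies it.
    print("TypeError:", err)
```

Filtering by signature lets one generic handler serve every tl tool. The trade-off is that misspelled argument names are silently ignored rather than rejected.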