Glama

ingest

Map cell labels and embeddings from an annotated single-cell RNA sequencing reference dataset onto a new dataset, using k-nearest-neighbor label transfer and projection of the reference's PCA or UMAP embedding.
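
The k-nearest-neighbor label transfer described above can be illustrated with a plain majority vote. This is a simplified, dependency-free sketch; the function knn_transfer_labels and the toy coordinates are hypothetical. It is not scanpy's implementation, which derives the number of neighbors and the distance metric from the reference's neighbors settings rather than taking k as an argument.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_transfer_labels(ref_points, ref_labels, query_points, k=3):
    """Give each query point the most common label among its k nearest reference points."""
    transferred = []
    for q in query_points:
        # Rank reference points by Euclidean distance to this query point.
        order = sorted(range(len(ref_points)), key=lambda i: dist(q, ref_points[i]))
        # Majority vote over the labels of the k closest reference points.
        votes = Counter(ref_labels[i] for i in order[:k])
        transferred.append(votes.most_common(1)[0][0])
    return transferred


ref = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]
query = [(0.15, 0.05), (5.0, 5.1)]
print(knn_transfer_labels(ref, labels, query, k=3))  # ['T cell', 'B cell']
```

Ties are broken by whichever label Counter encountered first among the k neighbors; a production implementation would want an explicit tie-breaking rule.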

Instructions

Map labels and embeddings from reference data to new data

Input Schema

Name | Required | Description | Default
obs | No | Keys of the label columns in adata_ref.obs to map to adata.obs (the labels are inferred for the observations of adata). | None
embedding_method | No | Embeddings in adata_ref to map to adata. The only supported values are 'umap' and 'pca'. | ['umap', 'pca']
labeling_method | No | The method used to map labels from adata_ref.obs to adata.obs. The only supported value is 'knn'. | knn
neighbors_key | No | If specified, ingest looks in adata_ref.uns[neighbors_key] for the neighbors settings and uses the corresponding distances. | None
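
For embedding_method='pca', mapping amounts to projecting the new cells onto the reference's principal components. The sketch below shows that projection in plain Python. The function name is hypothetical, it assumes both datasets share the same genes in the same order, and it assumes centering with the reference gene means; scanpy's own handling of centering and gene subsetting may differ. The 'umap' case, which needs a fitted UMAP model, is not covered.

```python
def project_onto_reference_pcs(query_cells, ref_means, components):
    """Project query cells onto reference principal components.

    query_cells: one list of gene values per cell (same gene order as the reference)
    ref_means:   per-gene means of the reference, used for centering (an assumption)
    components:  one list of per-gene loadings per principal component
    """
    projected = []
    for cell in query_cells:
        centered = [x - m for x, m in zip(cell, ref_means)]
        # Each coordinate is the dot product of the centered cell with one component.
        projected.append([sum(c * w for c, w in zip(centered, comp)) for comp in components])
    return projected


ref_means = [1.0, 2.0, 3.0]
components = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]  # two orthonormal loading vectors
query_cells = [[2.0, 2.0, 3.0], [1.0, 3.0, 4.0]]
print(project_onto_reference_pcs(query_cells, ref_means, components))
```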

Implementation Reference

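  • Handler function that executes the 'ingest' tool (like every other tl tool) by looking up sc.tl.ingest in tl_func and calling it on the active AnnData object with the arguments that match its signature.
    import inspect
    import logging

    logger = logging.getLogger(__name__)

    def run_tl_func(ads, func, arguments):
        """Run a scanpy tl function on the active AnnData object."""
        # ads holds the loaded datasets; ads.active names the one to operate on.
        adata = ads.adata_dic[ads.active]
        if func not in tl_func:
            raise ValueError(f"Unsupported function: {func}")
        run_func = tl_func[func]
        # Forward only the arguments that the target function actually accepts.
        parameters = inspect.signature(run_func).parameters
        kwargs = {k: arguments[k] for k in parameters if k in arguments}
        try:
            res = run_func(adata, **kwargs)
            # Record the call in the operation log (add_op_log is defined elsewhere in the server).
            add_op_log(adata, run_func, kwargs)
        except Exception as e:
            logger.error(f"Error running function {func}: {e}")
            raise
        # Return the function's result (None for in-place calls such as ingest).
        return res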
  • Pydantic model defining the input schema and validation for the 'ingest' tool parameters.
    from typing import List, Optional, Union

    from pydantic import Field, field_validator

    # JSONParsingModel is the server's own Pydantic base class, defined elsewhere in the project.

    class IngestModel(JSONParsingModel):
        """Input schema for the ingest tool that maps labels and embeddings from reference data to new data."""

        obs: Optional[Union[str, List[str]]] = Field(
            default=None,
            description="Keys of the label columns in adata_ref.obs to map to adata.obs (the labels are inferred for the observations of adata)."
        )

        embedding_method: Union[str, List[str]] = Field(
            default=['umap', 'pca'],
            description="Embeddings in adata_ref to map to adata. The only supported values are 'umap' and 'pca'."
        )

        labeling_method: str = Field(
            default='knn',
            description="The method used to map labels from adata_ref.obs to adata.obs. The only supported value is 'knn'."
        )

        neighbors_key: Optional[str] = Field(
            default=None,
            description="If specified, ingest looks in adata_ref.uns[neighbors_key] for the neighbors settings and uses the corresponding distances."
        )

        @field_validator('embedding_method')
        @classmethod
        def validate_embedding_method(cls, v: Union[str, List[str]]) -> Union[str, List[str]]:
            """Validate that every embedding method is supported and lower-case it."""
            valid_methods = ['umap', 'pca']

            if isinstance(v, str):
                if v.lower() not in valid_methods:
                    raise ValueError(f"embedding_method must be one of {valid_methods}")
                return v.lower()

            elif isinstance(v, list):
                for method in v:
                    if method.lower() not in valid_methods:
                        raise ValueError(f"embedding_method must contain only values from {valid_methods}")
                return [method.lower() for method in v]

            return v

        @field_validator('labeling_method')
        @classmethod
        def validate_labeling_method(cls, v: str) -> str:
            """Validate that the labeling method is 'knn' (case-insensitive) and lower-case it."""
            if v.lower() != 'knn':
                raise ValueError("labeling_method must be 'knn'")
            return v.lower()
  • Registers the 'ingest' tool as an MCP types.Tool whose inputSchema is generated from IngestModel.
    import mcp.types as types

    # Add ingest tool; its input schema is generated from the IngestModel Pydantic model.
    ingest_tool = types.Tool(
        name="ingest",
        description="Map labels and embeddings from reference data to new data",
        inputSchema=IngestModel.model_json_schema(),
    )
  • Maps the 'ingest' tool name (along with the other tl tools) to its underlying scanpy function, sc.tl.ingest, for execution.
    import scanpy as sc

    tl_func = {
        "tsne": sc.tl.tsne,
        "umap": sc.tl.umap,
        "draw_graph": sc.tl.draw_graph,
        "diffmap": sc.tl.diffmap,
        "embedding_density": sc.tl.embedding_density,
        "leiden": sc.tl.leiden,
        "louvain": sc.tl.louvain,
        "dendrogram": sc.tl.dendrogram,
        "dpt": sc.tl.dpt,
        "paga": sc.tl.paga,
        "ingest": sc.tl.ingest,
        "rank_genes_groups": sc.tl.rank_genes_groups,
        "filter_rank_genes_groups": sc.tl.filter_rank_genes_groups,
        "marker_gene_overlap": sc.tl.marker_gene_overlap,
        "score_genes": sc.tl.score_genes,
        "score_genes_cell_cycle": sc.tl.score_genes_cell_cycle,
    }
  • In the MCP server call_tool handler, dispatches 'ingest' (as part of tl_tools) to run_tl_func.
    # Fragment of the if/elif chain inside call_tool.
    elif name in tl_tools:
        res = run_tl_func(ads, name, arguments)
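
run_tl_func forwards only the arguments whose names appear in the target function's signature. The sketch below reproduces that filtering with a hypothetical stand-in, fake_ingest, that mirrors the main parameters of sc.tl.ingest. It also shows a consequence worth checking: sc.tl.ingest requires a second positional argument, adata_ref, which is not in the input schema, so in the handler as shown the call would fail with a TypeError unless the reference dataset is supplied some other way.

```python
import inspect


def fake_ingest(adata, adata_ref, *, obs=None, embedding_method=("umap", "pca"),
                labeling_method="knn", neighbors_key=None, inplace=True):
    """Hypothetical stand-in mirroring the main parameters of sc.tl.ingest."""
    return {"obs": obs, "embedding_method": embedding_method}


def filter_kwargs(func, arguments):
    """Keep only the arguments whose names appear in func's signature."""
    parameters = inspect.signature(func).parameters
    return {k: arguments[k] for k in parameters if k in arguments}


arguments = {"obs": "cell_type", "embedding_method": "pca", "unused_key": 42}
kwargs = filter_kwargs(fake_ingest, arguments)
print(kwargs)  # {'obs': 'cell_type', 'embedding_method': 'pca'}

# The schema has no adata_ref field, so only adata is passed positionally
# and Python reports the missing required argument.
try:
    fake_ingest("adata placeholder", **kwargs)
    missing_ref_error = None
except TypeError as err:
    missing_ref_error = str(err)
print(missing_ref_error)
```

Filtering by signature lets one generic handler serve every function in tl_func, at the cost of silently dropping misspelled argument names.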
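
The two IngestModel validators accept values case-insensitively and return them lower-cased. A dependency-free restatement of the same rules (the function names are hypothetical, and Pydantic is not involved) makes the accepted inputs explicit:

```python
VALID_EMBEDDING_METHODS = ("umap", "pca")


def normalize_embedding_method(value):
    """Mirror validate_embedding_method: lower-case and check against umap/pca."""
    if isinstance(value, str):
        if value.lower() not in VALID_EMBEDDING_METHODS:
            raise ValueError(f"embedding_method must be one of {list(VALID_EMBEDDING_METHODS)}")
        return value.lower()
    normalized = [method.lower() for method in value]
    if any(method not in VALID_EMBEDDING_METHODS for method in normalized):
        raise ValueError(f"embedding_method must contain only values from {list(VALID_EMBEDDING_METHODS)}")
    return normalized


def normalize_labeling_method(value):
    """Mirror validate_labeling_method: only 'knn' (any case) is accepted."""
    if value.lower() != "knn":
        raise ValueError("labeling_method must be 'knn'")
    return value.lower()


print(normalize_embedding_method("UMAP"))           # umap
print(normalize_embedding_method(["PCA", "umap"]))  # ['pca', 'umap']
print(normalize_labeling_method("KNN"))             # knn
```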
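
Once ingest_tool is registered, an MCP client invokes it with a JSON-RPC 2.0 "tools/call" request whose arguments object must satisfy IngestModel's JSON schema. The sketch below builds such a request with the standard library only; make_tools_call is a hypothetical helper and "cell_type" is a hypothetical label column in adata_ref.obs.

```python
import json


def make_tools_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request for an MCP server."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


request = make_tools_call(1, "ingest", {
    "obs": ["cell_type"],         # hypothetical label column in adata_ref.obs
    "embedding_method": ["pca"],  # must be 'umap' and/or 'pca'
    "labeling_method": "knn",     # the only supported value
})
print(request)
```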
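
The tl_func table and the elif branch in call_tool together form a dispatch table: the tool name selects a callable, which is then applied to the active dataset. A minimal sketch, with hypothetical stub functions in place of the scanpy callables and a single dict standing in for both tl_tools (membership) and tl_func (lookup):

```python
def stub_ingest(adata, **kwargs):
    """Hypothetical stand-in for sc.tl.ingest."""
    return ("ingest", kwargs)


def stub_leiden(adata, **kwargs):
    """Hypothetical stand-in for sc.tl.leiden."""
    return ("leiden", kwargs)


# One registry plays the role of both tl_tools and tl_func.
TL_FUNC = {"ingest": stub_ingest, "leiden": stub_leiden}


def call_tool(name, arguments, adata="active AnnData placeholder"):
    """Route a tool name to its registered function."""
    if name in TL_FUNC:  # membership test on the dict itself; .keys() is not needed
        return TL_FUNC[name](adata, **arguments)
    raise ValueError(f"Unknown tool: {name}")


print(call_tool("ingest", {"obs": "cell_type"}))  # ('ingest', {'obs': 'cell_type'})
```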

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of disclosure but offers minimal behavioral insight. It mentions mapping operations but doesn't disclose whether the tool is read-only or mutating, what permissions are needed, how it handles errors, or what the output format looks like. For a data transformation tool with zero annotation coverage, this is inadequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It's appropriately sized and front-loaded with the core functionality.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a data mapping tool with 4 parameters, no annotations, and no output schema, the description is insufficient. It doesn't explain what 'reference data' and 'new data' refer to, what format the mapping produces, or how this tool fits within the broader data analysis workflow alongside its sibling tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all four parameters thoroughly. The description adds no parameter-specific information beyond what's in the schema, maintaining the baseline score of 3 for adequate but not enhanced parameter documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Map labels and embeddings from reference data to new data.' It specifies the action ('map') and the resources ('labels and embeddings'), but doesn't differentiate the tool from sibling tools such as 'merge_adata' or 'neighbors', which might have overlapping data integration functions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided about when to use this tool versus alternatives. The description doesn't mention prerequisites, appropriate contexts, or comparison to sibling tools like 'merge_adata' for data integration or 'neighbors' for similarity calculations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/huang-sh/scmcp'
