Glama

read_tool

Read single-cell RNA sequencing data from multiple file formats (h5ad, 10x, text files) or from a 10x directory into an AnnData object. Supports memory-efficient backed mode, download from a backup URL, and customizable parsing options.

Instructions

Read data from various file formats (h5ad, 10x, text files, etc.) or from a directory path.

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| backed | No | If 'r', load AnnData in 'backed' mode instead of fully loading it into memory ('memory' mode). If you want to modify backed attributes of the AnnData object, you need to choose 'r+'. | None |
| backup_url | No | Retrieve the file from a URL if it is not present on disk. | None |
| cache | No | If False, read from the source; if True, read from the fast 'h5ad' cache. | False |
| cache_compression | No | Compression of the h5ad cache, either 'gzip' or 'lzf' (see h5py dataset compression). scanpy's own default is settings.cache_compression. | None |
| delimiter | No | Delimiter that separates data within a text file. If None, splits at an arbitrary number of white spaces, which is different from enforcing splitting at any single white space. | None |
| ext | No | Extension that indicates the file type. If None, uses the extension of filename. | None |
| filename | Yes | Path to the file to read. | |
| first_column_names | No | Assume the first column stores row names. This is only necessary if these are not strings: strings in the first column are automatically assumed to be row names. | False |
| first_column_obs | No | If True, assume the first column stores observation (cell or barcode) names when reading a text file. If False, the data is transposed after reading. | True |
| gex_only | No | Only keep 'Gene Expression' data and ignore other feature types, e.g. 'Antibody Capture', 'CRISPR Guide Capture', or 'Custom'. Used for 10x formats. | True |
| make_unique | No | Whether to make the variables index unique by appending '-1', '-2', etc. Used for 10x mtx format. | True |
| prefix | No | Any prefix before matrix.mtx, genes.tsv and barcodes.tsv. For instance, if the files are named patientA_matrix.mtx, patientA_genes.tsv and patientA_barcodes.tsv, the prefix is patientA_. Used for 10x mtx format. | None |
| sampleid | No | Sample identifier to mark and distinguish different samples. | None |
| sheet | No | Name of the sheet/table in an hdf5 or Excel file. | None |
| var_names | No | The variables index for 10x mtx format. Either 'gene_symbols' or 'gene_ids'. | gene_symbols |
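Clients invoke the tool through MCP's standard tools/call JSON-RPC method, passing the parameters above as arguments. The sketch below builds three such requests by hand to show which parameters typically go together; the file paths are hypothetical placeholders, and in practice an MCP client SDK would construct and send these messages.

```python
import json

# Hypothetical example arguments; the paths are placeholders, not real files.
EXAMPLES = {
    # Open an .h5ad file in read-only backed mode instead of loading it into memory.
    "h5ad_backed": {"filename": "data/pbmc.h5ad", "backed": "r"},
    # A 10x mtx directory whose files are named patientA_matrix.mtx, etc.
    "tenx_directory": {
        "filename": "data/patientA/",
        "prefix": "patientA_",
        "var_names": "gene_ids",
    },
    # A tab-separated text file with genes in rows, transposed after reading.
    "text_genes_by_cells": {
        "filename": "data/counts.tsv",
        "delimiter": "\t",
        "first_column_obs": False,
    },
}


def tools_call_request(arguments: dict, request_id: int) -> str:
    """Serialize a JSON-RPC 2.0 'tools/call' request that invokes read_tool."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "read_tool", "arguments": arguments},
    }
    return json.dumps(payload)


if __name__ == "__main__":
    for request_id, (label, arguments) in enumerate(EXAMPLES.items(), start=1):
        print(label, tools_call_request(arguments, request_id))
```

Only filename is required; every omitted parameter takes its default from ReadModel.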

Implementation Reference

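  • Core handler function: reads an AnnData object with sc.read_10x_mtx when filename is a directory, and with sc.read when it is a file (or when backup_url is given).
    import inspect
    import logging
    from pathlib import Path

    import scanpy as sc

    logger = logging.getLogger(__name__)


    def read_func(**kwargs):
        file = Path(kwargs["filename"])
        if file.is_dir():
            # A directory is treated as 10x Genomics mtx output;
            # sc.read_10x_mtx takes its location as `path`.
            kwargs["path"] = kwargs["filename"]
            parameters = inspect.signature(sc.read_10x_mtx).parameters
            # Forward only the arguments that sc.read_10x_mtx declares.
            func_kwargs = {k: kwargs[k] for k in parameters if k in kwargs}
            adata = sc.read_10x_mtx(**func_kwargs)
        elif file.is_file() or kwargs.get("backup_url"):
            # sc.read picks a reader from the extension (or `ext`) and can
            # download a missing file from backup_url.
            parameters = inspect.signature(sc.read).parameters
            func_kwargs = {k: kwargs[k] for k in parameters if k in kwargs}
            logger.info(func_kwargs)
            adata = sc.read(**func_kwargs)
            # first_column_obs=False means observations are in columns: transpose.
            if not kwargs.get("first_column_obs", True):
                adata = adata.T
        else:
            raise FileNotFoundError(f"No such file or directory: {file}")
        return adata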
  • Pydantic model that defines the input schema (parameters and validators) for the read_tool MCP tool.
    from typing import Literal, Optional

    from pydantic import Field, field_validator


    # JSONParsingModel is the project's pydantic BaseModel subclass (defined elsewhere).
    class ReadModel(JSONParsingModel):
        """Input schema for the read tool."""

        filename: str = Field(
            description="Path to the file to read."
        )
        sampleid: Optional[str] = Field(
            default=None,
            description="Sample identifier to mark and distinguish different samples."
        )
        backed: Optional[Literal['r', 'r+']] = Field(
            default=None,
            description="If 'r', load AnnData in 'backed' mode instead of fully loading it into memory ('memory' mode). If you want to modify backed attributes of the AnnData object, you need to choose 'r+'."
        )
        sheet: Optional[str] = Field(
            default=None,
            description="Name of sheet/table in hdf5 or Excel file."
        )
        ext: Optional[str] = Field(
            default=None,
            description="Extension that indicates the file type. If None, uses extension of filename."
        )
        delimiter: Optional[str] = Field(
            default=None,
            description="Delimiter that separates data within text file. If None, will split at arbitrary number of white spaces, which is different from enforcing splitting at any single white space."
        )
        first_column_names: bool = Field(
            default=False,
            description="Assume the first column stores row names. This is only necessary if these are not strings: strings in the first column are automatically assumed to be row names."
        )
        first_column_obs: bool = Field(
            default=True,
            description="If True, assume the first column stores observations (cell or barcode) names when reading a text file. If False, the data will be transposed."
        )
        backup_url: Optional[str] = Field(
            default=None,
            description="Retrieve the file from a URL if not present on disk."
        )
        cache: bool = Field(
            default=False,
            description="If False, read from source; if True, read from fast 'h5ad' cache."
        )
        cache_compression: Optional[Literal['gzip', 'lzf']] = Field(
            default=None,
            description="See the h5py dataset_compression. (Default: settings.cache_compression)"
        )
        var_names: Optional[str] = Field(
            default="gene_symbols",
            description="The variables index for 10x mtx format. Either 'gene_symbols' or 'gene_ids'."
        )
        make_unique: bool = Field(
            default=True,
            description="Whether to make the variables index unique by appending '-1', '-2' etc. or not. Used for 10x mtx format."
        )
        gex_only: bool = Field(
            default=True,
            description="Only keep 'Gene Expression' data and ignore other feature types, e.g. 'Antibody Capture', 'CRISPR Guide Capture', or 'Custom'. Used for 10x formats."
        )
        prefix: Optional[str] = Field(
            default=None,
            description="Any prefix before matrix.mtx, genes.tsv and barcodes.tsv. For instance, if the files are named patientA_matrix.mtx, patientA_genes.tsv and patientA_barcodes.tsv the prefix is patientA_. Used for 10x mtx format."
        )

        # For backed and cache_compression, the Literal annotations already reject
        # other values before these validators run; var_names is a plain str,
        # so its validator performs the actual check.
        @field_validator('backed')
        @classmethod
        def validate_backed(cls, v: Optional[str]) -> Optional[str]:
            if v is not None and v not in ['r', 'r+']:
                raise ValueError("If backed is provided, it must be either 'r' or 'r+'")
            return v

        @field_validator('cache_compression')
        @classmethod
        def validate_cache_compression(cls, v: Optional[str]) -> Optional[str]:
            if v is not None and v not in ['gzip', 'lzf']:
                raise ValueError("cache_compression must be either 'gzip', 'lzf', or None")
            return v

        @field_validator('var_names')
        @classmethod
        def validate_var_names(cls, v: Optional[str]) -> Optional[str]:
            if v is not None and v not in ['gene_symbols', 'gene_ids']:
                raise ValueError("var_names must be either 'gene_symbols' or 'gene_ids'")
            return v
  • Registers the read_tool as an MCP Tool with name, description, and input schema from ReadModel.
    import mcp.types as types

    read_tool = types.Tool(
        name="read_tool",
        description="Read data from various file formats (h5ad, 10x, text files, etc.) or directory path.",
        inputSchema=ReadModel.model_json_schema(),
    )
  • Adds the read_tool to the io_tools dictionary, likely used for server registration.
    io_tools = {
        "read_tool": read_tool,
        "write_tool": write_tool,  # write_tool is defined elsewhere (not shown)
    }
  • Maps tool names to their handler functions, linking 'read_tool' to read_func.
    io_func = {
        "read_tool": read_func,
        "write_tool": sc.write,
    }
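read_func does not hard-code which options go to which scanpy reader; it inspects the target function's signature and forwards only the keyword arguments that function declares. The sketch below isolates that pattern. fake_read and fake_read_10x_mtx are hypothetical stand-ins that copy a subset of parameter names from sc.read and sc.read_10x_mtx so the example runs without scanpy or data files; the file-existence check and the transposition step are left out. It also makes one consequence visible: within read_func as shown, tool arguments that neither reader declares, such as sampleid and first_column_obs, never reach scanpy, which is why read_func applies first_column_obs itself.

```python
import inspect
import tempfile
from pathlib import Path


# Hypothetical stand-ins for sc.read and sc.read_10x_mtx. They copy only a subset
# of the real parameter names so the routing logic runs without scanpy or data.
def fake_read(filename, backed=None, *, sheet=None, ext=None, delimiter=None,
              first_column_names=False, backup_url=None, cache=False):
    return ("read", filename)


def fake_read_10x_mtx(path, *, var_names="gene_symbols", make_unique=True,
                      cache=False, gex_only=True, prefix=None):
    return ("read_10x_mtx", path)


def accepted_kwargs(func, kwargs):
    """Keep only the keyword arguments that func declares by name."""
    parameters = inspect.signature(func).parameters
    return {k: kwargs[k] for k in parameters if k in kwargs}


def route(kwargs):
    """Pick a reader the way read_func does and filter the arguments for it."""
    if Path(kwargs["filename"]).is_dir():
        # The 10x reader takes the directory as `path`, not `filename`.
        kwargs = {**kwargs, "path": kwargs["filename"]}
        return fake_read_10x_mtx, accepted_kwargs(fake_read_10x_mtx, kwargs)
    return fake_read, accepted_kwargs(fake_read, kwargs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        func, kw = route({"filename": tmp, "var_names": "gene_ids", "delimiter": ","})
        print(func.__name__, sorted(kw))  # fake_read_10x_mtx ['path', 'var_names']
    func, kw = route({"filename": "counts.tsv", "delimiter": "\t",
                      "first_column_obs": False, "sampleid": "s1"})
    print(func.__name__, sorted(kw))  # fake_read ['delimiter', 'filename']
```

Filtering by signature lets one flat schema serve two readers with different parameter sets, at the cost that an option the chosen reader does not declare is silently ignored rather than rejected.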


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/huang-sh/scmcp'
