Glama

search_uniprot_entity

Search UniProt proteins via Solr queries. Filter by gene, taxonomy, and annotation status. Results include accession, protein name, and organism.

Instructions

Search for a UniProt entity ID by query.

⚠️ Only the search string and limit are accepted. Extra parameters such as taxon, organism, reviewed, or species are silently dropped and have no effect. Express such filters inside the Solr query string instead (e.g., organism_id:9606 AND reviewed:true).

The search string can be passed as any of: query (canonical), search, term, keyword, keywords, search_term, or name.
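The alias and drop behaviour described above can be mirrored on the caller's side before a request is sent. The following is a minimal sketch, not the server's actual implementation: `normalize_arguments` is a hypothetical helper, and the precedence order among aliases (canonical `query` first) is an assumption, since the description does not say which alias wins when several are supplied.

```python
# Hypothetical client-side helper; not part of the tool itself.
# Assumption: when several aliases are present, the first one in this tuple wins.
QUERY_ALIASES = ("query", "search", "term", "keyword", "keywords", "search_term", "name")


def normalize_arguments(raw):
    """Return (arguments, dropped_keys) for a search_uniprot_entity call."""
    query = None
    for alias in QUERY_ALIASES:
        value = raw.get(alias)
        if isinstance(value, str) and value.strip():
            query = value.strip()
            break
    if query is None:
        raise ValueError("no search string supplied under any accepted name")

    # Default limit of 20 comes from the tool description.
    arguments = {"query": query, "limit": int(raw.get("limit", 20))}

    # Everything else would be silently dropped by the tool, so surface it here.
    dropped = sorted(k for k in raw if k not in QUERY_ALIASES and k != "limit")
    return arguments, dropped
```

Reporting the dropped keys lets a caller notice that something like `organism="human"` had no effect and should be rewritten as a clause inside the query string.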

Args: query (str): The Solr-style query string for the UniProtKB /search endpoint.

    QUERY SYNTAX:
    - Simple keyword: "rubisco"
    - Field-specific: "field:value"  (e.g., "gene:BRCA1", "protein_name:rubisco")
    - Boolean operators: AND, OR, NOT  (e.g., "gene:TP53 AND organism_id:9606")
    - Grouping with parentheses: "((gene:CTNNB1) AND (taxonomy_id:9606))"
    - Wildcards (* suffix): "gene:PRO*" matches any gene starting with PRO
    - Ranges: "length:[1000 TO 2000]" or open-ended "length:[5000 TO *]"

    KEY QUERY FIELDS:
    Identity / Name:
      accession          UniProt primary accession (e.g., "accession:P04637")
      id                 UniProt entry name / mnemonic (e.g., "id:P53_HUMAN")
      protein_name       Protein name, including synonyms (e.g., "protein_name:rubisco")
      gene               Gene name with wildcard support (e.g., "gene:BRCA*")
      gene_exact         Exact gene name match (e.g., "gene_exact:TP53")
      ec                 Enzyme Commission number (e.g., "ec:1.1.1.1")

    Taxonomy:
      organism_id        NCBI taxonomy ID (e.g., "organism_id:9606" for human,
                         "organism_id:10090" for mouse)
      organism_name      Organism scientific or common name
      taxonomy_id        Taxon ID including all descendants
      lineage            Taxonomic lineage keyword

    Annotation status:
      reviewed           true = Swiss-Prot (manually reviewed),
                         false = TrEMBL (automatically annotated)
                         ALWAYS add "reviewed:true" when seeking high-quality entries.

    Sequence properties:
      length             Sequence length as a range (e.g., "length:[100 TO 500]")
      mass               Molecular mass in Daltons (range supported)
      existence          Protein existence level: 1 (protein), 2 (transcript),
                         3 (homology), 4 (predicted), 5 (uncertain)

    Functional annotation:
      keyword            UniProt keyword name (e.g., "keyword:Kinase")
      keyword_id         UniProt keyword ID (e.g., "keyword_id:KW-0418")
      function           Function free-text annotation
      family             Protein family (e.g., "family:globin")
      organelle          Subcellular organelle (e.g., "organelle:chloroplast")
      cc_subcellular_location  Subcellular location comment

    Cross-references:
      database           Database cross-reference (e.g., "database:PDB")
      xref               Cross-reference ID (e.g., "xref:pdb-1A2B")
      chebi              ChEBI ID (e.g., "chebi:15422")
      interactor         UniProt accession of interacting protein

    Literature:
      lit_author         Author surname (e.g., "lit_author:Smith")
      lit_pubmed         PubMed ID
      lit_doi            DOI

    EXAMPLES (structured queries):
      # Reviewed human TP53 protein
      "gene_exact:TP53 AND organism_id:9606 AND reviewed:true"

      # All manually reviewed human kinases
      "keyword:Kinase AND organism_id:9606 AND reviewed:true"

      # Reviewed EGFR entries in human or mouse
      "gene_exact:EGFR AND (organism_id:9606 OR organism_id:10090) AND reviewed:true"

      # Long chloroplast proteins (>= 5000 aa) in any organism
      "organelle:chloroplast AND length:[5000 TO *]"

      # Reviewed human proteins with PDB structures involved in apoptosis
      "database:PDB AND keyword:Apoptosis AND organism_id:9606 AND reviewed:true"

      # Reviewed human proteins encoded by genes whose names start with "PIK3"
      "gene:PIK3* AND organism_id:9606 AND reviewed:true"

limit (int): The maximum number of results to return. Default is 20.
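Since filters must live inside the query string, it can help to assemble that string programmatically. This is a small sketch under stated assumptions: `build_query` and its parameter names are hypothetical, and it only uses the syntax shown in the description (field:value clauses, AND, parentheses, and `[min TO max]` ranges with `*` as the open upper bound).

```python
def build_query(base, organism_id=None, reviewed=None, min_length=None, max_length=None):
    """Combine a base Solr clause with optional filters using AND.

    Hypothetical helper: the tool itself only ever sees the resulting string.
    """
    # Parenthesize the base clause so an embedded OR cannot change precedence.
    clauses = [f"({base})"]
    if organism_id is not None:
        clauses.append(f"organism_id:{int(organism_id)}")
    if reviewed is not None:
        clauses.append(f"reviewed:{'true' if reviewed else 'false'}")
    if min_length is not None:
        # "*" leaves the upper bound open, as in "length:[5000 TO *]".
        upper = "*" if max_length is None else int(max_length)
        clauses.append(f"length:[{int(min_length)} TO {upper}]")
    return " AND ".join(clauses)


print(build_query("gene_exact:TP53", organism_id=9606, reviewed=True))
# (gene_exact:TP53) AND organism_id:9606 AND reviewed:true
```

The returned string is what goes into the `query` argument; `limit` stays a separate argument.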

Returns: str: TSV-formatted results with columns: accession, protein_name, organism_name.
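The TSV string can be turned into records with the standard library. This sketch assumes the first line of the result is a header row naming the three columns listed above; the description does not confirm that, so adjust if the tool returns bare rows. `parse_results` is a hypothetical name, and the sample text is illustrative rather than captured tool output.

```python
import csv
import io


def parse_results(tsv_text):
    """Parse TSV output into a list of dicts keyed by column name.

    Assumption: the first line is a header row (accession, protein_name,
    organism_name).
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # protein names may contain quote characters
    )
    return [dict(row) for row in reader]


# Illustrative input only, not real tool output.
sample = (
    "accession\tprotein_name\torganism_name\n"
    "P04637\tCellular tumor antigen p53\tHomo sapiens (Human)\n"
)
for row in parse_results(sample):
    print(row["accession"], row["organism_name"])
```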

Input Schema

Name         Required  Description  Default
query        No
limit        No
search       No
term         No
keyword      No
keywords     No
search_term  No
name         No

Output Schema

Name    Required  Description  Default
result  Yes

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It discloses that extra parameters are silently ignored, explains the accepted search string aliases, and states the return format (TSV with specific columns). It does not mention rate limits or authorization, but for a search tool this is adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy but well-structured with clear sections, bullet points, and examples. Every sentence contributes meaning, though some detail could be trimmed. It is front-loaded with purpose and critical warning. Appropriate for a complex tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity, no annotations, and an output schema that exists but is not detailed, the description is thoroughly complete. It covers query syntax, all relevant fields, examples, parameter handling, and return format. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds immense value beyond the input schema, which only provides types and defaults. It details the Solr query syntax, lists fields, gives examples, and explains that the query can be passed via multiple parameter names. The limit parameter is also explained. This far exceeds the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches for UniProt entity IDs by query. It uses a specific verb ('Search') and resource ('UniProt entity ID'). However, it does not explicitly differentiate from sibling tools like search_pdb_entity or search_chembl_molecule, though the context makes it obvious.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description warns that extra parameters like taxon, organism, etc. are silently dropped and should be expressed in the Solr query string instead. It also provides extensive query syntax and examples. It does not explicitly state when to use this tool vs alternatives, but the specificity suffices.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/dbcls/togomcp'
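The same request can be issued from Python with the standard library. This is a rough sketch: the URL is the one from the curl command, while the function names and the assumption that the endpoint answers with a JSON body are not confirmed by this page.

```python
import json
import urllib.request

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/dbcls/togomcp"


def build_request(url=SERVER_URL):
    """Build the GET request without sending it."""
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server_info(url=SERVER_URL, timeout=10.0):
    """Send the request and decode the body.

    Assumption: the response is JSON; this page does not document the format.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_server_info()` performs the network request; `build_request()` alone is useful for inspecting the URL and headers first.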

If you have feedback or need assistance with the MCP directory API, please join our Discord server.