uniprot_search
Search the UniProt protein database using query syntax to find proteins by gene, organism, keyword, length, and other fields.
Instructions
Search the UniProt protein database using query syntax.
Args:
query: UniProt query string. Examples:
- gene:BRCA1 - Search by gene name
- organism_id:9606 - Human proteins (NCBI taxonomy ID)
- (gene:BRCA*) AND (organism_id:10090) - Mouse BRCA genes with wildcard
- length:[500 TO 700] - Proteins of specific length range
- keyword:kinase - By UniProt keyword
- family:serpin - By protein family
- ec:3.2.1.23 - By enzyme classification
- database:pfam - With Pfam cross-references
- reviewed:true - Only Swiss-Prot reviewed entries
Available query fields (see https://www.uniprot.org/help/query-fields):
- accession: Primary/canonical isoform accessions (e.g., accession:P62988)
- active: Active/obsolete status (e.g., active:false)
- lit_author: Reference author (e.g., lit_author:ashburner)
- protein_name: Protein name (e.g., protein_name:CD233)
- chebi: ChEBI identifier (e.g., chebi:18420)
- xref_count_pdb: Cross-reference count (e.g., xref_count_pdb:[20 TO *])
- date_created: Creation date (e.g., date_created:[2012-10-01 TO *])
- date_modified: Last modification date (e.g., date_modified:[2012-01-01 TO 2019-03-01])
- date_sequence_modified: Sequence modification date (e.g., date_sequence_modified:[2012-01-01 TO 2012-03-01])
- database: Database cross-reference (e.g., database:pfam)
- xref: Cross-reference (e.g., xref:pdb-1aut)
- ec: Enzyme Commission number (e.g., ec:3.2.1.23)
- existence: Protein existence level (e.g., existence:3)
- family: Protein family (e.g., family:serpin)
- fragment: Fragment status (e.g., fragment:true)
- gene: Gene name (e.g., gene:HPSE)
- gene_exact: Exact gene name (e.g., gene_exact:HPSE)
- go: Gene Ontology term (e.g., go:0015629)
- virus_host_name: Virus host name
- virus_host_id: Virus host ID (e.g., virus_host_id:10090)
- accession_id: Primary accession (e.g., accession_id:P00750)
- inchikey: InChIKey identifier (e.g., inchikey:WQZGKKKJIJFFOK-GASJEMHNSA-N)
- interactor: Interacting protein (e.g., interactor:P00520)
- keyword: Keyword (e.g., keyword:toxin or keyword:KW-0800)
- length: Sequence length range (e.g., length:[500 TO 700])
- mass: Molecular mass range (e.g., mass:[500000 TO *])
- cc_mass_spectrometry: Mass spectrometry method (e.g., cc_mass_spectrometry:maldi)
- encoded_in: Gene location (e.g., encoded_in:Mitochondrion)
- organism_name: Organism name (e.g., organism_name:"Ovis aries")
- organism_id: Organism taxonomy ID (e.g., organism_id:9940)
- plasmid: Plasmid name (e.g., plasmid:ColE1)
- proteome: Proteome ID (e.g., proteome:UP000005640)
- proteomecomponent: Proteome component (e.g., proteomecomponent:"chromosome 1")
- sec_acc: Secondary accession (e.g., sec_acc:P02023)
- reviewed: Reviewed status (e.g., reviewed:true)
- scope: Reference scope (e.g., scope:mutagenesis)
- sequence: Sequence identifier (e.g., accession:P05067-9 AND is_isoform:true)
- strain: Organism strain (e.g., strain:wistar)
- taxonomy_name: Taxonomy name (e.g., taxonomy_name:mammal)
- taxonomy_id: Taxonomy ID (e.g., taxonomy_id:40674)
- tissue: Tissue type (e.g., tissue:liver)
- cc_webresource: Web resource (e.g., cc_webresource:wikipedia)
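The field syntax above (single `field:value` clauses, `[low TO high]` ranges, and parenthesised clauses joined with AND) can be assembled programmatically. The following is a minimal sketch, not part of the tool itself: the helper names `clause`, `range_clause`, and `all_of` are hypothetical, and quoting values that contain whitespace simply mirrors the `organism_name:"Ovis aries"` example.

```python
import re


def clause(field, value):
    """Return one `field:value` clause, quoting values that contain whitespace."""
    text = str(value)
    if re.search(r"\s", text):
        # Mirrors the organism_name:"Ovis aries" example in the field list.
        text = '"' + text.replace('"', '\\"') + '"'
    return f"{field}:{text}"


def range_clause(field, low, high="*"):
    """Return a range clause such as length:[500 TO 700]; '*' leaves an end open."""
    return f"{field}:[{low} TO {high}]"


def all_of(*clauses):
    """Join clauses with AND, wrapping each one in parentheses."""
    return " AND ".join(f"({c})" for c in clauses)


# Rebuilds the "mouse BRCA genes" example from the query examples.
query = all_of(clause("gene", "BRCA*"), clause("organism_id", 10090))
print(query)  # (gene:BRCA*) AND (organism_id:10090)
```

Pass boolean-style values as lowercase strings (for example `clause("reviewed", "true")`), since the documented examples use `true` and `false` rather than Python's `True` and `False`.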
database: UniProt database to search. One of: uniprotkb (default), uniparc, uniref
limit: Maximum number of results per page (1-100, default 10)
fields: Optional list of return fields to include. If not specified, all fields
are returned. Available return fields (see https://www.uniprot.org/help/return_fields):
Names & Taxonomy:
- accession, id, gene_names, gene_primary, gene_synonym, gene_oln, gene_orf
- organism_name, organism_id, protein_name, xref_proteomes
- lineage, lineage_ids, virus_hosts
Sequences:
- cc_alternative_products, ft_var_seq, cc_sc_epred, fragment, encoded_in
- length, mass, cc_mass_spectrometry, ft_variant, ft_non_cons, ft_non_std
- ft_non_ter, cc_polymorphism, cc_rna_editing, sequence, cc_sequence_caution
- ft_conflict, ft_unsure, sequence_version
Function:
- absorption, ft_act_site, cc_activity_regulation, ft_binding, cc_catalytic_activity
- cc_cofactor, ft_dna_bind, ec, cc_function, kinetics, cc_pathway
- ph_dependence, redox_potential, rhea, ft_site, temp_dependence
Miscellaneous:
- annotation_score, cc_caution, comment_count, feature_count, keywordid, keyword
- cc_miscellaneous, protein_existence, reviewed, tools, uniparc_id
Interaction:
- cc_interaction, cc_subunit
Expression:
- cc_developmental_stage, cc_induction, cc_tissue_specificity
Gene Ontology (GO):
- go_p, go_c, go, go_f, go_id
Pathology & Biotech:
- cc_allergen, cc_biotechnology, cc_disruption_phenotype, cc_disease
- ft_mutagen, cc_pharmaceutical, cc_toxic_dose
Subcellular location:
- ft_intramem, cc_subcellular_location, ft_topo_dom, ft_transmem
PTM / Processing:
- ft_chain, ft_crosslnk, ft_disulfid, ft_carbohyd, ft_init_met, ft_lipid
- ft_mod_res, ft_peptide, cc_ptm, ft_propep, ft_signal, ft_transit
Structure:
- structure_3d, ft_strand, ft_helix, ft_turn
Publications:
- lit_pubmed_id
Date:
- date_created, date_modified, date_sequence_modified, version
Family & Domains:
- ft_coiled, ft_compbias, cc_domain, ft_domain, ft_motif, protein_families
- ft_region, ft_repeat, ft_zn_fing
Cross-references:
- See https://www.uniprot.org/help/return_fields for cross-reference fields
cursor: Pagination cursor from a previous search result's 'nextCursor' field.
Pass this to retrieve the next page of results.
response_format: Response format. One of: 'json' (default) or 'toon'.
- 'json': Returns response in JSON format
- 'toon': Returns response in TOON format
Returns:
When response_format='json': JSON object with:
- results: Array of matching protein entries
- total: Total number of matching entries (if available)
- nextCursor: Cursor string for retrieving the next page (if more results exist)
When response_format='toon': TOON-formatted string with:
- results: Array of matching protein entries
- total: Total number of matching entries (if available)
- nextCursor: Cursor string for retrieving the next page (if more results exist)
See https://www.uniprot.org/help/query-fields for full query syntax documentation.
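The `nextCursor` value described above can drive a simple paging loop: pass it back as `cursor` until a response no longer includes it. The sketch below assumes a hypothetical `search` callable that stands in for however this tool is invoked in your client and returns the parsed JSON response as a dict. `fake_search` and its placeholder entries exist only to make the example runnable offline.

```python
def iter_results(search, query, limit=100, max_pages=10):
    """Yield entries page by page, following nextCursor until it is absent.

    `search` is a hypothetical callable accepting query, limit, and cursor
    keyword arguments and returning a dict with results / total / nextCursor.
    `max_pages` is a safety cap so a broad query cannot loop indefinitely.
    """
    cursor = None
    for _ in range(max_pages):
        args = {"query": query, "limit": limit}
        if cursor is not None:
            args["cursor"] = cursor
        page = search(**args)
        yield from page.get("results", [])
        cursor = page.get("nextCursor")
        if not cursor:
            break  # no further pages


def fake_search(query, limit=10, cursor=None):
    """Offline stand-in for the real tool call; the entries are placeholders."""
    pages = {
        None: {"results": ["entry-1", "entry-2"], "total": 3, "nextCursor": "abc"},
        "abc": {"results": ["entry-3"], "total": 3},
    }
    return pages[cursor]


print(list(iter_results(fake_search, "gene:BRCA1", limit=2)))
# ['entry-1', 'entry-2', 'entry-3']
```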
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
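| query | Yes | UniProt query string | |
| database | No | UniProt database to search: uniprotkb, uniparc, or uniref | uniprotkb |
| limit | No | Maximum number of results per page (1-100) | 10 |
| fields | No | List of return fields to include; all fields are returned if omitted | |
| cursor | No | Pagination cursor from a previous result's nextCursor field | |
| response_format | No | Response format: json or toon | json |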
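As an illustration of the constraints documented for these parameters (limit between 1 and 100, three database names, two response formats), a client could validate arguments before sending them. This is a sketch under those stated constraints only. `build_args` is a hypothetical helper, and how the resulting dict is passed to the tool depends on your client.

```python
VALID_DATABASES = {"uniprotkb", "uniparc", "uniref"}
VALID_FORMATS = {"json", "toon"}


def build_args(query, database="uniprotkb", limit=10, fields=None,
               cursor=None, response_format="json"):
    """Validate arguments against the documented constraints and return a dict."""
    if not query:
        raise ValueError("query is required")
    if database not in VALID_DATABASES:
        raise ValueError(f"database must be one of {sorted(VALID_DATABASES)}")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if response_format not in VALID_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(VALID_FORMATS)}")

    args = {
        "query": query,
        "database": database,
        "limit": limit,
        "response_format": response_format,
    }
    # Optional arguments are included only when provided.
    if fields:
        args["fields"] = list(fields)
    if cursor:
        args["cursor"] = cursor
    return args


args = build_args(
    "(gene:BRCA*) AND (organism_id:10090)",
    limit=25,
    fields=["accession", "gene_primary", "organism_name", "length"],
)
print(args)
```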
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |