search_bioentities
Search for genes and proteins using Gene Ontology data with taxonomic and source filtering to find biological entities across organisms.
Instructions
Search for bioentities (genes/proteins) using Gene Ontology data.
Searches across gene and protein names/labels with optional taxonomic filtering. Provides access to comprehensive bioentity information from GOlr.
Args:
- text: Text search across names and labels (e.g., "insulin", "kinase")
- taxon: Organism filter - accepts NCBI Taxon ID with or without prefix (e.g., "9606", "NCBITaxon:9606" for human)
- bioentity_type: Type filter (e.g., "protein", "gene")
- source: Source database filter (e.g., "UniProtKB", "MGI", "RGD")
- limit: Maximum number of results to return (default: 10)
- offset: Starting offset for pagination (default: 0)
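Returns: Dictionary with a "results" list (each entry has id, label, name, type, taxon, taxon_label, and source) plus "count", "limit", and "offset". On failure, the dictionary contains "error" and "message" instead.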
Examples:
    # Search for human insulin proteins
    results = search_bioentities(
        text="insulin",
        taxon="9606",
        bioentity_type="protein"
    )
Notes:
- Results include ID, name, type, organism, and source information
- Text search covers both short names/symbols and full descriptions
- Bare numeric taxon IDs are automatically given the NCBITaxon: prefix
- Use pagination (limit and offset) for large result sets
- Sources include UniProtKB, MGI, RGD, ZFIN, SGD, and others
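The taxon prefix handling mentioned in the notes can be sketched as a standalone function. This mirrors the check shown in the handler source on this page; the function name `normalize_taxon` is an assumption made for illustration and is not part of the tool's API.

```python
from typing import Optional


def normalize_taxon(taxon: Optional[str]) -> Optional[str]:
    """Add the NCBITaxon: prefix to a bare numeric taxon ID (hypothetical helper)."""
    # Only purely numeric strings are rewritten; prefixed IDs, empty values,
    # and anything else pass through unchanged.
    if taxon and not taxon.startswith("NCBITaxon:") and taxon.isdigit():
        return f"NCBITaxon:{taxon}"
    return taxon
```

Under this sketch, "9606" and "NCBITaxon:9606" resolve to the same filter value, while a non-numeric string such as "human" is passed along as-is rather than rejected.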
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
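| text | No | Text search across names and labels (e.g., "insulin", "kinase") | |
| taxon | No | Organism filter; NCBI Taxon ID with or without prefix (e.g., "9606", "NCBITaxon:9606" for human) | |
| bioentity_type | No | Type filter (e.g., "protein", "gene") | |
| source | No | Source database filter (e.g., "UniProtKB", "MGI", "RGD") | |
| limit | No | Maximum number of results to return | 10 |
| offset | No | Starting offset for pagination | 0 |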
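The `limit` and `offset` parameters can be combined to walk through a large result set page by page. The following is a minimal sketch, assuming a synchronous callable that accepts the same keyword arguments and returns the same dictionary shape as the tool; `fetch_all` and `fake_search` are hypothetical names, and `fake_search` is an in-memory stand-in, not the real GOlr-backed search.

```python
from typing import Any, Callable, Dict, List


def fetch_all(
    search: Callable[..., Dict[str, Any]],
    page_size: int = 10,
    max_results: int = 100,
    **filters: Any,
) -> List[Dict[str, Any]]:
    """Collect results page by page until a short page or max_results is reached."""
    collected: List[Dict[str, Any]] = []
    offset = 0
    while len(collected) < max_results:
        page = search(limit=page_size, offset=offset, **filters)
        if "error" in page:
            # The tool reports failures as a dictionary rather than raising.
            raise RuntimeError(page.get("message", page["error"]))
        batch = page["results"]
        collected.extend(batch)
        if len(batch) < page_size:
            break  # a short page means there is nothing left to fetch
        offset += page_size
    return collected[:max_results]


def fake_search(text=None, limit=10, offset=0, **_ignored):
    """Hypothetical stand-in that serves 25 placeholder records."""
    data = [{"id": f"EX:{i}", "label": f"{text}{i}"} for i in range(25)]
    window = data[offset:offset + limit]
    return {"results": window, "count": len(window), "limit": limit, "offset": offset}


rows = fetch_all(fake_search, page_size=10, text="kinase")
```

The `max_results` cap is a design choice in this sketch to avoid unbounded loops when a broad query matches a very large number of records.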
Implementation Reference
- src/noctua_mcp/mcp_server.py:1443-1560 (handler)
  The MCP tool handler for 'search_bioentities'. This async function is decorated with @mcp.tool() and implements the core logic: normalizes taxon ID, uses AmigoClient to search GOlr for bioentities matching the query parameters, formats results into a structured dictionary, and handles exceptions.

      async def search_bioentities(
          text: Optional[str] = None,
          taxon: Optional[str] = None,
          bioentity_type: Optional[str] = None,
          source: Optional[str] = None,
          limit: int = 10,
          offset: int = 0
      ) -> Dict[str, Any]:
          """
          Search for bioentities (genes/proteins) using Gene Ontology data.

          Searches across gene and protein names/labels with optional taxonomic
          filtering. Provides access to comprehensive bioentity information from GOlr.

          Args:
              text: Text search across names and labels (e.g., "insulin", "kinase")
              taxon: Organism filter - accepts NCBI Taxon ID with or without prefix
                     (e.g., "9606", "NCBITaxon:9606" for human)
              bioentity_type: Type filter (e.g., "protein", "gene")
              source: Source database filter (e.g., "UniProtKB", "MGI", "RGD")
              limit: Maximum number of results to return (default: 10)
              offset: Starting offset for pagination (default: 0)

          Returns:
              Dictionary containing search results with bioentity information

          Examples:
              # Search for human insulin proteins
              results = search_bioentities(
                  text="insulin",
                  taxon="9606",
                  bioentity_type="protein"
              )

              # Find mouse kinases from MGI
              results = search_bioentities(
                  text="kinase",
                  taxon="NCBITaxon:10090",
                  source="MGI",
                  limit=20
              )

              # Search for any human genes/proteins
              results = search_bioentities(
                  taxon="9606",
                  limit=50
              )

              # Find specific protein types
              results = search_bioentities(
                  text="receptor",
                  bioentity_type="protein",
                  limit=25
              )

              # Search across all organisms
              results = search_bioentities(text="p53")

              # Pagination example
              page1 = search_bioentities(text="kinase", limit=10, offset=0)
              page2 = search_bioentities(text="kinase", limit=10, offset=10)

              # Common organisms:
              # Human: "9606" or "NCBITaxon:9606"
              # Mouse: "10090" or "NCBITaxon:10090"
              # Rat: "10116" or "NCBITaxon:10116"
              # Fly: "7227" or "NCBITaxon:7227"
              # Worm: "6239" or "NCBITaxon:6239"
              # Yeast: "559292" or "NCBITaxon:559292"

          Notes:
              - Results include ID, name, type, organism, and source information
              - Text search covers both short names/symbols and full descriptions
              - Taxon IDs automatically handle NCBITaxon: prefix normalization
              - Use pagination for large result sets
              - Sources include UniProtKB, MGI, RGD, ZFIN, SGD, and others
          """
          # Normalize taxon ID - add NCBITaxon prefix if just a number
          if taxon and not taxon.startswith("NCBITaxon:"):
              if taxon.isdigit():
                  taxon = f"NCBITaxon:{taxon}"

          try:
              with AmigoClient() as client:
                  results = client.search_bioentities(
                      text=text,
                      taxon=taxon,
                      bioentity_type=bioentity_type,
                      source=source,
                      limit=limit,
                      offset=offset
                  )

                  return {
                      "results": [
                          {
                              "id": result.id,
                              "label": result.label,
                              "name": result.name,
                              "type": result.type,
                              "taxon": result.taxon,
                              "taxon_label": result.taxon_label,
                              "source": result.source
                          }
                          for result in results
                      ],
                      "count": len(results),
                      "limit": limit,
                      "offset": offset
                  }

          except Exception as e:
              return {
                  "error": "Failed to search bioentities",
                  "message": str(e)
              }
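Because the handler returns an error dictionary instead of raising, callers need to check which shape they received. The sketch below shows one way to do that; the key names come from the handler's return statements above, while `summarize` and the sample values (such as "EX:0001" and "ExampleDB") are placeholders invented for illustration, not real GOlr records.

```python
from typing import Any, Dict, List


def summarize(response: Dict[str, Any]) -> List[str]:
    """Turn a search_bioentities response into one display line per result."""
    if "error" in response:
        # Failure shape: {"error": ..., "message": ...}
        raise RuntimeError(f'{response["error"]}: {response.get("message", "")}')
    return [
        f'{r["id"]} | {r["label"]} | {r["taxon_label"]} | {r["source"]}'
        for r in response["results"]
    ]


# Placeholder response in the success shape returned by the handler.
sample = {
    "results": [
        {
            "id": "EX:0001",
            "label": "ex1",
            "name": "example protein 1",
            "type": "protein",
            "taxon": "NCBITaxon:9606",
            "taxon_label": "Homo sapiens",
            "source": "ExampleDB",
        }
    ],
    "count": 1,
    "limit": 10,
    "offset": 0,
}
lines = summarize(sample)
```

Note that "count" in the handler is the number of results on the current page (`len(results)`), not the total number of matches, so it should not be used to decide how many pages remain.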