bc_query_kegg
Query biological data through the KEGG REST API to retrieve pathways, genes, compounds, diseases, and related entries. Supports KEGG operations such as keyword search (find), identifier conversion (conv), and cross-reference linking (link).
Instructions
Execute a KEGG API query.
This function provides access to the KEGG API for querying biological data across pathways, genes, compounds, diseases, and more. It supports all seven KEGG REST operations (info, list, find, get, conv, link, ddi); which of the remaining parameters are needed depends on the operation.
When searching for genes in KEGG, you typically need KEGG IDs rather than gene symbols. Use the get_kegg_id_by_gene_symbol function first to convert gene symbols to KEGG IDs.
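The get_kegg_id_by_gene_symbol implementation is not shown in this reference. To illustrate the idea only: a KEGG find query against an organism database (for example operation=find, database="hsa", query="TP53") can return more than one entry, so an exact-symbol filter is useful. The hypothetical kegg_ids_for_symbol below assumes each response line is tab-separated, with the KEGG gene ID in the first column and a "SYMBOL, ALIAS, ...; description" field in the last column; check this against real responses before relying on it.

```python
def kegg_ids_for_symbol(find_text: str, symbol: str) -> list[str]:
    """Return KEGG gene IDs whose symbol or alias list contains `symbol` (case-insensitive).

    Hypothetical helper. Assumed line format: "<kegg_id>\t...\t<SYMBOL, ALIAS, ...; description>".
    """
    wanted = symbol.strip().upper()
    matches: list[str] = []
    for line in find_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        columns = line.split("\t")
        kegg_id, description = columns[0], columns[-1]
        # Symbols and aliases are assumed to precede the first ";" and be comma-separated.
        names = description.split(";", 1)[0]
        symbols = {name.strip().upper() for name in names.split(",")}
        if wanted in symbols:
            matches.append(kegg_id)
    return matches
```

Matching on the whole symbol, rather than a substring, avoids picking up neighbouring genes whose names merely contain the query.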
Common operations:
- info: Get database metadata (e.g., operation=info, database=PATHWAY)
- list: List entries in a database (e.g., operation=list, database=PATHWAY, query="hsa")
- get: Retrieve specific entries (e.g., operation=get, entries=["hsa:7157"])
- find: Search for entries by keyword (e.g., operation=find, database=COMPOUND, query="glucose")
- link: Find related entries (e.g., operation=link, target_db=PATHWAY, entries=["hsa:7157"])
- conv: Convert between identifiers (e.g., operation=conv, target_db=NCBI_GENEID, entries=["hsa:7157"])
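KeggConfig.execute() is not shown in this reference, so the mapping below is an assumption: a hypothetical build_kegg_path function that turns the parameters from the examples above into KEGG REST paths, following the URL forms described at https://www.kegg.jp/kegg/rest/keggapi.html (multiple entries joined with "+", an optional trailing option segment).

```python
from enum import Enum
from typing import Optional, Sequence, Union

Arg = Optional[Union[Enum, str]]


def _seg(value: Arg) -> str:
    """Render an enum member, plain string, or None as a URL path segment."""
    if value is None:
        return ""
    # Use .value explicitly: f-string formatting of (str, Enum) members
    # is not consistent across Python versions.
    return str(value.value) if isinstance(value, Enum) else str(value)


def build_kegg_path(
    operation: Union[Enum, str],
    database: Arg = None,
    target_db: Arg = None,
    source_db: Arg = None,
    query: Optional[str] = None,
    option: Arg = None,
    entries: Sequence[str] = (),
) -> str:
    """Map query_kegg-style arguments to a KEGG REST path (hypothetical sketch)."""
    op = _seg(operation).lower()
    ids = "+".join(entries)  # multiple entries are joined with "+"
    if op == "info":
        parts = ["info", _seg(database)]
    elif op == "list":
        # e.g. list/pathway or list/pathway/hsa
        parts = ["list", _seg(database)] + ([query] if query else [])
    elif op == "find":
        parts = ["find", _seg(database), query or ""]
    elif op == "get":
        parts = ["get", ids]
    elif op in ("conv", "link"):
        # The last segment is either a whole source database or specific entries.
        parts = [op, _seg(target_db), _seg(source_db) or ids]
    elif op == "ddi":
        parts = ["ddi", ids]
    else:
        raise ValueError(f"unsupported operation: {op!r}")
    if any(not part for part in parts):
        raise ValueError(f"missing a required argument for {op!r}")
    if option is not None and op in ("find", "get", "link"):
        parts.append(_seg(option))
    return "/".join(parts)
```

For example, build_kegg_path("list", database="pathway", query="hsa") yields "list/pathway/hsa", which would then be appended to the base URL https://rest.kegg.jp.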
Args:
- operation (KeggOperation): The KEGG operation to perform.
- database (KeggDatabase | KeggOutsideDb | str, optional): The database to query, or a KEGG organism code such as "hsa".
- target_db (KeggDatabase | KeggOutsideDb | str, optional): The target database for CONV or LINK operations.
- source_db (KeggDatabase | KeggOutsideDb | str, optional): The source database for CONV or LINK operations.
- query (str, optional): The query string for FIND, or an organism code for LIST.
- option (KeggOption | KeggFindOption | KeggRdfFormat, optional): Additional options: an output format for GET, a search field for FIND, or an RDF format for LINK.
- entries (List[str], optional): KEGG entry IDs for GET, CONV, or LINK operations.
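Several parameters accept either an enum member or a plain string (KeggDatabase | KeggOutsideDb | str). Because these enums subclass str, a member such as KeggDatabase.PATHWAY compares equal to "pathway", and a member can be looked up by value with KeggDatabase("pathway"). The hypothetical normalize_db below, written against trimmed-down copies of the enums, shows one way to resolve an incoming value: try each enum in union order and fall back to the raw string for values such as organism codes. How the actual server coerces these values is not shown here.

```python
from enum import Enum
from typing import Union


class KeggDatabase(str, Enum):
    """Trimmed copy of KeggDatabase, for illustration only."""

    PATHWAY = "pathway"
    COMPOUND = "compound"
    GENES = "genes"


class KeggOutsideDb(str, Enum):
    """Trimmed copy of KeggOutsideDb, for illustration only."""

    NCBI_GENEID = "ncbi-geneid"
    UNIPROT = "uniprot"


DbValue = Union[KeggDatabase, KeggOutsideDb, str]


def normalize_db(value: DbValue) -> DbValue:
    """Coerce a string to the first matching enum member, else return it unchanged (stripped)."""
    if isinstance(value, (KeggDatabase, KeggOutsideDb)):
        return value  # already an enum member
    text = value.strip()
    for enum_cls in (KeggDatabase, KeggOutsideDb):  # same order as the type union
        try:
            return enum_cls(text.lower())  # lookup by value, e.g. KeggDatabase("pathway")
        except ValueError:
            continue
    return text  # e.g. an organism code such as "hsa"
```

Falling back to the original string keeps organism codes and other databases not listed in the enums usable, matching the str arm of the union.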
Returns: str | dict: The raw text response from the KEGG API on success, or a dict with an "error" key describing the failure.
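Because the tool returns either raw text or an error dict, callers need to branch on the type before parsing. A minimal sketch, assuming conv and link responses contain one tab-separated pair of identifiers per line (for example a KEGG gene ID paired with an NCBI Gene ID); parse_kegg_pairs is a hypothetical helper, not part of the tool.

```python
from typing import Union


def parse_kegg_pairs(result: Union[str, dict]) -> list[tuple[str, str]]:
    """Split a two-column KEGG text response (e.g. from conv or link) into pairs.

    Raises RuntimeError if the tool returned its {"error": ...} dict instead of text.
    """
    if isinstance(result, dict):
        raise RuntimeError(result.get("error", "unknown KEGG error"))
    pairs: list[tuple[str, str]] = []
    for line in result.splitlines():
        if not line.strip():
            continue  # skip blank lines
        left, _, right = line.partition("\t")  # assumed tab-separated columns
        pairs.append((left, right))
    return pairs
```

Raising on the error dict keeps failures from being silently parsed as empty results.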
Examples:

    # List human pathways
    >>> query_kegg(operation=KeggOperation.LIST, database=KeggDatabase.PATHWAY, query="hsa")
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| database | No | The KEGG database to query (e.g., pathway, genes, compound) or organism code (e.g., hsa) | |
| entries | No | List of KEGG entry IDs (e.g., ['hsa:7157', 'hsa00010']) | |
| operation | Yes | The KEGG API operation to perform (info, list, find, get, conv, link, ddi) | |
| option | No | Additional options like sequence formats, chemical formula search, etc. | |
| query | No | Query string for operations like FIND, or organism code for LIST | |
| source_db | No | Source database for conversion or linking operations | |
| target_db | No | Target database for conversion or linking operations | |
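An MCP client invokes the tool with a JSON-RPC tools/call request whose arguments object uses the parameter names from the table above, with enum-typed parameters sent as their string values. A minimal sketch, assuming the tool is registered under the name bc_query_kegg shown in the title; make_tools_call is a hypothetical helper, and transport and session initialization are out of scope.

```python
import json


def make_tools_call(request_id: int, arguments: dict) -> str:
    """Serialize an MCP JSON-RPC tools/call request for the bc_query_kegg tool (hypothetical)."""
    if "operation" not in arguments:
        raise ValueError("missing required argument 'operation'")
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "bc_query_kegg", "arguments": arguments},
    }
    return json.dumps(payload)
```

For instance, the conv example from the operations list becomes make_tools_call(1, {"operation": "conv", "target_db": "ncbi-geneid", "entries": ["hsa:7157"]}).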
Implementation Reference
- The primary handler function for the bc_query_kegg tool. It constructs a KeggConfig, validates it, and executes the KEGG API query via the helper function.

```python
from typing import Annotated, List, Optional, Union

from pydantic import Field

# core_mcp, KeggConfig, and the Kegg* enums come from elsewhere in the package (imports not shown).


@core_mcp.tool()
def query_kegg(
    operation: Annotated[KeggOperation, Field(description="info, list, find, get, conv, link, or ddi")],
    database: Annotated[
        Optional[Union[KeggDatabase, KeggOutsideDb, str]],
        Field(description="pathway, compound, genes, organism code (hsa, mmu, etc.), or other DB"),
    ] = None,
    target_db: Annotated[
        Optional[Union[KeggDatabase, KeggOutsideDb, str]],
        Field(description="Target DB for conversion/linking operations"),
    ] = None,
    source_db: Annotated[
        Optional[Union[KeggDatabase, KeggOutsideDb, str]],
        Field(description="Source DB for conversion/linking operations"),
    ] = None,
    query: Annotated[Optional[str], Field(description="Query string for FIND/LIST, or organism code for LIST")] = None,
    option: Annotated[
        Optional[Union[KeggOption, KeggFindOption, KeggRdfFormat]],
        Field(description="aaseq, ntseq, mol, formula, exact_mass, mol_weight, etc."),
    ] = None,
    entries: Annotated[
        Optional[List[str]], Field(description="KEGG entry IDs (e.g., ['hsa:7157', 'hsa00010'])")
    ] = None,
) -> str | dict:
    """Execute flexible KEGG API queries across pathways, genes, compounds, diseases, drugs.

    Use get_kegg_id_by_gene_symbol() first.

    Returns:
        str or dict: Raw text response from KEGG API with requested data (pathways, genes, compounds, etc.) or error dict.
    """
    # Bundle the arguments into a KeggConfig; a missing entries list becomes [].
    # Pydantic validates fields at construction, before the try blocks below.
    config = KeggConfig(
        operation=operation,
        database=database,
        target_db=target_db,
        source_db=source_db,
        query=query,
        option=option,
        entries=entries or [],
    )

    # Re-validate the config; a ValueError here is returned as an error dict.
    try:
        KeggConfig.model_validate(config)
    except ValueError as e:
        return {"error": f"Invalid configuration: {e}"}

    # Any failure while calling the KEGG API is also returned as an error dict.
    try:
        return config.execute()
    except Exception as e:
        return {"error": f"Failed to execute KEGG query: {e}"}
```
- Pydantic models and enums defining the input schema and validation for KEGG query parameters.

```python
from enum import Enum


class KeggOperation(str, Enum):
    """KEGG API operations.

    These operations correspond to the different API endpoints in the KEGG REST API.
    For detailed information on each operation, see: https://www.kegg.jp/kegg/rest/keggapi.html
    """

    INFO = "info"  # Display database release information
    LIST = "list"  # Obtain a list of entry identifiers
    FIND = "find"  # Find entries with matching keywords
    GET = "get"  # Retrieve given database entries
    CONV = "conv"  # Convert between KEGG and outside database identifiers
    LINK = "link"  # Find related entries by cross-references
    DDI = "ddi"  # Find adverse drug-drug interactions


class KeggDatabase(str, Enum):
    """Primary KEGG databases.

    These databases contain different types of biological data in the KEGG system.

    Pathway and pathway-related databases: pathway, brite, module, ko
    Genes and genomes: genes, genome (organism-specific databases use KEGG organism codes)
    Chemical compounds: compound, glycan, reaction, rclass, enzyme
    Disease, drugs, and variants: variant, disease, drug, dgroup
    """

    KEGG = "kegg"  # All KEGG databases combined
    PATHWAY = "pathway"  # KEGG pathway maps
    BRITE = "brite"  # BRITE functional hierarchies
    MODULE = "module"  # KEGG modules
    ORTHOLOGY = "ko"  # KEGG orthology
    GENOME = "genome"  # KEGG organisms
    COMPOUND = "compound"  # Chemical compounds
    GLYCAN = "glycan"  # Glycans
    REACTION = "reaction"  # Biochemical reactions
    RCLASS = "rclass"  # Reaction classes
    ENZYME = "enzyme"  # Enzyme nomenclature
    NETWORK = "network"  # Network elements
    VARIANT = "variant"  # Human gene variants
    DISEASE = "disease"  # Human diseases
    DRUG = "drug"  # Drugs
    DGROUP = "dgroup"  # Drug groups
    GENES = "genes"  # Genes in KEGG organisms (composite database)
    LIGAND = "ligand"  # Collection of chemical databases
    ORGANISM = "organism"  # Special case for list operation to get organism codes


class KeggOutsideDb(str, Enum):
    """Outside databases integrated in KEGG.

    These external databases can be used in CONV (conversion) and LINK operations.
    """

    PUBMED = "pubmed"  # PubMed literature database
    NCBI_GENEID = "ncbi-geneid"  # NCBI Gene IDs
    NCBI_PROTEINID = "ncbi-proteinid"  # NCBI Protein IDs
    UNIPROT = "uniprot"  # UniProt protein database
    PUBCHEM = "pubchem"  # PubChem compound database
    CHEBI = "chebi"  # Chemical Entities of Biological Interest
    ATC = "atc"  # Anatomical Therapeutic Chemical Classification System
    JTC = "jtc"  # Japanese therapeutic category
    NDC = "ndc"  # National Drug Code (USA)
    YJ = "yj"  # YJ codes (Japan drug products)
    YK = "yk"  # Part of Korosho code (Japan)


class KeggOption(str, Enum):
    """Options for GET operation.

    These options specify the format or type of data to retrieve for database entries.
    """

    AASEQ = "aaseq"  # Amino acid sequence
    NTSEQ = "ntseq"  # Nucleotide sequence
    MOL = "mol"  # Chemical structure in MOL format
    KCF = "kcf"  # Chemical structure in KCF format
    IMAGE = "image"  # Image file (pathway maps, compound structures)
    CONF = "conf"  # Configuration file (for pathway maps)
    KGML = "kgml"  # KEGG Markup Language file (for pathway maps)
    JSON = "json"  # JSON format (for brite hierarchies)


class KeggFindOption(str, Enum):
    """Options for FIND operation on compounds/drugs.

    These options specify the search criteria for chemical compounds and drugs.
    """

    FORMULA = "formula"  # Search by chemical formula
    EXACT_MASS = "exact_mass"  # Search by exact mass
    MOL_WEIGHT = "mol_weight"  # Search by molecular weight
    NOP = "nop"  # No processing (literal search)


class KeggRdfFormat(str, Enum):
    """RDF output formats for LINK with RDF option.

    These options specify the format of returned RDF data.
    """

    TURTLE = "turtle"  # Turtle RDF format
    N_TRIPLE = "n-triple"  # N-Triples RDF format
```
- Helper function that performs the actual HTTP request to the KEGG REST API.

```python
import requests


def execute_kegg_query(path: str) -> str:
    """Internal helper - executes the HTTP GET and returns raw text."""
    base = "https://rest.kegg.jp"
    # Join base URL and operation path without producing a double slash.
    url = f"{base}/{path.lstrip('/')}"
    r = requests.get(url, timeout=30.0)
    # Raise requests.HTTPError for 4xx/5xx responses.
    r.raise_for_status()
    return r.text
```
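- `src/biocontext_kb/core/__init__.py:23-24` (registration): Import statement that conditionally loads the KEGG tools, including query_kegg, into core_mcp for registration. The tools load whenever MCP_ENVIRONMENT is not PRODUCTION, and in production only when MCP_INCLUDE_KEGG is set to "true" (case-insensitive).

```python
if os.getenv("MCP_ENVIRONMENT") != "PRODUCTION" or os.getenv("MCP_INCLUDE_KEGG", "false").lower() == "true":
    from .kegg import *
```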
- `src/biocontext_kb/core/kegg/__init__.py:2` (registration): Re-exports the query_kegg function, so it is imported, and therefore registered, when the kegg module is loaded.

```python
from ._query_kegg import query_kegg
```
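The query_kegg handler above constructs KeggConfig before its first try block and then calls KeggConfig.model_validate(config) inside it. If KeggConfig is a pydantic model, as that call suggests, fields are validated during construction, so a ValidationError (a ValueError subclass) raised there would propagate out of the handler instead of being returned as an {"error": ...} dict. The sketch below shows one alternative arrangement. DemoKeggConfig is a hypothetical dataclass standing in for KeggConfig, the injected execute callable stands in for KeggConfig.execute, and the 10-entry limit for get is an assumption to check against the KEGG REST documentation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union

MAX_GET_ENTRIES = 10  # assumed KEGG limit for get; verify against the KEGG REST docs


@dataclass
class DemoKeggConfig:
    """Hypothetical stand-in for KeggConfig that validates in __post_init__."""

    operation: str
    database: Optional[str] = None
    query: Optional[str] = None
    entries: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Runs during construction, much like pydantic field validation.
        if self.operation == "find" and not (self.database and self.query):
            raise ValueError("find requires both database and query")
        if self.operation == "get" and not 1 <= len(self.entries) <= MAX_GET_ENTRIES:
            raise ValueError(f"get requires 1 to {MAX_GET_ENTRIES} entries")


def run_query(execute: Callable[[DemoKeggConfig], str], **kwargs) -> Union[str, dict]:
    """Build, validate, and execute a config; every failure becomes an error dict."""
    try:
        config = DemoKeggConfig(**kwargs)  # construction and validation inside the try
    except (TypeError, ValueError) as e:  # TypeError covers unknown keyword arguments
        return {"error": f"Invalid configuration: {e}"}
    try:
        return execute(config)
    except Exception as e:
        return {"error": f"Failed to execute KEGG query: {e}"}
```

Injecting execute keeps the sketch testable without network access; in a real handler it would be the method that performs the HTTP request.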