search_substance
Search the OpenFoodTox database for chemical substances by name, E-number, or description to retrieve toxicity data, safety assessments, and genotoxicity studies.
Instructions
MCP tool to search the OpenFoodTox database for substances by name, E-number, or description.
Searches the database using a two-step approach:
1. First searches SYNONYM table (E-numbers, common names, trade names, alternative names)
2. If no results, searches COMPONENT table (SUB_NAME and COM_NAME fields)
The search is case-insensitive and supports partial matches. E-numbers are automatically
normalized (e.g., "E 951" or "E-951" becomes "E951").
Returns a list of unique substances (one dictionary per SUB_COM_ID) with all study data
aggregated into arrays. Each substance may have multiple studies, opinions, and assessments,
which are grouped together by SUB_COM_ID.
Args:
description_search: Search term (substance or component name e.g. "aspartame", OR E-number e.g. "E 951") or any of the following CAS name, Council of Europe number, E number, E.C enzyme number, EC name, EU Flavour Information System number, EUgroup-no, Flavour and Extract Manufacturers Association number, Joint FAO/WHO Expert Committee on Food Additives number, Name, OECD Toolbox Classification, Pharmalogical class, Swiss Prot no., Trade name
Returns:
List of dictionaries, where each dictionary represents a unique substance with:
- Basic component information (name, type, formula, description)
- Aggregated study identifiers (arrays of IDs linking to related tables)
- Study classifications and remarks (arrays of unique values from all studies)
Returns None if no matches are found.
<dictionary_descriptions>
<name>SUB_COM_ID</name>
<description>Unique identifier for the substance-component link. Primary key for grouping results. Multiple studies, opinions, and assessments may reference the same SUB_COM_ID.</description>
<name>COM_NAME</name>
<description>Component name as derived in the opinions. If more than one name is reported in the opinion (excluding IUPAC name), then the most common/most specific name is reported as component name.</description>
<name>COM_TYPE</name>
<description>High level classification of component type (e.g., single, mixture, botanical, synthetic).</description>
<name>MOLECULARFORMULA</name>
<description>Molecular formula of the chemical component.</description>
<name>SUB_DESCRIPTION</name>
<description>Summary of the substance description as derived from opinions. This includes also the group description.</description>
<name>SUB_OP_CLASS</name>
<description>Array of unique values indicating the class of the substance and the corresponding opinion as provided by EFSA (e.g., "food additive", "pesticide", "flavoring"). Aggregated from all studies for this substance.</description>
<name>REMARKS</name>
<description>Array of unique remarks from all studies. Indicates the objective of the opinion and reports any general remarks as retrieved from the opinion. Aggregated from REMARKS_STUDY field.</description>
<name>GENOTOX_ID</name>
<description>Array of unique identifiers linking to the GENOTOX table. Each ID represents a genotoxicity study associated with this substance. May be None if no genotoxicity studies exist.</description>
<name>TOX_ID</name>
<description>Array of unique identifiers linking to the ENDPOINT_STUDY table. Each ID represents a toxicity endpoint study (e.g., NOAEL, LD50). May be None if no endpoint studies exist.</description>
<name>HAZARD_ID</name>
<description>Array of unique identifiers linking to the CHEM_ASSESS table. Each ID represents a chemical risk assessment (e.g., ADI, TDI values). May be None if no assessments exist.</description>
<name>OP_ID</name>
<description>Array of unique identifiers linking to the OPINION table. Each ID represents an EFSA published opinion/document associated with this substance. May be None if no opinions exist.</description>
</dictionary_descriptions>
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| description_search | Yes |
Implementation Reference
- The main handler function for the 'search_substance' MCP tool. It takes a search term, normalizes E-numbers implicitly via the query, calls the database query helper, and returns the list of matching substances with aggregated study data. The docstring provides detailed input/output schema descriptions.def search_substance(description_search): """ MCP tool to search the OpenFoodTox database for substances by name, E-number, or description. Searches the database using a two-step approach: 1. First searches SYNONYM table (E-numbers, common names, trade names, alternative names) 2. If no results, searches COMPONENT table (SUB_NAME and COM_NAME fields) The search is case-insensitive and supports partial matches. E-numbers are automatically normalized (e.g., "E 951" or "E-951" becomes "E951"). Returns a list of unique substances (one dictionary per SUB_COM_ID) with all study data aggregated into arrays. Each substance may have multiple studies, opinions, and assessments, which are grouped together by SUB_COM_ID. Args: description_search: Search term (substance or component name e.g. "aspartame", OR E-number e.g. "E 951") or any of the following CAS name, Council of Europe number, E number, E.C enzyme number, EC name, EU Flavour Information System number, EUgroup-no, Flavour and Extract Manufacturers Association number, Joint FAO/WHO Expert Committee on Food Additives number, Name, OECD Toolbox Classification, Pharmalogical class, Swiss Prot no., Trade name Returns: List of dictionaries, where each dictionary represents a unique substance with: - Basic component information (name, type, formula, description) - Aggregated study identifiers (arrays of IDs linking to related tables) - Study classifications and remarks (arrays of unique values from all studies) Returns None if no matches are found. <dictionary_descriptions> <name>SUB_COM_ID</name> <description>Unique identifier for the substance-component link. Primary key for grouping results. Multiple studies, opinions, and assessments may reference the same SUB_COM_ID.</description> <name>COM_NAME</name> <description>Component name as derived in the opinions. If more than one name is reported in the opinion (excluding IUPAC name), then the most common/most specific name is reported as component name.</description> <name>COM_TYPE</name> <description>High level classification of component type (e.g., single, mixture, botanical, synthetic).</description> <name>MOLECULARFORMULA</name> <description>Molecular formula of the chemical component.</description> <name>SUB_DESCRIPTION</name> <description>Summary of the substance description as derived from opinions. This includes also the group description.</description> <name>SUB_OP_CLASS</name> <description>Array of unique values indicating the class of the substance and the corresponding opinion as provided by EFSA (e.g., "food additive", "pesticide", "flavoring"). Aggregated from all studies for this substance.</description> <name>REMARKS</name> <description>Array of unique remarks from all studies. Indicates the objective of the opinion and reports any general remarks as retrieved from the opinion. Aggregated from REMARKS_STUDY field.</description> <name>GENOTOX_ID</name> <description>Array of unique identifiers linking to the GENOTOX table. Each ID represents a genotoxicity study associated with this substance. May be None if no genotoxicity studies exist.</description> <name>TOX_ID</name> <description>Array of unique identifiers linking to the ENDPOINT_STUDY table. Each ID represents a toxicity endpoint study (e.g., NOAEL, LD50). May be None if no endpoint studies exist.</description> <name>HAZARD_ID</name> <description>Array of unique identifiers linking to the CHEM_ASSESS table. Each ID represents a chemical risk assessment (e.g., ADI, TDI values). May be None if no assessments exist.</description> <name>OP_ID</name> <description>Array of unique identifiers linking to the OPINION table. Each ID represents an EFSA published opinion/document associated with this substance. May be None if no opinions exist.</description> </dictionary_descriptions> """ results = query_search_substance(description_search) return results
- main.py:18-18 (registration)Registers the search_substance tool with the FastMCP server instance.mcp.add_tool(search_substance)
- Supporting database query function that implements the core search logic: searches synonyms then components by SUB_COM_ID, fetches related studies, and aggregates data into structured dictionaries with arrays of IDs and classifications.def query_search_substance(description_search) -> Optional[list[dict]]: """ Atomic query function. Database-agnostic search function. Returns unique substance/es (by SUB_COM_ID) with all study data aggregated into arrays. """ normalized_search = normalize_e_number(description_search) with get_connection() as db_connection: # Step 1: Find SUB_COM_IDs (database-agnostic via pandas) synonyms = pd.read_sql_query( "SELECT DISTINCT SUB_COM_ID FROM synonym WHERE DESCRIPTION LIKE ?", db_connection, params=[f"%{normalized_search}%"], ) sub_com_ids = synonyms["SUB_COM_ID"].unique().tolist() if not sub_com_ids: # Try component search... components = pd.read_sql_query( "SELECT DISTINCT SUB_COM_ID FROM component WHERE SUB_NAME LIKE ? OR COM_NAME LIKE ?", db_connection, params=[f"%{normalized_search}%", f"%{normalized_search}%"], ) if components.empty: return None sub_com_ids = components["SUB_COM_ID"].unique().tolist() # Step 2: Get unique component info (one row per SUB_COM_ID) placeholders = ",".join("?" * len(sub_com_ids)) component_query = f""" SELECT DISTINCT SUB_COM_ID, COM_NAME, COM_TYPE, MOLECULARFORMULA, SUB_DESCRIPTION FROM component WHERE SUB_COM_ID IN ({placeholders}) """ components_df = pd.read_sql_query(component_query, db_connection, params=sub_com_ids) # Step 3: Get all studies for these SUB_COM_IDs study_query = f""" SELECT SUB_COM_ID, SUB_OP_CLASS, REMARKS_STUDY, GENOTOX_ID, TOX_ID, HAZARD_ID, OP_ID FROM study WHERE SUB_COM_ID IN ({placeholders}) """ studies_df = pd.read_sql_query(study_query, db_connection, params=sub_com_ids) # Step 4: Group studies by SUB_COM_ID and aggregate into arrays result = [] for _, component_row in components_df.iterrows(): sub_com_id = component_row["SUB_COM_ID"] # Get all studies for this component component_studies = studies_df[studies_df["SUB_COM_ID"] == sub_com_id] # Helper function to convert to array, filtering out None/NaN def to_array_or_none(series): """Convert pandas series to list of non-null values, or None if empty.""" values = series.dropna().unique().tolist() # Convert numpy types to native Python types cleaned = [] for val in values: if pd.notna(val): if hasattr(val, "item"): cleaned.append(val.item()) else: cleaned.append(val) return cleaned if cleaned else None # Helper function for string arrays (like SUB_OP_CLASS, REMARKS) def to_string_array_or_none(series): """Convert pandas series to list of non-null strings, or None if empty.""" values = series.dropna().unique().tolist() cleaned = [str(v) for v in values if pd.notna(v) and str(v).strip()] return cleaned if cleaned else None # Helper to safely get value or None def safe_get(val): """Get value if not null, else None.""" try: if pd.isna(val): return None if hasattr(val, "item"): return val.item() return val except (TypeError, ValueError): return None # Build result entry entry = { "SUB_COM_ID": int(sub_com_id), "COM_NAME": safe_get(component_row["COM_NAME"]), "COM_TYPE": safe_get(component_row["COM_TYPE"]), "MOLECULARFORMULA": safe_get(component_row["MOLECULARFORMULA"]), "SUB_DESCRIPTION": safe_get(component_row["SUB_DESCRIPTION"]), } if not component_studies.empty: # Aggregate study fields into arrays entry["SUB_OP_CLASS"] = to_string_array_or_none(component_studies["SUB_OP_CLASS"]) entry["REMARKS"] = to_string_array_or_none(component_studies["REMARKS_STUDY"]) entry["GENOTOX_ID"] = to_array_or_none(component_studies["GENOTOX_ID"]) entry["TOX_ID"] = to_array_or_none(component_studies["TOX_ID"]) entry["HAZARD_ID"] = to_array_or_none(component_studies["HAZARD_ID"]) entry["OP_ID"] = to_array_or_none(component_studies["OP_ID"]) else: # No studies found for this component entry["SUB_OP_CLASS"] = None entry["REMARKS"] = None entry["GENOTOX_ID"] = None entry["TOX_ID"] = None entry["HAZARD_ID"] = None entry["OP_ID"] = None result.append(entry) return result
- Structured output schema description in the tool's docstring, detailing each field in the returned dictionaries.<dictionary_descriptions> <name>SUB_COM_ID</name> <description>Unique identifier for the substance-component link. Primary key for grouping results. Multiple studies, opinions, and assessments may reference the same SUB_COM_ID.</description> <name>COM_NAME</name> <description>Component name as derived in the opinions. If more than one name is reported in the opinion (excluding IUPAC name), then the most common/most specific name is reported as component name.</description> <name>COM_TYPE</name> <description>High level classification of component type (e.g., single, mixture, botanical, synthetic).</description> <name>MOLECULARFORMULA</name> <description>Molecular formula of the chemical component.</description> <name>SUB_DESCRIPTION</name> <description>Summary of the substance description as derived from opinions. This includes also the group description.</description> <name>SUB_OP_CLASS</name> <description>Array of unique values indicating the class of the substance and the corresponding opinion as provided by EFSA (e.g., "food additive", "pesticide", "flavoring"). Aggregated from all studies for this substance.</description> <name>REMARKS</name> <description>Array of unique remarks from all studies. Indicates the objective of the opinion and reports any general remarks as retrieved from the opinion. Aggregated from REMARKS_STUDY field.</description> <name>GENOTOX_ID</name> <description>Array of unique identifiers linking to the GENOTOX table. Each ID represents a genotoxicity study associated with this substance. May be None if no genotoxicity studies exist.</description> <name>TOX_ID</name> <description>Array of unique identifiers linking to the ENDPOINT_STUDY table. Each ID represents a toxicity endpoint study (e.g., NOAEL, LD50). May be None if no endpoint studies exist.</description> <name>HAZARD_ID</name> <description>Array of unique identifiers linking to the CHEM_ASSESS table. Each ID represents a chemical risk assessment (e.g., ADI, TDI values). May be None if no assessments exist.</description> <name>OP_ID</name> <description>Array of unique identifiers linking to the OPINION table. Each ID represents an EFSA published opinion/document associated with this substance. May be None if no opinions exist.</description> </dictionary_descriptions>