Skip to main content
Glama

search_substance

Search the OpenFoodTox database for chemical substances by name, E-number, or description to retrieve toxicity data, safety assessments, and genotoxicity studies.

Instructions

MCP tool to search the OpenFoodTox database for substances by name, E-number, or description.

Searches the database using a two-step approach:
1. First searches SYNONYM table (E-numbers, common names, trade names, alternative names)
2. If no results, searches COMPONENT table (SUB_NAME and COM_NAME fields)

The search is case-insensitive and supports partial matches. E-numbers are automatically
normalized (e.g., "E 951" or "E-951" becomes "E951").

Returns a list of unique substances (one dictionary per SUB_COM_ID) with all study data
aggregated into arrays. Each substance may have multiple studies, opinions, and assessments,
which are grouped together by SUB_COM_ID.

Args:
    description_search: Search term (substance or component name e.g. "aspartame", OR E-number e.g. "E 951") or any of the following CAS name, Council of Europe number, E number, E.C enzyme number, EC name, EU Flavour Information System number, EUgroup-no, Flavour and Extract Manufacturers Association number, Joint FAO/WHO Expert Committee on Food Additives number, Name, OECD Toolbox Classification, Pharmalogical class, Swiss Prot no., Trade name

Returns:
    List of dictionaries, where each dictionary represents a unique substance with:
    - Basic component information (name, type, formula, description)
    - Aggregated study identifiers (arrays of IDs linking to related tables)
    - Study classifications and remarks (arrays of unique values from all studies)

    Returns None if no matches are found.

<dictionary_descriptions>
<name>SUB_COM_ID</name>
<description>Unique identifier for the substance-component link. Primary key for grouping results. Multiple studies, opinions, and assessments may reference the same SUB_COM_ID.</description>
<name>COM_NAME</name>
<description>Component name as derived in the opinions. If more than one name is reported in the opinion (excluding IUPAC name), then the most common/most specific name is reported as component name.</description>
<name>COM_TYPE</name>
<description>High level classification of component type (e.g., single, mixture, botanical, synthetic).</description>
<name>MOLECULARFORMULA</name>
<description>Molecular formula of the chemical component.</description>
<name>SUB_DESCRIPTION</name>
<description>Summary of the substance description as derived from opinions. This includes also the group description.</description>
<name>SUB_OP_CLASS</name>
<description>Array of unique values indicating the class of the substance and the corresponding opinion as provided by EFSA (e.g., "food additive", "pesticide", "flavoring"). Aggregated from all studies for this substance.</description>
<name>REMARKS</name>
<description>Array of unique remarks from all studies. Indicates the objective of the opinion and reports any general remarks as retrieved from the opinion. Aggregated from REMARKS_STUDY field.</description>
<name>GENOTOX_ID</name>
<description>Array of unique identifiers linking to the GENOTOX table. Each ID represents a genotoxicity study associated with this substance. May be None if no genotoxicity studies exist.</description>
<name>TOX_ID</name>
<description>Array of unique identifiers linking to the ENDPOINT_STUDY table. Each ID represents a toxicity endpoint study (e.g., NOAEL, LD50). May be None if no endpoint studies exist.</description>
<name>HAZARD_ID</name>
<description>Array of unique identifiers linking to the CHEM_ASSESS table. Each ID represents a chemical risk assessment (e.g., ADI, TDI values). May be None if no assessments exist.</description>
<name>OP_ID</name>
<description>Array of unique identifiers linking to the OPINION table. Each ID represents an EFSA published opinion/document associated with this substance. May be None if no opinions exist.</description>
</dictionary_descriptions>

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
description_searchYes

Implementation Reference

  • The main handler function for the 'search_substance' MCP tool. It takes a search term, normalizes E-numbers implicitly via the query, calls the database query helper, and returns the list of matching substances with aggregated study data. The docstring provides detailed input/output schema descriptions.
    def search_substance(description_search):
        """
        MCP tool to search the OpenFoodTox database for substances by name, E-number, or description.
    
        Searches the database using a two-step approach:
        1. First searches SYNONYM table (E-numbers, common names, trade names, alternative names)
        2. If no results, searches COMPONENT table (SUB_NAME and COM_NAME fields)
    
        The search is case-insensitive and supports partial matches. E-numbers are automatically
        normalized (e.g., "E 951" or "E-951" becomes "E951").
    
        Returns a list of unique substances (one dictionary per SUB_COM_ID) with all study data
        aggregated into arrays. Each substance may have multiple studies, opinions, and assessments,
        which are grouped together by SUB_COM_ID.
    
        Args:
            description_search: Search term (substance or component name e.g. "aspartame", OR E-number e.g. "E 951") or any of the following CAS name, Council of Europe number, E number, E.C enzyme number, EC name, EU Flavour Information System number, EUgroup-no, Flavour and Extract Manufacturers Association number, Joint FAO/WHO Expert Committee on Food Additives number, Name, OECD Toolbox Classification, Pharmalogical class, Swiss Prot no., Trade name
    
        Returns:
            List of dictionaries, where each dictionary represents a unique substance with:
            - Basic component information (name, type, formula, description)
            - Aggregated study identifiers (arrays of IDs linking to related tables)
            - Study classifications and remarks (arrays of unique values from all studies)
    
            Returns None if no matches are found.
    
        <dictionary_descriptions>
        <name>SUB_COM_ID</name>
        <description>Unique identifier for the substance-component link. Primary key for grouping results. Multiple studies, opinions, and assessments may reference the same SUB_COM_ID.</description>
        <name>COM_NAME</name>
        <description>Component name as derived in the opinions. If more than one name is reported in the opinion (excluding IUPAC name), then the most common/most specific name is reported as component name.</description>
        <name>COM_TYPE</name>
        <description>High level classification of component type (e.g., single, mixture, botanical, synthetic).</description>
        <name>MOLECULARFORMULA</name>
        <description>Molecular formula of the chemical component.</description>
        <name>SUB_DESCRIPTION</name>
        <description>Summary of the substance description as derived from opinions. This includes also the group description.</description>
        <name>SUB_OP_CLASS</name>
        <description>Array of unique values indicating the class of the substance and the corresponding opinion as provided by EFSA (e.g., "food additive", "pesticide", "flavoring"). Aggregated from all studies for this substance.</description>
        <name>REMARKS</name>
        <description>Array of unique remarks from all studies. Indicates the objective of the opinion and reports any general remarks as retrieved from the opinion. Aggregated from REMARKS_STUDY field.</description>
        <name>GENOTOX_ID</name>
        <description>Array of unique identifiers linking to the GENOTOX table. Each ID represents a genotoxicity study associated with this substance. May be None if no genotoxicity studies exist.</description>
        <name>TOX_ID</name>
        <description>Array of unique identifiers linking to the ENDPOINT_STUDY table. Each ID represents a toxicity endpoint study (e.g., NOAEL, LD50). May be None if no endpoint studies exist.</description>
        <name>HAZARD_ID</name>
        <description>Array of unique identifiers linking to the CHEM_ASSESS table. Each ID represents a chemical risk assessment (e.g., ADI, TDI values). May be None if no assessments exist.</description>
        <name>OP_ID</name>
        <description>Array of unique identifiers linking to the OPINION table. Each ID represents an EFSA published opinion/document associated with this substance. May be None if no opinions exist.</description>
        </dictionary_descriptions>
        """
        results = query_search_substance(description_search)
        return results
  • main.py:18-18 (registration)
    Registers the search_substance tool with the FastMCP server instance.
    mcp.add_tool(search_substance)
  • Supporting database query function that implements the core search logic: searches synonyms then components by SUB_COM_ID, fetches related studies, and aggregates data into structured dictionaries with arrays of IDs and classifications.
    def query_search_substance(description_search) -> Optional[list[dict]]:
        """
        Atomic query function.
        Database-agnostic search function.
    
        Returns unique substance/es (by SUB_COM_ID) with all study data aggregated into arrays.
        """
        normalized_search = normalize_e_number(description_search)
    
        with get_connection() as db_connection:
            # Step 1: Find SUB_COM_IDs (database-agnostic via pandas)
            synonyms = pd.read_sql_query(
                "SELECT DISTINCT SUB_COM_ID FROM synonym WHERE DESCRIPTION LIKE ?",
                db_connection,
                params=[f"%{normalized_search}%"],
            )
    
            sub_com_ids = synonyms["SUB_COM_ID"].unique().tolist()
    
            if not sub_com_ids:
                # Try component search...
                components = pd.read_sql_query(
                    "SELECT DISTINCT SUB_COM_ID FROM component WHERE SUB_NAME LIKE ? OR COM_NAME LIKE ?",
                    db_connection,
                    params=[f"%{normalized_search}%", f"%{normalized_search}%"],
                )
                if components.empty:
                    return None
                sub_com_ids = components["SUB_COM_ID"].unique().tolist()
    
            # Step 2: Get unique component info (one row per SUB_COM_ID)
            placeholders = ",".join("?" * len(sub_com_ids))
            component_query = f"""
                SELECT DISTINCT
                    SUB_COM_ID,
                    COM_NAME,
                    COM_TYPE,
                    MOLECULARFORMULA,
                    SUB_DESCRIPTION
                FROM component
                WHERE SUB_COM_ID IN ({placeholders})
            """
            components_df = pd.read_sql_query(component_query, db_connection, params=sub_com_ids)
    
            # Step 3: Get all studies for these SUB_COM_IDs
            study_query = f"""
                SELECT 
                    SUB_COM_ID,
                    SUB_OP_CLASS,
                    REMARKS_STUDY,
                    GENOTOX_ID,
                    TOX_ID,
                    HAZARD_ID,
                    OP_ID
                FROM study
                WHERE SUB_COM_ID IN ({placeholders})
            """
            studies_df = pd.read_sql_query(study_query, db_connection, params=sub_com_ids)
    
            # Step 4: Group studies by SUB_COM_ID and aggregate into arrays
            result = []
    
            for _, component_row in components_df.iterrows():
                sub_com_id = component_row["SUB_COM_ID"]
    
                # Get all studies for this component
                component_studies = studies_df[studies_df["SUB_COM_ID"] == sub_com_id]
    
                # Helper function to convert to array, filtering out None/NaN
                def to_array_or_none(series):
                    """Convert pandas series to list of non-null values, or None if empty."""
                    values = series.dropna().unique().tolist()
                    # Convert numpy types to native Python types
                    cleaned = []
                    for val in values:
                        if pd.notna(val):
                            if hasattr(val, "item"):
                                cleaned.append(val.item())
                            else:
                                cleaned.append(val)
                    return cleaned if cleaned else None
    
                # Helper function for string arrays (like SUB_OP_CLASS, REMARKS)
                def to_string_array_or_none(series):
                    """Convert pandas series to list of non-null strings, or None if empty."""
                    values = series.dropna().unique().tolist()
                    cleaned = [str(v) for v in values if pd.notna(v) and str(v).strip()]
                    return cleaned if cleaned else None
    
                # Helper to safely get value or None
                def safe_get(val):
                    """Get value if not null, else None."""
                    try:
                        if pd.isna(val):
                            return None
                        if hasattr(val, "item"):
                            return val.item()
                        return val
                    except (TypeError, ValueError):
                        return None
    
                # Build result entry
                entry = {
                    "SUB_COM_ID": int(sub_com_id),
                    "COM_NAME": safe_get(component_row["COM_NAME"]),
                    "COM_TYPE": safe_get(component_row["COM_TYPE"]),
                    "MOLECULARFORMULA": safe_get(component_row["MOLECULARFORMULA"]),
                    "SUB_DESCRIPTION": safe_get(component_row["SUB_DESCRIPTION"]),
                }
    
                if not component_studies.empty:
                    # Aggregate study fields into arrays
                    entry["SUB_OP_CLASS"] = to_string_array_or_none(component_studies["SUB_OP_CLASS"])
                    entry["REMARKS"] = to_string_array_or_none(component_studies["REMARKS_STUDY"])
                    entry["GENOTOX_ID"] = to_array_or_none(component_studies["GENOTOX_ID"])
                    entry["TOX_ID"] = to_array_or_none(component_studies["TOX_ID"])
                    entry["HAZARD_ID"] = to_array_or_none(component_studies["HAZARD_ID"])
                    entry["OP_ID"] = to_array_or_none(component_studies["OP_ID"])
                else:
                    # No studies found for this component
                    entry["SUB_OP_CLASS"] = None
                    entry["REMARKS"] = None
                    entry["GENOTOX_ID"] = None
                    entry["TOX_ID"] = None
                    entry["HAZARD_ID"] = None
                    entry["OP_ID"] = None
    
                result.append(entry)
    
            return result
  • Structured output schema description in the tool's docstring, detailing each field in the returned dictionaries.
    <dictionary_descriptions>
    <name>SUB_COM_ID</name>
    <description>Unique identifier for the substance-component link. Primary key for grouping results. Multiple studies, opinions, and assessments may reference the same SUB_COM_ID.</description>
    <name>COM_NAME</name>
    <description>Component name as derived in the opinions. If more than one name is reported in the opinion (excluding IUPAC name), then the most common/most specific name is reported as component name.</description>
    <name>COM_TYPE</name>
    <description>High level classification of component type (e.g., single, mixture, botanical, synthetic).</description>
    <name>MOLECULARFORMULA</name>
    <description>Molecular formula of the chemical component.</description>
    <name>SUB_DESCRIPTION</name>
    <description>Summary of the substance description as derived from opinions. This includes also the group description.</description>
    <name>SUB_OP_CLASS</name>
    <description>Array of unique values indicating the class of the substance and the corresponding opinion as provided by EFSA (e.g., "food additive", "pesticide", "flavoring"). Aggregated from all studies for this substance.</description>
    <name>REMARKS</name>
    <description>Array of unique remarks from all studies. Indicates the objective of the opinion and reports any general remarks as retrieved from the opinion. Aggregated from REMARKS_STUDY field.</description>
    <name>GENOTOX_ID</name>
    <description>Array of unique identifiers linking to the GENOTOX table. Each ID represents a genotoxicity study associated with this substance. May be None if no genotoxicity studies exist.</description>
    <name>TOX_ID</name>
    <description>Array of unique identifiers linking to the ENDPOINT_STUDY table. Each ID represents a toxicity endpoint study (e.g., NOAEL, LD50). May be None if no endpoint studies exist.</description>
    <name>HAZARD_ID</name>
    <description>Array of unique identifiers linking to the CHEM_ASSESS table. Each ID represents a chemical risk assessment (e.g., ADI, TDI values). May be None if no assessments exist.</description>
    <name>OP_ID</name>
    <description>Array of unique identifiers linking to the OPINION table. Each ID represents an EFSA published opinion/document associated with this substance. May be None if no opinions exist.</description>
    </dictionary_descriptions>

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/spyrosze/mcp-openfoodtox'

If you have feedback or need assistance with the MCP directory API, please join our Discord server