PubChem MCP Server

project-spec.md•15.2 KiB

## **Project Specification: `pubchem-mcp-server` (Production Build v2)** ### 1. Overview This document outlines the project specification for the `pubchem-mcp-server`, a production-grade Model Context Protocol (MCP) server. Built using the `mcp-ts-template`, this server will provide a comprehensive, robust, and AI-agent-friendly interface to the PubChem PUG REST API. The server's design will strictly adhere to the architectural patterns and best practices demonstrated in the `pubmed-mcp-server`, including the `verb_domain_noun` tool naming convention, modular design, and a focus on creating powerful, composable tools for complex scientific workflows. The server will enable autonomous agents to systematically search, fetch, and analyze chemical, substance, and bioassay data from PubChem. ### 2. Project Goals - **Production-Ready Implementation:** Deliver a stable, reliable, and secure server suitable for production use, leveraging all relevant features of the `mcp-ts-template` (logging, error handling, security, configuration). - **Agent-Centric Design:** Create a suite of MCP Tools with clear, predictable, and strictly-typed schemas and descriptive names that are intuitive for an LLM agent to discover and use effectively. - **Comprehensive API Coverage:** Implement tools that cover the core functionalities of the PubChem API, including the `Compound`, `Substance`, and `Assay` domains, as well as advanced search and cross-referencing capabilities. - **Robustness and Compliance:** Strictly enforce PubChem's API usage policy (5 requests/second) through a built-in rate limiter. Implement comprehensive error handling and input validation to gracefully manage API-specific issues. - **Extensibility:** Build a modular foundation that allows for easy addition of new tools and capabilities as the server evolves. ### 3. Tool Naming Convention All tools will follow the `verb_domain_noun` naming convention established by the `pubmed-mcp-server` to ensure consistency and predictability for agents interacting with multiple servers from this ecosystem. - **`search_*`**: For discovering identifiers based on a query. - **`fetch_*`**: For retrieving detailed data for a known identifier. - **`get_*`**: For retrieving specific attributes or related information. ### 4. MCP Tool Suite Specification The server will be organized into logical tool groups. #### Group 1: Core Compound Tools These tools provide the fundamental capabilities for working with chemical compounds. - **`search_compound_by_identifier`** - **Description:** Searches for a PubChem Compound ID (CID) using a common chemical identifier like a name, SMILES string, or InChIKey. This is the primary entry point for most compound-related workflows. - **PubChem API:** `.../compound/<namespace>/<identifier>/cids/JSON` - **Input (Zod):** ``` z.object({ identifierType: z.enum(['name', 'smiles', 'inchikey']).describe("The type of identifier provided."), identifier: z.string().describe("The identifier string (e.g., 'aspirin', 'CC(=O)Oc1ccccc1C(=O)O').") }) ``` - **Output (Zod):** ``` z.object({ cids: z.array(z.number().int()).describe("A list of matching PubChem Compound IDs (CIDs). Often a single result.") }) ``` - **`fetch_compound_properties`** - **Description:** Fetches a list of specified physicochemical properties for one or more compound CIDs. Supports batch requests. - **PubChem API:** `.../compound/cid/<cids>/property/<properties>/JSON` - **Input (Zod):** ``` z.object({ cids: z.array(z.number().int().positive()).min(1).describe("An array of one or more CIDs to fetch properties for."), properties: z.array(z.enum([ 'MolecularFormula', 'MolecularWeight', 'InChI', 'InChIKey', 'IUPACName', 'Title', 'XLogP', 'ExactMass', 'MonoisotopicMass', 'TPSA', 'Complexity', 'Charge', 'HBondDonorCount', 'HBondAcceptorCount', 'RotatableBondCount', 'HeavyAtomCount', 'CovalentUnitCount' ])).min(1).describe("A list of properties to retrieve.") }) ``` - **Output (Zod):** ``` // A highly specific schema for predictable agent consumption. const CompoundPropertiesSchema = z.object({ CID: z.number().int(), MolecularFormula: z.string().optional(), MolecularWeight: z.number().optional(), InChI: z.string().optional(), InChIKey: z.string().optional(), IUPACName: z.string().optional(), Title: z.string().optional(), XLogP: z.number().optional(), ExactMass: z.number().optional(), MonoisotopicMass: z.number().optional(), TPSA: z.number().optional(), Complexity: z.number().optional(), Charge: z.number().optional(), HBondDonorCount: z.number().int().optional(), HBondAcceptorCount: z.number().int().optional(), RotatableBondCount: z.number().int().optional(), HeavyAtomCount: z.number().int().optional(), CovalentUnitCount: z.number().int().optional(), }).describe("An object containing the requested properties for a single compound."); z.object({ results: z.array(CompoundPropertiesSchema) .describe("A list of property results, one for each requested CID.") }) ``` - **`get_compound_image_url`** - **Description:** Generates and returns a direct URL to a 2D image of a compound's structure. Ideal for multimodal agents. - **PubChem API:** (Constructs URL) `.../compound/cid/<cid>/PNG` - **Input (Zod):** ``` z.object({ cid: z.number().int().positive().describe("The CID of the compound."), size: z.enum(['small', 'large']).optional().default('large').describe("Image size: 'small' (100x100) or 'large' (300x300).") }) ``` - **Output (Zod):** ``` z.object({ cid: z.number().int(), imageUrl: z.string().url().describe("A direct URL to the PNG image of the compound's structure.") }) ``` #### Group 2: Advanced Compound Search Tools These tools enable more complex, structure-based discovery. - **`search_compounds_by_structure`** - **Description:** Performs a structural search (substructure, superstructure, or identity) based on a query structure provided as a SMILES string or an existing CID. - **PubChem API:** `.../compound/{substructure|superstructure|identity}/{smiles|cid}/<query>/cids/JSON` - **Input (Zod):** ``` z.object({ searchType: z.enum(['substructure', 'superstructure', 'identity']).describe("The type of structural search to perform."), query: z.string().describe("The query structure, provided as a SMILES string or a CID (e.g., 'c1ccccc1' or '2244')."), queryType: z.enum(['smiles', 'cid']).describe("The format of the provided query."), maxRecords: z.number().int().positive().max(100).optional().default(20).describe("Maximum number of matching CIDs to return.") }) ``` - **Output (Zod):** `z.object({ cids: z.array(z.number().int()) })` - **`search_compounds_by_similarity`** - **Description:** Finds compounds with a similar 2D structure to a query compound (SMILES or CID). - **PubChem API:** `.../compound/fastsimilarity_2d/{smiles|cid}/<query>/cids/JSON` - **Input (Zod):** ``` z.object({ query: z.string().describe("The query structure as a SMILES string or a CID."), queryType: z.enum(['smiles', 'cid']).describe("The format of the provided query."), threshold: z.number().min(70).max(100).optional().default(90).describe("Minimum Tanimoto similarity score (70-100)."), maxRecords: z.number().int().positive().max(100).optional().default(20).describe("Maximum number of similar CIDs to return.") }) ``` - **Output (Zod):** `z.object({ cids: z.array(z.number().int()) })` - **`search_compounds_by_formula`** - **Description:** Finds compounds matching a given molecular formula. - **PubChem API:** `.../compound/fastformula/<formula>/cids/JSON` - **Implementation Note:** The regex should be defined as a named constant for maintainability (e.g., `MOLECULAR_FORMULA_REGEX`). - **Input (Zod):** ``` const MOLECULAR_FORMULA_REGEX = /^[A-Z][a-z]?\d*([A-Z][a-z]?\d*)*$/; z.object({ formula: z.string().regex(MOLECULAR_FORMULA_REGEX).describe("A valid molecular formula (e.g., 'C6H12O6')."), allowOtherElements: z.boolean().optional().default(false).describe("If true, allows compounds containing additional elements."), maxRecords: z.number().int().positive().max(100).optional().default(50).describe("Maximum number of matching CIDs to return.") }) ``` - **Output (Zod):** `z.object({ cids: z.array(z.number().int()) })` #### Group 3: Substance & Assay Tools These tools provide access to deposited substance data and biological assay results. - **`fetch_substance_details`** - **Description:** Retrieves details for a given PubChem Substance ID (SID), including its synonyms, source, and any associated CIDs. - **PubChem API:** `.../substance/sid/<sid>/JSON` and `.../substance/sid/<sid>/cids/JSON` - **Input (Zod):** `z.object({ sid: z.number().int().positive().describe("The PubChem Substance ID (SID).") })` - **Output (Zod):** ``` z.object({ sid: z.number().int(), sourceName: z.string().describe("The name of the depositor/source."), depositionDate: z.string().describe("The date the substance was deposited."), modificationDate: z.string().describe("The date the substance was last modified."), synonyms: z.array(z.string()).describe("A list of synonyms for the substance."), relatedCids: z.array(z.number().int()).describe("A list of standardized Compound IDs (CIDs) related to this substance.") }) ``` - **`fetch_assay_summary`** - **Description:** Retrieves the summary for a given PubChem BioAssay ID (AID), including its name, description, source, and targets. - **PubChem API:** `.../assay/aid/<aid>/summary/JSON` - **Input (Zod):** `z.object({ aid: z.number().int().positive().describe("The PubChem BioAssay ID (AID).") })` - **Output (Zod):** ``` z.object({ aid: z.number().int(), name: z.string(), description: z.string(), sourceName: z.string(), numSids: z.number().int(), numActive: z.number().int(), targets: z.array(z.object({ name: z.string(), geneId: z.number().int().optional(), geneSymbol: z.string().optional() })).describe("Biological targets of the assay.") }) ``` - **`search_assays_by_target`** - **Description:** Finds BioAssay IDs (AIDs) associated with a specific biological target, such as a gene symbol or protein name. - **PubChem API:** `.../assay/target/{genesymbol|proteinname}/<query>/aids/JSON` - **Input (Zod):** ``` z.object({ targetType: z.enum(['genesymbol', 'proteinname']).describe("The type of biological target."), targetQuery: z.string().describe("The target identifier (e.g., 'EGFR', 'Epidermal growth factor receptor').") }) ``` - **Output (Zod):** `z.object({ aids: z.array(z.number().int()) })` #### Group 4: Cross-Reference (XRef) Tool This tool bridges PubChem data with other databases. - **`fetch_compound_xrefs`** - **Description:** Fetches external cross-references (XRefs) for a given CID. Returns identifiers that can be used with other services (e.g., a PubMed ID for use with `pubmed-mcp-server`). - **PubChem API:** `.../compound/cid/<cid>/xrefs/<xref_types>/JSON` - **Input (Zod):** ``` z.object({ cid: z.number().int().positive().describe("The CID to find cross-references for."), xrefTypes: z.array(z.enum([ 'RegistryID', 'RN', 'PubMedID', 'PatentID', 'GeneID', 'ProteinGI', 'TaxonomyID' ])).min(1).describe("A list of cross-reference types to retrieve.") }) ``` - **Output (Zod):** ``` z.object({ cid: z.number().int(), xrefs: z.record(z.string(), z.array(z.union([z.string(), z.number()]))).describe("An object where keys are the requested xrefTypes and values are arrays of found identifiers.") }) ``` ### 5. Technical Implementation & Architecture - **Project Initialization:** 1. Clone `mcp-ts-template` to a new `pubchem-mcp-server` directory. 2. Update `package.json` with the new project details. 3. Remove all example tools and resources from `src/mcp-server/`. - **API Client Service (`src/services/pubchem/pubchemApiClient.ts`):** - A dedicated, singleton service will encapsulate all interactions with the PubChem API. - **Responsibilities:** 1. **URL Construction:** Centralize all logic for building valid PUG REST API URLs. 2. **Rate Limiting:** Instantiate and manage a `RateLimiter` configured to **5 requests per 1 second**. Every outgoing API call must acquire a permit. 3. **Request Execution:** Use `fetchWithTimeout` for all API calls. 4. **Error Handling & Translation:** Intercept API responses and map PubChem HTTP status codes to `BaseErrorCode` enums, throwing structured `McpError` instances (e.g., 404 -> `NOT_FOUND`, 400 -> `INVALID_INPUT`). 5. **Response Validation:** Use Zod schemas to parse and validate the JSON response from PubChem before returning data to the tool logic. - **Tool Logic & Validation:** - Each tool's `logic.ts` file will contain its execution logic. - **Input Coherence Validation:** For tools with dynamic query types (e.g., `search_compounds_by_structure`), the logic must perform an internal consistency check. For example, if `queryType` is `'cid'`, the `query` must be a numeric string. If not, an `McpError` with code `INVALID_INPUT` should be thrown _before_ making an API call. ``` // Example check in logic.ts for search_compounds_by_structure if (input.queryType === 'cid' && isNaN(parseInt(input.query, 10))) { throw new McpError({ code: 'INVALID_INPUT', message: `Query type is 'cid' but the provided query '${input.query}' is not a valid number.` }); } ``` - **Configuration:** - The server will leverage existing environment variables from the template for transport, logging, and auth. No API key is required for PubChem. ### 6. Development Workflow 1. **Setup:** Clone the template, clean the project, and update `package.json`. 2. **API Client:** Implement the `pubchemApiClient.ts` service as the foundational layer. 3. **Implement Tools:** Implement tools one by one, following a "define, implement, register, test" cycle. - Define schemas and registration in `registration.ts`. - Write the `execute` logic (including input validation) in `logic.ts`. - Register the tool in `src/mcp-server/server.ts`. 4. **Test Iteratively:** Use the MCP inspector (`npm run inspector`) or an HTTP client to test each tool with valid, invalid, and edge-case inputs. 5. **Documentation:** Update the project `README.md`. **Crucially, this documentation must prominently feature an example of the inter-server workflow.** Showcasing how an agent can use `fetch_compound_xrefs` to get a `PubMedID` and then immediately use that ID with a `pubmed-mcp-server` is essential for demonstrating the power of the ecosystem. ### 7. Future Enhancements - **MCP Resources:** Introduce an MCP `Compound` resource to maintain context across tool calls, simplifying agent workflows. - **Asynchronous Search Support:** Implement support for PubChem's asynchronous `listkey` mechanism for long-running searches to prevent timeouts. - **Data Visualization:** Add a `generate_compound_chart` tool to create charts comparing properties for a list of CIDs.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/cyanheads/pubchem-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

project-spec.md•15.2 KiB