NCBI Entrez MCP Server


# Entrez MCP Server - SDK Documentation

This directory contains SDKs for calling the Entrez MCP Server via code execution, enabling LLMs and applications to interact with NCBI databases using familiar programming paradigms.

## Available SDKs

### JavaScript/TypeScript SDK

**Location**: `javascript/entrez-sdk.js` (with type definitions in `entrez-sdk.d.ts`)

**Features**:
- Full TypeScript support with comprehensive type definitions
- Works in Node.js and modern browsers
- Promise-based async/await API
- Automatic session management
- Helper classes for data staging

**Installation**:
```javascript
import { EntrezSDK } from './sdk/javascript/entrez-sdk.js';
```

### Python SDK

**Location**: `python/entrez_sdk.py`

**Features**:
- Full async/await support with asyncio
- Type hints for IDE autocomplete
- Async context manager for resource cleanup
- Custom exception classes
- Flexible ID handling (lists or strings)

**Installation**:
```bash
cd sdk/python
pip install -r requirements.txt
```

```python
from entrez_sdk import EntrezSDK
```

## Why Use SDKs?

### Problem: Direct Tool Calling Issues

When LLMs use code execution to call MCP tools directly, they encounter several issues:

1. **Identifier Syntax**: Hyphens in tool names (`entrez-query`) cause `SyntaxError` in JavaScript/Python
2. **Parameter Format**: MCP expects individual parameters, while code expects dictionaries/objects
3. **Error Handling**: MCP errors and language exceptions follow different handling patterns
4. **Session State**: Manual session ID tracking is error-prone
5. **Data Staging**: `data_access_id` values must be preserved across calls

### Solution: Language-Native SDKs

The SDKs solve these issues by:

- ✅ Using underscore naming (`entrez_query`) for valid identifiers
- ✅ Accepting dictionaries/objects for parameters
- ✅ Throwing proper exceptions with context
- ✅ Automatically tracking session IDs
- ✅ Providing `DataStaging` helper classes to manage IDs

## Quick Start Examples

### JavaScript Example

```javascript
import { EntrezSDK } from './sdk/javascript/entrez-sdk.js';

async function searchAndAnalyze() {
  const sdk = new EntrezSDK('http://localhost:8787');

  try {
    // Search for articles
    const searchResults = await sdk.search(
      'pubmed',
      'CRISPR gene editing therapy',
      { retmax: 20 }
    );
    console.log(`Found ${searchResults.total_results} articles`);

    // Fetch and stage data for analysis
    const staging = await sdk.fetchAndStage(
      'pubmed',
      searchResults.idlist.slice(0, 10)
    );

    // Get schema
    const schema = await staging.getSchema();
    console.log(`Tables: ${schema.table_names.join(', ')}`);

    // Run SQL queries
    const recentArticles = await staging.query(`
      SELECT pmid, title, year, journal
      FROM article
      WHERE year >= 2023
      ORDER BY year DESC
    `);
    console.log(`Recent articles: ${recentArticles.row_count}`);
    recentArticles.results.forEach(article => {
      console.log(`- [${article.year}] ${article.title}`);
    });

    // Get articles by MeSH term
    const meshAnalysis = await staging.query(`
      SELECT m.descriptorname, COUNT(*) as article_count
      FROM meshterm m
      JOIN article_meshterm am ON m.uid = am.meshterm_uid
      GROUP BY m.descriptorname
      ORDER BY article_count DESC
      LIMIT 10
    `);
    console.log('\nTop MeSH terms:');
    meshAnalysis.results.forEach(term => {
      console.log(`- ${term.descriptorname}: ${term.article_count} articles`);
    });
  } catch (error) {
    console.error(`Error: ${error.message}`);
  }
}

searchAndAnalyze();
```

### Python Example

```python
import asyncio
from entrez_sdk import EntrezSDK

async def search_and_analyze():
    async with EntrezSDK('http://localhost:8787') as sdk:
        try:
            # Search for articles
            search_results = await sdk.search(
                'pubmed',
                'CRISPR gene editing therapy',
                retmax=20
            )
            print(f"Found {search_results['total_results']} articles")

            # Fetch and stage data for analysis
            staging = await sdk.fetch_and_stage(
                'pubmed',
                search_results['idlist'][:10]
            )

            # Get schema
            schema = await staging.get_schema()
            print(f"Tables: {', '.join(schema['table_names'])}")

            # Run SQL queries
            recent_articles = await staging.query("""
                SELECT pmid, title, year, journal
                FROM article
                WHERE year >= 2023
                ORDER BY year DESC
            """)
            print(f"Recent articles: {recent_articles['row_count']}")
            for article in recent_articles['results']:
                print(f"- [{article['year']}] {article['title']}")

            # Get articles by MeSH term
            mesh_analysis = await staging.query("""
                SELECT m.descriptorname, COUNT(*) as article_count
                FROM meshterm m
                JOIN article_meshterm am ON m.uid = am.meshterm_uid
                GROUP BY m.descriptorname
                ORDER BY article_count DESC
                LIMIT 10
            """)
            print('\nTop MeSH terms:')
            for term in mesh_analysis['results']:
                print(f"- {term['descriptorname']}: {term['article_count']} articles")
        except Exception as e:
            print(f"Error: {e}")

# Run the async function
asyncio.run(search_and_analyze())
```

## API Reference

### Core Methods

#### System Tools

- **`getApiKeyStatus()`** / **`get_api_key_status()`**
  - Check NCBI API key status and rate limits
  - Returns: API key info and rate limit details

- **`getCapabilities(options)`** / **`get_capabilities(format, tool, include_metadata)`**
  - Get available tools and their capabilities
  - Returns: Tool metadata and operation details

- **`getToolInfo(toolName, format)`** / **`get_tool_info(tool_name, format)`**
  - Get detailed information about a specific tool
  - Returns: Tool capabilities, operations, and parameters

#### Entrez Query Tools

- **`search(database, term, options)`**
  - Search an NCBI database
  - Parameters:
    - `database`: Database name (e.g., 'pubmed', 'protein')
    - `term`: Search query
    - `options`: Optional { retmax, retstart, sort, field, intendedUse }
  - Returns: Search results with IDs and metadata

- **`summary(database, ids, options)`**
  - Get document summaries for specific IDs
  - Parameters:
    - `database`: Database name
    - `ids`: Single ID, array/list of IDs, or comma-separated string
    - `options`: Optional { retmax, compactMode, detailLevel, maxTokens }
  - Returns: Summary data

- **`fetch(database, ids, options)`**
  - Fetch detailed records
  - Parameters:
    - `database`: Database name
    - `ids`: ID(s) to fetch
    - `options`: Optional { rettype, intendedUse, detailLevel }
  - Returns: Full record data

- **`info(database)`**
  - Get database information and available fields
  - Returns: Database metadata

- **`link(database, ids, options)`**
  - Find links between databases
  - Parameters:
    - `database`: Target database
    - `ids`: Source ID(s)
    - `options`: Optional { dbfrom, linkname }
  - Returns: Link data

- **`post(database, ids, options)`** / **`post(database, ids, usehistory)`**
  - Post IDs to the Entrez history server
  - Returns: WebEnv and QueryKey for batch operations

- **`globalQuery(term)`** / **`global_query(term)`**
  - Search across all NCBI databases
  - Returns: Cross-database results

- **`spell(term, database)`**
  - Get spelling suggestions
  - Returns: Corrected query suggestions

#### Data Staging Tools

- **`fetchAndStage(database, ids, options)`** / **`fetch_and_stage(database, ids, ...)`**
  - Fetch records and stage them in a SQL database
  - Returns: `DataStaging` object with query methods
  - The `DataStaging` object provides:
    - `query(sql, options)`: Execute SQL queries
    - `getSmartSummary(options)` / `get_smart_summary(options)`: Get AI-generated summary
    - `getSchema()` / `get_schema()`: Get database schema
    - `dataAccessId` / `data_access_id`: Access ID for later queries

- **`queryStagedData(dataAccessId, sql, options)`** / **`query_staged_data(data_access_id, sql, ...)`**
  - Query previously staged data
  - Returns: Query results

- **`getSmartSummary(dataAccessId, options)`** / **`get_smart_summary(data_access_id, ...)`**
  - Get intelligent summary of staged data
  - Returns: AI-generated insights

- **`getSchema(dataAccessId)`** / **`get_schema(data_access_id)`**
  - Get schema for staged data
  - Returns: Table definitions, column descriptions, recommended queries

- **`listDatasets()`** / **`list_datasets()`**
  - List all active staged datasets
  - Returns: Dataset metadata

#### External API Tools

- **`getCompound(identifier, identifierType, outputFormat)`** / **`get_compound(...)`**
  - Get PubChem compound data
  - Parameters:
    - `identifier`: Compound identifier (name, CID, etc.)
    - `identifierType`: 'name', 'cid', 'inchi', 'smiles', etc.
    - `outputFormat`: 'json' or 'xml'
  - Returns: Compound data

- **`getSubstance(identifier, identifierType, outputFormat)`** / **`get_substance(...)`**
  - Get PubChem substance data
  - Returns: Substance data

- **`getBioassay(identifier, identifierType, outputFormat)`** / **`get_bioassay(...)`**
  - Get PubChem bioassay data
  - Returns: Bioassay data

- **`structureSearch(structure, structureType, searchType, options)`** / **`structure_search(...)`**
  - Search PubChem by chemical structure
  - Parameters:
    - `structure`: Structure specification
    - `structureType`: 'smiles', 'inchi', 'sdf', 'mol'
    - `searchType`: 'identity', 'substructure', 'superstructure', 'similarity'
    - `options`: Optional { threshold, maxRecords }
  - Returns: Matching compounds

- **`convertPmcIds(ids, options)`** / **`convert_pmc_ids(ids, ...)`**
  - Convert between PMC, PMID, and DOI identifiers
  - Returns: ID conversion results

- **`getPmcArticle(id, outputFormat)`** / **`get_pmc_article(id, ...)`**
  - Get PMC Open Access full-text article
  - Returns: Article content

- **`exportCitations(ids, citationFormat)`** / **`export_citations(ids, ...)`**
  - Export citations in various formats
  - Parameters:
    - `ids`: Article ID(s)
    - `citationFormat`: 'ris', 'nbib', 'medline', 'bibtex'
  - Returns: Formatted citations

## Common Workflows

### Literature Review Workflow

```javascript
// 1. Search for relevant articles
const searchResults = await sdk.search('pubmed', 'machine learning healthcare', { retmax: 50 });

// 2. Stage data for analysis
const staging = await sdk.fetchAndStage('pubmed', searchResults.idlist);

// 3. Analyze by year
const yearlyTrend = await staging.query(`
  SELECT year, COUNT(*) as count
  FROM article
  WHERE year IS NOT NULL
  GROUP BY year
  ORDER BY year DESC
`);

// 4. Find top journals
const topJournals = await staging.query(`
  SELECT journal, COUNT(*) as article_count
  FROM article
  WHERE journal IS NOT NULL
  GROUP BY journal
  ORDER BY article_count DESC
  LIMIT 10
`);

// 5. Analyze MeSH terms
const meshTerms = await staging.query(`
  SELECT m.descriptorname, COUNT(DISTINCT a.pmid) as article_count
  FROM meshterm m
  JOIN article_meshterm am ON m.uid = am.meshterm_uid
  JOIN article a ON am.article_uid = a.uid
  GROUP BY m.descriptorname
  ORDER BY article_count DESC
  LIMIT 20
`);
```

### Chemical Structure Analysis

```python
# 1. Look up a compound
compound = await sdk.get_compound('aspirin', 'name')
cid = compound['PC_Compounds'][0]['id']['id']['cid']

# 2. Find similar compounds
similar = await sdk.structure_search(
    'CC(=O)Oc1ccccc1C(=O)O',  # Aspirin SMILES
    'smiles',
    'similarity',
    threshold=90,
    max_records=10
)

# 3. Get bioassay data
bioassays = await sdk.get_bioassay(str(cid), 'cid')
```

### Cross-Database Integration

```javascript
// 1. Search PubMed for gene
const geneArticles = await sdk.search('pubmed', 'BRCA1[Gene]', { retmax: 5 });

// 2. Find related gene records
const geneLinks = await sdk.link('gene', geneArticles.idlist[0], { dbfrom: 'pubmed' });

// 3. Get protein sequences
const proteinInfo = await sdk.fetch('protein', geneLinks.linksets[0].ids, { rettype: 'fasta' });

// 4. Get compound interactions
const compounds = await sdk.search('pccompound', 'BRCA1', { retmax: 10 });
```

## Testing

Both SDKs include comprehensive integration tests.
### Run JavaScript Tests

```bash
npm run test-sdk-js
```

### Run Python Tests

```bash
npm run test-sdk-py
```

### Run All SDK Tests

```bash
npm run test-sdk-all
```

## Error Handling

### JavaScript

```javascript
try {
  const results = await sdk.search('invalid_db', 'test');
} catch (error) {
  console.error(`Error: ${error.message}`);

  // Handle specific error types
  if (error.message.includes('Invalid database')) {
    // Suggest valid databases
  }
}
```

### Python

```python
from entrez_sdk import EntrezSDKError

try:
    results = await sdk.search('invalid_db', 'test')
except EntrezSDKError as e:
    print(f"Error: {e}")

    # Handle specific error types
    if 'Invalid database' in str(e):
        # Suggest valid databases
        pass
```

## Best Practices

1. **Use Async Context Managers** (Python)

   ```python
   async with EntrezSDK(base_url) as sdk:
       # SDK automatically cleans up resources on exit
       results = await sdk.search('pubmed', 'test')
   ```

2. **Preserve Data Access IDs**

   ```javascript
   const staging = await sdk.fetchAndStage('pubmed', ids);
   // Use staging.dataAccessId in subsequent calls
   const results = await staging.query('SELECT * FROM article');
   ```

3. **Handle Rate Limits**

   ```python
   # Check API key status first
   status = await sdk.get_api_key_status()
   # Adjust request frequency based on rate limits
   ```

4. **Use Type Safety**

   ```typescript
   // TypeScript provides autocomplete and type checking
   const results: SearchResult = await sdk.search('pubmed', 'test');
   ```

5. **Batch Operations**

   ```javascript
   // Use fetchAndStage for analyzing multiple articles
   const staging = await sdk.fetchAndStage('pubmed', idList);
   // Then run multiple SQL queries efficiently
   ```

## Troubleshooting

### Issue: `SyntaxError: Unexpected token '-'`

**Solution**: Use underscore versions (`entrez_query`, not `entrez-query`)

### Issue: Lost `data_access_id`

**Solution**: Use the `DataStaging` helper class or store the ID:

```javascript
const staging = await sdk.fetchAndStage('pubmed', ids);
// staging.dataAccessId is preserved
await staging.query('SELECT ...');
```

### Issue: Connection refused

**Solution**: Ensure the MCP server is running:

```bash
npm run dev  # Start server on http://localhost:8787
```

### Issue: Missing await

**Solution**: All SDK methods are async:

```javascript
// ❌ Wrong
const results = sdk.search('pubmed', 'test');

// ✅ Correct
const results = await sdk.search('pubmed', 'test');
```

## Contributing

To add new features to the SDKs:

1. Update `javascript/entrez-sdk.js` and `javascript/entrez-sdk.d.ts`
2. Update `python/entrez_sdk.py`
3. Add tests to `test-sdk-javascript.js` and `test-sdk-python.py`
4. Update this README with examples
5. Run `npm run test-sdk-all` to verify

## License

MIT License - see main project LICENSE file
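
## Appendix: Rate-Limit Throttling Sketch

The "Handle Rate Limits" practice above leaves the adjustment step open. The following is a minimal sketch of one way to do it, not part of either SDK. It assumes NCBI's documented E-utilities limits of 3 requests per second without an API key and 10 with one. `RequestThrottle`, `requests_per_second`, and `throttled_summaries` are hypothetical helpers introduced for illustration. Because the shape of the `get_api_key_status()` result is not documented here, the caller passes `has_api_key` explicitly instead of parsing that result.

```python
import asyncio
import time


def requests_per_second(has_api_key: bool) -> int:
    """Assumed NCBI E-utilities limits: 10 req/s with an API key, 3 without."""
    return 10 if has_api_key else 3


class RequestThrottle:
    """Hypothetical helper: spaces out calls to at most `max_per_second`."""

    def __init__(self, max_per_second: float) -> None:
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self._min_interval = 1.0 / max_per_second
        self._next_allowed = 0.0  # monotonic time of the next free slot

    async def wait(self) -> None:
        # Reserve the next slot before sleeping. There is no await between
        # reading and updating _next_allowed, so concurrent tasks on the
        # same event loop each get a distinct slot.
        now = time.monotonic()
        slot = max(now, self._next_allowed)
        self._next_allowed = slot + self._min_interval
        if slot > now:
            await asyncio.sleep(slot - now)


async def throttled_summaries(sdk, id_batches, has_api_key: bool):
    """Call sdk.summary('pubmed', batch) for each batch, paced by the throttle."""
    throttle = RequestThrottle(requests_per_second(has_api_key))
    results = []
    for batch in id_batches:
        await throttle.wait()
        results.append(await sdk.summary('pubmed', batch))
    return results
```

Inside an `async with EntrezSDK(base_url) as sdk:` block, this could be used as `await throttled_summaries(sdk, [ids[:50], ids[50:100]], has_api_key=False)`. The throttle only paces requests issued by one process; the server may apply its own limits as well.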
