# PubChem Chemical Safety MCP Server
A **Model Context Protocol (MCP)** based chemical safety information server that automatically retrieves toxicology, GHS safety classification, and chemical properties from compound names or CIDs.
## Features
- Retrieve basic compound properties (molecular formula, molecular weight, IUPAC name, etc.)
- Get GHS safety classification information (signal words, pictograms, hazard statements)
- Access experimental toxicity data (LD50, LC50, etc.)
- Support batch queries and local file caching
- Built on MCP, so it integrates with Claude Desktop and other AI clients
- Support proxy access for networks that cannot reach PubChem directly
## Tech Stack
- **Protocol**: Model Context Protocol (MCP)
- **Language**: Python 3.10+
- **Dependency Management**: uv
- **Data Source**: PubChem REST API
- **Caching**: Local file caching
- **HTTP Client**: aiohttp (supports proxy and retry mechanism)
## Installation and Running
### 1. Install Dependencies
```bash
uv sync
```
### 2. Set Up Proxy (Required)
Due to network access restrictions, the server needs a proxy to reach PubChem. Replace the address and port below with those of your own proxy:
```bash
export https_proxy=http://127.0.0.1:10808
export http_proxy=http://127.0.0.1:10808
```
### 3. Run MCP Server
```bash
uv run python -m pubchem_mcp.mcp_server
```
### 4. Test Server
```bash
uv run verify_mcp.py
```
## MCP Tools
The server provides the following 3 MCP tools:
### 1. `get_compound_info`
Get basic compound information
- **Parameters**: `name` (compound name)
- **Returns**: CID, molecular formula, molecular weight, IUPAC name, SMILES, etc.
- **Example**: `get_compound_info("aspirin")` or `get_compound_info("阿司匹林")`
### 2. `get_safety_info`
Get GHS safety classification information
- **Parameters**: `cid` (PubChem compound ID)
- **Returns**: Signal words, GHS pictograms, hazard statements, precautionary statements, etc.
- **Example**: `get_safety_info(2244)` (aspirin's CID)
### 3. `get_toxicity_data`
Get experimental toxicity data
- **Parameters**: `cid` (PubChem compound ID)
- **Returns**: Acute toxicity, ecotoxicity, carcinogenicity, reproductive toxicity, and other detailed data
- **Example**: `get_toxicity_data(2244)` (aspirin's CID)
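For orientation, the three tools map roughly onto two kinds of PubChem request: a PUG REST property lookup by name, and PUG View section lookups by CID. The sketch below only builds the request URLs. The function names, the property list, and the two heading names are assumptions chosen for illustration; the server's actual request code may differ.

```python
from urllib.parse import quote, quote_plus

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
PUG_VIEW = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view"


def compound_property_url(name: str) -> str:
    """URL for a name-based property lookup (hypothetical helper)."""
    # Assumed property list; the server may request more fields.
    props = "MolecularFormula,MolecularWeight,IUPACName"
    # Percent-encode the name so spaces and non-ASCII names are safe in the path.
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/property/{props}/JSON"


def section_url(cid: int, heading: str) -> str:
    """URL for one annotation section of a compound record (hypothetical helper)."""
    return f"{PUG_VIEW}/data/compound/{int(cid)}/JSON?heading={quote_plus(heading)}"


if __name__ == "__main__":
    print(compound_property_url("aspirin"))
    print(section_url(2244, "GHS Classification"))  # assumed heading for get_safety_info
    print(section_url(2244, "Toxicity"))            # assumed heading for get_toxicity_data
```

Building URLs in small pure functions like these keeps the network layer easy to test without contacting PubChem.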
## Claude Desktop Integration
1. Open Claude Desktop settings
2. Click "Developer" → "Edit Config"
3. Add the following configuration:
```json
{
  "mcpServers": {
    "pubchem-chemical-safety": {
      "command": "uv",
      "args": [
        "--directory",
        "/Users/liueic/Documents/Code/PubChem-MCP-Server",
        "run",
        "python",
        "-m",
        "pubchem_mcp.mcp_server"
      ],
      "env": {
        "https_proxy": "http://127.0.0.1:10808",
        "http_proxy": "http://127.0.0.1:10808"
      }
    }
  }
}
```
4. Restart Claude Desktop
## Usage Examples
### Python Client Example
```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    # Launch the server as a subprocess and talk to it over stdio
    server_params = StdioServerParameters(
        command='uv',
        args=['--directory', '/path/to/project', 'run', 'python', '-m', 'pubchem_mcp.mcp_server'],
        env={
            'https_proxy': 'http://127.0.0.1:10808',
            'http_proxy': 'http://127.0.0.1:10808'
        }
    )
    async with stdio_client(server_params) as (stdio, write):
        async with ClientSession(stdio, write) as session:
            await session.initialize()
            # Get compound information
            result = await session.call_tool('get_compound_info', {
                'name': 'aspirin'
            })
            print(result.content[0].text)


asyncio.run(main())
```
### Using in Claude Desktop
Once the server is configured, you can query it from Claude Desktop in natural language:
```
Please query the safety information of aspirin
```
```
Get the toxicity data of caffeine
```
## Debugging Tools
Use MCP Inspector for debugging:
```bash
npx -y @modelcontextprotocol/inspector uv run python -m pubchem_mcp.mcp_server
```
## Project Structure
```
pubchem_mcp/
├── mcp_server.py              # Main MCP server file
├── services/
│   ├── pubchem_client.py      # PubChem API client (supports proxy and retry)
│   ├── cache_service.py       # Caching service
│   └── pubchem_service.py     # PubChem service layer
├── models/
│   └── schemas.py             # Data model definitions
└── api/
    └── routes.py              # API routes (optional)
tests/                         # Test files
manage_cache.py                # Cache management tool
verify_mcp.py                  # MCP verification tool
claude_desktop_config.json     # Claude Desktop configuration example
```
## Configuration Options
- `CACHE_DIR`: Cache directory path (default `.cache`)
- `PUBCHEM_RATE_LIMIT`: Maximum PubChem API request rate (default 5 req/s)
- `https_proxy`: HTTPS proxy settings
- `http_proxy`: HTTP proxy settings
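As a rough illustration of how these options could be read, the sketch below loads them from environment variables with the documented defaults. The `Settings` class and `load_settings` function are hypothetical names, not part of this project, and the real server may parse its configuration differently.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    cache_dir: Path
    rate_limit: float            # requests per second
    https_proxy: Optional[str]
    http_proxy: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented options, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        cache_dir=Path(env.get("CACHE_DIR", ".cache")),
        rate_limit=float(env.get("PUBCHEM_RATE_LIMIT", "5")),
        https_proxy=env.get("https_proxy"),
        http_proxy=env.get("http_proxy"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in a test.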
## Cache Management
The server uses local file caching to improve performance:
- **Cache Location**: `.cache/` directory
- **Cache Strategy**:
  - Compound information is cached for 2 hours
  - Safety information and toxicity data are cached for 1 hour
  - Expired files are deleted automatically
- **Cache Management**: Run `uv run manage_cache.py` to view cache statistics
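The caching strategy above can be sketched as a small file-backed cache with per-category expiry. This is a minimal illustration, assuming one JSON file per entry and hashed file names; the `FileCache` class, its method names, and the on-disk layout are hypothetical and may not match `cache_service.py`.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Any, Optional

# Lifetimes taken from the strategy above, in seconds.
TTL_SECONDS = {"compound": 2 * 3600, "safety": 3600, "toxicity": 3600}


class FileCache:
    def __init__(self, cache_dir: str = ".cache") -> None:
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, kind: str, key: str) -> Path:
        # Hash the key so any compound name yields a safe file name.
        digest = hashlib.sha256(f"{kind}:{key}".encode("utf-8")).hexdigest()
        return self.dir / f"{kind}_{digest[:16]}.json"

    def set(self, kind: str, key: str, value: Any, now: Optional[float] = None) -> None:
        stored_at = time.time() if now is None else now
        entry = {"stored_at": stored_at, "value": value}
        self._path(kind, key).write_text(json.dumps(entry), encoding="utf-8")

    def get(self, kind: str, key: str, now: Optional[float] = None) -> Any:
        path = self._path(kind, key)
        if not path.exists():
            return None
        entry = json.loads(path.read_text(encoding="utf-8"))
        current = time.time() if now is None else now
        if current - entry["stored_at"] > TTL_SECONDS[kind]:
            path.unlink()  # expired entries are removed on access
            return None
        return entry["value"]
```

The optional `now` argument exists only so that expiry can be exercised without waiting; in normal use it is omitted.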
## Network Configuration
### Proxy Setup
The project supports proxy configuration through environment variables:
```bash
export https_proxy=http://127.0.0.1:10808
export http_proxy=http://127.0.0.1:10808
```
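When a connection fails, it can help to confirm that the proxy variables are actually visible to the Python process. The helper below is a hypothetical diagnostic, not part of this project; it checks both lowercase and uppercase variable names. Note that aiohttp only honours these variables when a session is created with `trust_env=True` or a `proxy` argument is passed explicitly; how this server wires that up is not shown here.

```python
import os
from typing import Dict, Mapping, Optional


def proxy_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the proxy URL found for each scheme, preferring lowercase names."""
    env = os.environ if env is None else env
    found: Dict[str, str] = {}
    for scheme in ("http", "https"):
        for name in (f"{scheme}_proxy", f"{scheme.upper()}_PROXY"):
            value = env.get(name)
            if value:
                found[scheme] = value
                break
    return found


if __name__ == "__main__":
    settings = proxy_settings()
    if not settings:
        print("No proxy variables set")
    for scheme, url in settings.items():
        print(f"{scheme}: {url}")
```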
### Retry Mechanism
- Automatic retry for 503 errors (server busy)
- Incremental wait time to avoid overly frequent requests
- Maximum 3 retries
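The retry behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `pubchem_client.py`: the `fetch` callable returning a `(status, body)` pair is hypothetical, and the linear 1 s, 2 s, 3 s delays are one possible reading of "incremental wait time".

```python
import time
from typing import Callable, Tuple

MAX_RETRIES = 3


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    max_retries: int = MAX_RETRIES,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(), retrying on HTTP 503 with a growing delay."""
    status, body = fetch()
    for attempt in range(1, max_retries + 1):
        if status != 503:
            break
        # Wait longer before each successive retry: base, 2*base, 3*base, ...
        sleep(base_delay * attempt)
        status, body = fetch()
    return status, body
```

With the defaults, a request is attempted at most four times in total: the initial call plus three retries. Any status other than 503 is returned immediately.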
### Request Header Optimization
- Sends a complete browser User-Agent string
- Adds necessary HTTP headers (Accept, Referer, etc.)
- Supports gzip compression
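A request carrying headers of this kind might look like the sketch below. It uses the standard library's `urllib.request` instead of aiohttp so that it runs without extra dependencies, and the header values, including the User-Agent string, are placeholders rather than the values the server actually sends.

```python
import gzip
import urllib.request
from typing import Optional

# Placeholder values for illustration only.
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "application/json",
    "Referer": "https://pubchem.ncbi.nlm.nih.gov/",
    "Accept-Encoding": "gzip",
}


def build_request(url: str) -> urllib.request.Request:
    """Attach the default headers to a GET request."""
    return urllib.request.Request(url, headers=DEFAULT_HEADERS)


def decode_body(raw: bytes, content_encoding: Optional[str]) -> str:
    """Decompress a gzip-encoded response body, then decode it as UTF-8."""
    if content_encoding == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")
```

Unlike aiohttp, `urllib.request` does not decompress responses on its own, which is why `decode_body` handles gzip explicitly.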
## Tested Compounds
The following compounds have been tested and return results:
- aspirin (阿司匹林) - CID: 2244
- caffeine (咖啡因) - CID: 2519
- water (水) - CID: 962
- ethanol (乙醇) - CID: 702
- benzene (苯) - CID: 241
## Troubleshooting
### Common Issues
1. **503 Error**: PubChem is busy; the server retries automatically (up to 3 times)
2. **Network Connection Failed**: Check that the proxy settings are correct
3. **Compound Not Found**: Try the English name or the chemical formula
### Log Viewing
The server outputs detailed log information, including:
- Request status
- Retry information
- Error details
## Development
### Running Tests
```bash
uv run pytest tests/
```
### Code Formatting
```bash
uv run black .
uv run isort .
```
### Type Checking
```bash
uv run mypy pubchem_mcp/
```
## License
MIT License
## Contributing
Issues and Pull Requests are welcome!