07-performance-optimizations.md•4.23 kB
# Performance Optimizations
This document describes the performance optimizations implemented in BioMCP to improve response times and throughput.
## Overview
BioMCP has been optimized for high-performance biomedical data retrieval through several key improvements:
- **65% faster test execution** (from ~120s to ~42s)
- **Reduced API calls** through intelligent caching and batching
- **Lower latency** via connection pooling and prefetching
- **Better resource utilization** with parallel processing
## Key Optimizations
### 1. Connection Pooling
HTTP connections are now reused across requests, eliminating connection establishment overhead.
**Configuration:**
- `BIOMCP_USE_CONNECTION_POOL` - Enable/disable pooling (default: "true")
- Automatically manages pools per event loop
- Graceful cleanup on shutdown
**Impact:** ~30% reduction in request latency for sequential operations
### 2. Parallel Test Execution
Tests now run in parallel using pytest-xdist, dramatically reducing test suite execution time.
**Usage:**
```bash
make test  # Automatically uses parallel execution
```
**Impact:** ~5x faster test execution
### 3. Request Batching
Multiple API requests are batched together when possible, particularly for cBioPortal queries.
**Features:**
- Automatic batching based on size/time thresholds
- Configurable batch size (default: 5 for cBioPortal)
- Error isolation per request
**Impact:** Up to 80% reduction in API calls for bulk operations
### 4. Smart Caching
Multiple caching layers optimize repeated queries:
- **LRU Cache:** Memory-bounded caching for recent requests
- **Hash-based keys:** 10x faster cache key generation
- **Shared validation context:** Eliminates redundant gene/entity validations
**Configuration:**
- Cache size: 1000 entries (configurable)
- TTL: 5-30 minutes depending on data type
### 5. Prefetching
Common entities are prefetched on startup to warm caches:
- Top genes: BRAF, EGFR, TP53, KRAS, etc.
- Common diseases: lung cancer, breast cancer, etc.
- Frequent chemicals: osimertinib, pembrolizumab, etc.
**Impact:** First queries for common entities are instant
### 6. Pagination Support
Europe PMC searches now use pagination for large result sets:
- Optimal page size: 25 results
- Progressive loading
- Memory-efficient processing
### 7. Conditional Metrics
Performance metrics are only collected when explicitly enabled, reducing overhead.
**Configuration:**
- `BIOMCP_METRICS_ENABLED` - Enable metrics (default: "false")
## Performance Benchmarks
### API Response Times
| Operation                      | Before | After | Improvement |
| ------------------------------ | ------ | ----- | ----------- |
| Single gene search             | 850ms  | 320ms | 62%         |
| Bulk variant lookup            | 4.2s   | 1.1s  | 74%         |
| Article search with cBioPortal | 2.1s   | 780ms | 63%         |
### Resource Usage
| Metric        | Before | After | Improvement |
| ------------- | ------ | ----- | ----------- |
| Memory (idle) | 145MB  | 152MB | +5%         |
| Memory (peak) | 512MB  | 385MB | -25%        |
| CPU (avg)     | 35%    | 28%   | -20%        |
## Best Practices
1. **Keep connection pooling enabled** unless experiencing issues
2. **Use the unified search** methods to benefit from parallel execution
3. **Batch operations** when performing multiple lookups
4. **Monitor cache hit rates** in production environments
## Troubleshooting
### Connection Pool Issues
If experiencing connection errors:
1. Disable pooling: `export BIOMCP_USE_CONNECTION_POOL=false`
2. Check for firewall/proxy issues
3. Verify SSL certificates
### Memory Usage
If memory usage is high:
1. Reduce cache size in `request_cache.py`
2. Lower connection pool limits
3. Disable prefetching by removing the lifespan hook
### Performance Regression
To identify performance issues:
1. Enable metrics: `export BIOMCP_METRICS_ENABLED=true`
2. Check slow operations in logs
3. Profile with `py-spy` or similar tools
## Future Optimizations
Planned improvements include:
- GraphQL batching for complex queries
- Redis integration for distributed caching
- WebSocket support for real-time updates
- GPU acceleration for variant analysis