🧬 cBioPortal MCP Server
A high-performance, production-ready Model Context Protocol (MCP) server that enables AI assistants to interact with cancer genomics data from cBioPortal. It is built on modern async Python with a modular design and a shared BaseEndpoint base class, for reliability, maintainability, and 4.5x faster bulk retrieval than sequential requests.
Overview & Key Features
Performance & Architecture
⚡ 4.5x Performance Boost: Fully async implementation with concurrent API requests
🏗️ Enterprise Architecture: BaseEndpoint pattern that eliminates about 60% of duplicated code
Modular Design: Professional structure with a 71% code reduction (1,357 → 396 lines)
📦 Modern Package Management: uv-based workflow with pyproject.toml
Concurrent Operations: Bulk fetching of studies and genes with automatic batching
🔧 Enterprise Features
⚙️ Multi-layer Configuration: CLI args → environment variables → YAML config → defaults (highest to lowest priority)
Comprehensive Testing: 93 tests across 8 test suites with full coverage
🛡️ Input Validation: Robust parameter validation and error handling
Pagination Support: Efficient data retrieval with automatic pagination
🔧 Code Quality: Ruff linting and formatting, plus further code quality checks
⚡ Configurable Performance: Adjustable batch sizes and performance tuning
🧬 Cancer Genomics Capabilities
Study Management: Browse, search, and analyze cancer studies
🧪 Molecular Data: Access mutations, clinical data, and molecular profiles
Bulk Operations: Concurrent fetching of multiple entities
Advanced Search: Keyword-based discovery across studies and genes
Recent Quality & Architecture Improvements
Major Refactoring Achievements (2025)
🏗️ BaseEndpoint Architecture: Eliminated ~60% of duplicated code through an inheritance-based design
Code Quality: Incorporated findings from an external code review and adopted Ruff for modern linting
⚙️ Enhanced Configurability: Gene batch sizes, retry logic, and performance tuning are now configurable
🛡️ Robust Validation: Decorator-based parameter validation and error handling
🧪 Testing Maturity: 93 tests, with zero regressions through the major refactoring
Production-Ready Status
✅ External Code Review: Code quality validated by an external review, with the resulting improvements implemented
🔧 Modern Python Practices: Type checking, linting, formatting, and adherence to best practices
🏗️ Enterprise Architecture: Modular design with clear separation of concerns
Performance Optimized: 4.5x speedup from async concurrency, with configurable batch processing
🧠🤖 AI-Collaborative Development
This project demonstrates cutting-edge human-AI collaboration in bioinformatics software development:
🧠 Domain Expertise: 20+ years of cancer research experience guided the architecture and feature requirements
🤖 AI Implementation: Advanced code generation, API design, and performance optimization through systematic LLM collaboration
Quality Assurance: Iterative refinement to ensure professional standards and production reliability
🏗️ Architectural Evolution: The BaseEndpoint pattern and the 60% reduction in duplicated code came out of AI-guided refactoring
Innovation Approach: Shows how domain experts can effectively use AI tools to build enterprise-grade bioinformatics platforms
Recent Achievements: Integrated an external code review, with quality improvements including Ruff configuration, configurable performance settings, and modern Python best practices.
Methodology: This collaborative approach combines deep biological domain knowledge with AI-powered development capabilities, accelerating innovation while maintaining rigorous code quality and scientific accuracy.
Quick Start
Prerequisites
Python 3.10+
uv (modern package manager) - recommended 📦
Git (optional, for cloning)
⚡ Installation & Launch
That's it! Your server is running and ready for AI assistant connections.
📦 Installation Options
Option 1: uv (Recommended)
Modern, lightning-fast package management with automatic environment handling:
Option 2: pip (Traditional)
Standard Python package management approach:
⚙️ Configuration
Multi-Layer Configuration System
The server supports flexible configuration with priority: CLI args > Environment variables > Config file > Defaults
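Here is a minimal sketch of that precedence rule. It assumes each layer has already been loaded into a plain mapping. YAML parsing is left out so the example needs only the standard library. Only concurrent_batch_size and max_concurrent_requests come from this README. The base_url key, the default values, and the CBIOPORTAL_ environment-variable prefix are hypothetical.

```python
from typing import Any, Mapping

# Lowest-priority layer. Only concurrent_batch_size and max_concurrent_requests
# appear in this README; base_url and all values are placeholders.
DEFAULTS: dict[str, Any] = {
    "base_url": "https://www.cbioportal.org/api",
    "concurrent_batch_size": 100,
    "max_concurrent_requests": 10,
}

# Hypothetical naming scheme for environment variables, e.g. CBIOPORTAL_BASE_URL.
ENV_PREFIX = "CBIOPORTAL_"


def env_layer(environ: Mapping[str, str]) -> dict[str, Any]:
    """Read CBIOPORTAL_<KEY> variables and coerce them to each default's type."""
    layer: dict[str, Any] = {}
    for key, default in DEFAULTS.items():
        raw = environ.get(ENV_PREFIX + key.upper())
        if raw is not None:
            # Works for int and str defaults; a bool default would need real parsing.
            layer[key] = type(default)(raw)
    return layer


def resolve_config(file_cfg: Mapping[str, Any],
                   environ: Mapping[str, str],
                   cli_args: Mapping[str, Any]) -> dict[str, Any]:
    """Merge layers so that CLI args > environment > config file > defaults."""
    config = dict(DEFAULTS)
    # Later updates win, so apply layers from lowest to highest priority.
    config.update({k: v for k, v in file_cfg.items() if k in DEFAULTS})
    config.update(env_layer(environ))
    # CLI flags the user did not pass arrive as None and must not mask lower layers.
    config.update({k: v for k, v in cli_args.items() if k in DEFAULTS and v is not None})
    return config


if __name__ == "__main__":
    file_cfg = {"concurrent_batch_size": 50, "max_concurrent_requests": 5}  # e.g. parsed config.yaml
    environ = {"CBIOPORTAL_MAX_CONCURRENT_REQUESTS": "8"}  # in practice: os.environ
    cli_args = {"concurrent_batch_size": None, "max_concurrent_requests": 4}
    print(resolve_config(file_cfg, environ, cli_args))
```

The layers are applied from lowest to highest priority with dict.update, so the rule is easy to audit: whichever layer is applied last wins for each key.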
YAML Configuration
Create config.yaml for persistent settings:
Environment Variables
CLI Options 💻
Usage & Integration
🖥️ Claude Desktop Integration
Configure in your Claude Desktop MCP settings:
Option 1: Direct Script Path (Recommended)
Option 2: uv run (Alternative)
Important Setup Steps:
Replace /path/to/your/project/cbioportal_MCP with your actual project path.
Ensure the project is installed in editable mode: uv pip install -e .
Restart Claude Desktop after updating the configuration.
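The JSON for these options is missing from this copy of the README. The snippet below is a sketch of Option 1. It assumes Claude Desktop's usual mcpServers layout with command and args keys, and prints an entry that points at the .venv/bin/cbioportal-mcp script named in the troubleshooting notes. The server name cbioportal is hypothetical, and the path is the placeholder from the steps above.

```python
import json
from pathlib import Path
from typing import Any

# Placeholder taken from the setup steps; replace it with your actual project path.
PROJECT_DIR = Path("/path/to/your/project/cbioportal_MCP")


def claude_desktop_config(project_dir: Path, server_name: str = "cbioportal") -> dict[str, Any]:
    """Build an mcpServers entry that launches the installed console script directly."""
    # The troubleshooting notes name this script; it exists after `uv pip install -e .`.
    script = project_dir / ".venv" / "bin" / "cbioportal-mcp"
    return {"mcpServers": {server_name: {"command": str(script), "args": []}}}


if __name__ == "__main__":
    print(json.dumps(claude_desktop_config(PROJECT_DIR), indent=2))
```

Merge the printed mcpServers entry into claude_desktop_config.json, then restart Claude Desktop.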
🔧 VS Code Integration
Add to your workspace settings:
🏃‍♂️ Command Line Usage
🏗️ Architecture
Modern Project Structure
🎯 Design Principles
🔧 Modular: Clear separation of concerns with domain-specific modules
⚡ Async-First: Fully asynchronous implementation for maximum performance
🏗️ BaseEndpoint Pattern: Inheritance-based architecture that eliminates 60% of duplicated code
🛡️ Robust: Comprehensive input validation and error handling with decorators
🧪 Testable: 93 tests ensuring reliability and preventing regressions
Maintainable: Clean code architecture with a 71% reduction in code size
Code Quality: Ruff linting, formatting, and modern Python practices
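The BaseEndpoint idea can be illustrated with a small inheritance sketch. Shared request and error-handling code lives once in the base class, and each domain endpoint adds only thin methods. Everything except the name BaseEndpoint is hypothetical: the other class names, the transport callable, and the method names. A fake transport stands in for HTTP so the example runs offline.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical transport: (path, query params) -> decoded JSON. A real server would
# wrap an async HTTP client here; the demo passes a fake so it runs offline.
Transport = Callable[[str, dict[str, Any]], Awaitable[Any]]


class BaseEndpoint:
    """Shared plumbing inherited by every endpoint: request dispatch and error wrapping."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    async def _get(self, path: str, **params: Any) -> Any:
        try:
            return await self._transport(path, params)
        except Exception as exc:  # failures are normalised in one place for all tools
            return {"error": f"{type(exc).__name__}: {exc}"}


class StudiesEndpoint(BaseEndpoint):
    async def get_study(self, study_id: str) -> Any:
        return await self._get(f"/studies/{study_id}")


class GenesEndpoint(BaseEndpoint):
    async def get_gene(self, gene_id: str) -> Any:
        return await self._get(f"/genes/{gene_id}")


async def fake_transport(path: str, params: dict[str, Any]) -> Any:
    # Offline stand-in for the cBioPortal API.
    if path == "/genes/UNKNOWN":
        raise LookupError("gene not found")
    return {"path": path, "params": params}


async def demo() -> tuple[Any, Any]:
    studies = StudiesEndpoint(fake_transport)
    genes = GenesEndpoint(fake_transport)
    return await studies.get_study("brca_tcga"), await genes.get_gene("UNKNOWN")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Error wrapping sits in _get, so changing how failures are reported touches one method instead of every tool. That is the kind of duplication the pattern is meant to remove.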
🛠️ Available Tools
The server provides 12 high-performance tools for AI assistants:
| 🔧 Tool | Description | ⚡ Features |
| --- | --- | --- |
|  | List all available cancer studies | Pagination, Filtering |
|  | Search studies by keyword | Full-text search, Sorting |
|  | Detailed study information | Comprehensive metadata |
|  | Samples for specific studies | Paginated results |
|  | Gene information by ID/symbol | 🏷️ Flexible identifiers |
|  | Search genes by keyword | Symbol & name search |
|  | Gene mutations in studies | 🧬 Mutation details |
|  | Patient clinical information | Patient-centric data |
|  | Study molecular profiles | Profile metadata |
|  | Concurrent study fetching | ⚡ Bulk operations |
|  | Concurrent gene retrieval | 📦 Automatic batching |
|  | Gene panels in studies | 🧬 Panel information |
Performance Features
⚡ Concurrent Operations: get_multiple_* methods use asyncio.gather for parallel processing
📦 Smart Batching: Automatic batching for large gene lists
Efficient Pagination: Async generators for memory-efficient data streaming
⏱️ Performance Metrics: Execution timing and batch count reporting
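Here is a rough sketch of batching combined with asyncio.gather and timing metadata, as described in the list above. The get_multiple helper, the fake_fetch_genes stand-in, and the metadata key names are assumptions for illustration. The real get_multiple_* tools may structure their responses differently.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


def make_batches(items: list[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive chunks of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


async def get_multiple(ids: list[str],
                       fetch_batch: Callable[[list[str]], Awaitable[list[Any]]],
                       batch_size: int = 100) -> dict[str, Any]:
    """Fetch all batches concurrently and attach timing and batch-count metadata."""
    start = time.perf_counter()
    batches = make_batches(ids, batch_size)
    # asyncio.gather runs the batch requests concurrently and keeps their order.
    batch_results = await asyncio.gather(*(fetch_batch(b) for b in batches))
    results = [item for batch in batch_results for item in batch]
    return {
        "results": results,
        "metadata": {
            "batch_count": len(batches),
            "execution_time_s": round(time.perf_counter() - start, 3),
        },
    }


async def fake_fetch_genes(symbols: list[str]) -> list[dict[str, str]]:
    await asyncio.sleep(0.01)  # stand-in for one API round trip per batch
    return [{"hugoGeneSymbol": s} for s in symbols]


if __name__ == "__main__":
    genes = [f"GENE{i}" for i in range(250)]
    out = asyncio.run(get_multiple(genes, fake_fetch_genes, batch_size=100))
    print(out["metadata"]["batch_count"], len(out["results"]))  # 3 250
```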
Performance
Benchmark Results
Our async implementation delivers significant performance improvements:
Async Benefits
4.5x Faster: Concurrent API requests vs sequential operations
📦 Bulk Processing: Efficient batched operations for multiple entities
⏱️ Non-blocking: Asynchronous I/O lets other requests proceed while one is waiting on the network
🧮 Smart Batching: Automatic optimization for large datasets
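The sketch below shows where the async speedup comes from. It times sequential and concurrent versions of the same simulated requests, using asyncio.sleep as stand-in network latency. It does not reproduce the 4.5x figure. Real gains depend on network latency, server limits, and batch settings. All function names here are hypothetical.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def fake_request(i: int, latency: float) -> int:
    # Simulated network round trip; no real API is contacted.
    await asyncio.sleep(latency)
    return i


async def sequential_requests(n: int, latency: float) -> list[int]:
    # Each request starts only after the previous one has finished.
    return [await fake_request(i, latency) for i in range(n)]


async def concurrent_requests(n: int, latency: float) -> list[int]:
    # All requests are in flight at once; gather returns results in input order.
    return list(await asyncio.gather(*(fake_request(i, latency) for i in range(n))))


def timed(runner: Callable[[int, float], Awaitable[list[int]]],
          n: int, latency: float) -> tuple[list[int], float]:
    start = time.perf_counter()
    result = asyncio.run(runner(n, latency))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for runner in (sequential_requests, concurrent_requests):
        result, elapsed = timed(runner, n=10, latency=0.05)
        print(f"{runner.__name__}: {len(result)} requests in {elapsed:.2f}s")
```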
💡 Performance Tips
Use get_multiple_studies to fetch multiple studies concurrently.
Leverage get_multiple_genes, with automatic batching, for gene lists.
Configure concurrent_batch_size in the config for optimal performance.
Monitor the execution metrics included in response metadata.
👨‍💻 Development
🔨 Development Workflow
🧪 Testing
Comprehensive test suite with 93 tests across 8 categories:
Server startup/shutdown & tool registration
Pagination logic & edge cases
Concurrent operations & bulk fetching
✅ Parameter validation & error handling
📸 API response consistency (syrupy)
💻 Command-line interface & argument parsing
🛡️ Error scenarios & network issues
⚙️ Configuration system validation
🛠️ Development Tools & Quality Infrastructure
📦 uv: Modern package management (10-100x faster than pip)
🧪 pytest: Testing framework with async support, running the 93 tests
📸 syrupy: Snapshot testing for API response consistency
Ruff: Lightning-fast linting, formatting, and code quality enforcement
pytest-cov: Code coverage reporting and quality metrics
🏗️ BaseEndpoint: Inheritance pattern that eliminates 60% of duplicated code
✔️ Type Checking: Comprehensive type annotations for better code safety
🛡️ Validation Decorators: Automatic parameter validation and error handling
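The validation decorators are not shown in this README, so the following is only a sketch of the general technique. A decorator binds the call's arguments with inspect.signature, checks them, and returns an error before the wrapped tool runs. The decorator, the list_studies tool, and the choice to return an error dictionary rather than raise are all hypothetical.

```python
import asyncio
import functools
import inspect
from typing import Any, Callable


def _is_int(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_page_params(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: reject bad pagination arguments before any API call."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Bind positional and keyword arguments, filling in declared defaults.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        page_number = bound.arguments.get("page_number", 0)
        page_size = bound.arguments.get("page_size", 50)
        if not _is_int(page_number) or page_number < 0:
            return {"error": f"page_number must be a non-negative integer, got {page_number!r}"}
        if not _is_int(page_size) or page_size <= 0:
            return {"error": f"page_size must be a positive integer, got {page_size!r}"}
        return await func(*args, **kwargs)

    return wrapper


@validate_page_params
async def list_studies(page_number: int = 0, page_size: int = 50) -> dict[str, Any]:
    # Stand-in body; a real tool would query the cBioPortal API here.
    return {"page_number": page_number, "page_size": page_size}


if __name__ == "__main__":
    print(asyncio.run(list_studies(page_size=10)))  # {'page_number': 0, 'page_size': 10}
    print(asyncio.run(list_studies(page_size=0)))   # {'error': 'page_size must be a positive integer, got 0'}
```

Returning an error payload keeps the failure visible to the AI assistant as ordinary tool output. Raising instead would be an equally valid design if the framework reports exceptions well.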
🤝 Contributing
🍴 Fork the repository
🌿 Create a feature branch (git checkout -b feature/amazing-feature)
✅ Test your changes (uv run pytest)
Commit with clear messages (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Create a Pull Request
🔧 Troubleshooting
🚨 Common Issues
Server Fails to Start
Claude Desktop Connection Issues
✅ Use the direct script path (Option 1) for the most reliable connection
✅ Verify that paths in the MCP configuration are absolute (no ~ or relative paths)
✅ Install in editable mode: run uv pip install -e . in the project directory
✅ Ensure the virtual environment script .venv/bin/cbioportal-mcp exists
✅ For Option 2: check that uv is in your system PATH and that cwd points to the project directory
✅ Review Claude Desktop logs for detailed errors
Performance Issues
🔧 Increase concurrent_batch_size in config
🔧 Adjust max_concurrent_requests for your system
🔧 Use get_multiple_* methods for bulk operations
🔧 Monitor network latency to the cBioPortal API
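concurrent_batch_size controls how work is split. A cap like max_concurrent_requests is commonly enforced with asyncio.Semaphore. The sketch below assumes that interpretation of the setting. gather_limited and InFlightTracker are hypothetical names. The tracker exists only to show that no more than the configured number of fake calls run at once.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def gather_limited(factories: list[Callable[[], Awaitable[Any]]],
                         max_concurrent_requests: int) -> list[Any]:
    """Run every factory, keeping at most max_concurrent_requests calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent_requests)

    async def run(factory: Callable[[], Awaitable[Any]]) -> Any:
        # The coroutine is only created once a slot is free.
        async with semaphore:
            return await factory()

    # gather still returns results in the same order as the factories.
    return list(await asyncio.gather(*(run(f) for f in factories)))


class InFlightTracker:
    """Demo helper that records the peak number of simultaneous fake calls."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    async def call(self, i: int) -> int:
        self.current += 1
        self.peak = max(self.peak, self.current)
        await asyncio.sleep(0.01)  # stand-in for an API round trip
        self.current -= 1
        return i


if __name__ == "__main__":
    tracker = InFlightTracker()
    factories = [lambda i=i: tracker.call(i) for i in range(20)]
    results = asyncio.run(gather_limited(factories, max_concurrent_requests=5))
    print(results == list(range(20)), tracker.peak)  # True 5
```

Passing factories instead of ready-made coroutines means no coroutine object is created until it is allowed to run.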
Configuration Problems
API Connectivity
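The details of this section are missing from this copy. As a quick standalone check, the sketch below requests a single study from the public cBioPortal REST API using only the standard library. The base URL and the pageSize/pageNumber query parameters are assumptions about the API the server talks to. Adjust BASE_URL if you run against a private instance.

```python
import json
import urllib.parse
import urllib.request

# Assumed target: the public cBioPortal REST API. Change this for a private instance.
BASE_URL = "https://www.cbioportal.org/api"


def studies_probe_url(base_url: str = BASE_URL) -> str:
    """Smallest useful request: one study from the studies listing."""
    query = urllib.parse.urlencode({"pageSize": 1, "pageNumber": 0})
    return f"{base_url.rstrip('/')}/studies?{query}"


def check_api(base_url: str = BASE_URL, timeout: float = 10.0) -> str:
    url = studies_probe_url(base_url)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.load(response)
    except (OSError, ValueError) as exc:
        # URLError, HTTPError and timeouts are OSError subclasses; bad JSON raises ValueError.
        return f"FAILED: {url} -> {exc}"
    return f"OK: {url} returned {len(body)} record(s)"


if __name__ == "__main__":
    print(check_api())
```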
💡 Examples & Use Cases
Research Queries
🧬 Genomic Analysis
Bulk Operations
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
🧬 Open-access cancer genomics data platform
Enabling seamless AI-tool interactions
⚡ High-performance MCP server framework
📦 Modern Python package management
🤖 AI Collaboration - Demonstrating the power of human-AI partnership in scientific software development
Production-ready bioinformatics platform built through innovative human-AI collaboration! 🧬✨
Demonstrating the power of domain expertise + AI-assisted development for enterprise-grade scientific software.