
cBioPortal MCP Server

by pickleton89
# 🧬 cBioPortal MCP Server

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![uv](https://img.shields.io/badge/uv-package%20manager-blue.svg)](https://github.com/astral-sh/uv)
[![MCP](https://img.shields.io/badge/MCP-2.0+-green.svg)](https://github.com/model-context-protocol/mcp)
[![FastMCP](https://img.shields.io/badge/FastMCP-framework-orange.svg)](https://github.com/jlowin/fastmcp)
[![Tests](https://img.shields.io/badge/tests-93%20passing-brightgreen.svg)](#testing)
[![Code Coverage](https://img.shields.io/badge/coverage-comprehensive-brightgreen.svg)](#development)

A high-performance, production-ready **Model Context Protocol (MCP) server** that enables AI assistants to seamlessly interact with cancer genomics data from [cBioPortal](https://www.cbioportal.org/). Built with modern **async Python architecture**, **enterprise-grade modular design**, and **BaseEndpoint pattern** for maximum reliability, maintainability, and **4.5x faster performance**.

## 🌟 Overview & Key Features

### 🚀 **Performance & Architecture**

- **⚡ 4.5x Performance Boost**: Full async implementation with concurrent API operations
- **🏗️ Enterprise Architecture**: BaseEndpoint pattern with 60% code duplication elimination
- **📁 Modular Design**: Professional structure with 71% code reduction (1,357 → 396 lines)
- **📦 Modern Package Management**: uv-based workflow with pyproject.toml
- **🔄 Concurrent Operations**: Bulk fetching of studies and genes with automatic batching

### 🔧 **Enterprise Features**

- **⚙️ Multi-layer Configuration**: CLI args → Environment variables → YAML config → Defaults
- **📋 Comprehensive Testing**: 93 tests across 8 organized test suites with full coverage
- **🛡️ Input Validation**: Robust parameter validation and error handling
- **📊 Pagination Support**: Efficient data retrieval with automatic pagination
- **🔧 Code Quality**: Ruff linting, formatting, and comprehensive code quality checks
- **⚡ Configurable Performance**: Adjustable batch sizes and performance tuning

### 🧬 **Cancer Genomics Capabilities**

- **🔍 Study Management**: Browse, search, and analyze cancer studies
- **🧪 Molecular Data**: Access mutations, clinical data, and molecular profiles
- **📈 Bulk Operations**: Concurrent fetching of multiple entities
- **🔎 Advanced Search**: Keyword-based discovery across studies and genes

## 🎆 **Recent Quality & Architecture Improvements**

### 🚀 **Major Refactoring Achievements (2025)**

- **🏗️ BaseEndpoint Architecture**: Eliminated ~60% code duplication through inheritance-based design
- **📝 Code Quality Excellence**: Comprehensive external review integration with modern linting (Ruff)
- **⚙️ Enhanced Configurability**: Gene batch sizes, retry logic, and performance tuning now configurable
- **🛡️ Robust Validation**: Decorator-based parameter validation and error handling
- **🧪 Testing Maturity**: 93 comprehensive tests with zero regressions through major refactoring

### 📈 **Production-Ready Status**

- **✅ External Code Review**: Professional code quality validation and improvements implemented
- **🔧 Modern Python Practices**: Type checking, linting, formatting, and best practice adherence
- **🏗️ Enterprise Architecture**: Modular design with clear separation of concerns
- **🚀 Performance Optimized**: 4.5x async improvements with configurable batch processing

## 🧠🤖 **AI-Collaborative Development**

This project demonstrates **cutting-edge human-AI collaboration** in bioinformatics software development:

- **🧠 Domain Expertise**: 20+ years cancer research experience guided architecture and feature requirements
- **🤖 AI Implementation**: Advanced code generation, API design, and performance optimization through systematic LLM collaboration
- **🔄 Quality Assurance**: Iterative refinement ensuring professional standards and production reliability
- **🏗️ Architectural Evolution**: BaseEndpoint pattern and 60% code duplication elimination through AI-guided refactoring
- **📈 Innovation Approach**: Showcases how domain experts can effectively leverage AI tools to build enterprise-grade bioinformatics platforms

**Recent Achievements**: External code review integration with comprehensive quality improvements including Ruff configuration, configurable performance settings, and modern Python best practices.

**Methodology**: This collaborative approach combines deep biological domain knowledge with AI-powered development capabilities, accelerating innovation while maintaining rigorous code quality and scientific accuracy.

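The feature overview mentions bulk fetching of genes "with automatic batching" on top of concurrent requests. As a rough illustration of that idea, here is a minimal sketch that splits an ID list into batches and runs them concurrently with `asyncio.gather`. It is not the server's actual code: `chunked`, `fetch_in_batches`, and the stand-in `fake_fetch_batch` are hypothetical names, and a real fetcher would call the cBioPortal API instead of sleeping.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical type: a coroutine that takes one batch of IDs and returns
# a mapping of ID -> record. The real server's fetcher may look different.
BatchFetcher = Callable[[list[str]], Awaitable[dict[str, dict]]]


def chunked(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


async def fetch_in_batches(
    ids: list[str], fetch_batch: BatchFetcher, batch_size: int = 100
) -> dict[str, dict]:
    """Fetch all batches concurrently and merge the per-batch results."""
    batches = chunked(ids, batch_size)
    # asyncio.gather runs the coroutines concurrently and returns their
    # results in the same order as the inputs.
    results = await asyncio.gather(*(fetch_batch(b) for b in batches))
    merged: dict[str, dict] = {}
    for partial in results:
        merged.update(partial)
    return merged


async def fake_fetch_batch(batch: list[str]) -> dict[str, dict]:
    """Stand-in for a network call (assumption: no real API is contacted)."""
    await asyncio.sleep(0.01)
    return {gene: {"hugoGeneSymbol": gene} for gene in batch}


if __name__ == "__main__":
    genes = ["TP53", "BRCA1", "EGFR"]
    print(asyncio.run(fetch_in_batches(genes, fake_fetch_batch, batch_size=2)))
```

With a batch size of 2, the three genes above are fetched in two concurrent batches, so total latency is bounded by the slowest batch rather than the sum of all requests.
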
## 🚀 Quick Start

### Prerequisites

- **Python 3.10+** 🐍
- **uv** (modern package manager) - recommended 📦
- **Git** (optional, for cloning)

### ⚡ Installation & Launch

```bash
# Install uv if needed
pipx install uv

# Clone and setup
git clone https://github.com/yourusername/cbioportal-mcp.git
cd cbioportal-mcp
uv sync

# Launch server
uv run cbioportal-mcp
```

**That's it!** 🎉 Your server is running and ready for AI assistant connections.

## 📦 Installation Options

### 🔥 **Option 1: uv (Recommended)**

Modern, lightning-fast package management with automatic environment handling:

```bash
# Install uv
pipx install uv
# Or with Homebrew: brew install uv

# Clone repository
git clone https://github.com/yourusername/cbioportal-mcp.git
cd cbioportal-mcp

# One-command setup (creates venv + installs dependencies)
uv sync
```

### 🐍 **Option 2: pip (Traditional)**

Standard Python package management approach:

```bash
# Create virtual environment
python -m venv cbioportal-mcp-env

# Activate environment
# Windows:
cbioportal-mcp-env\Scripts\activate
# macOS/Linux:
source cbioportal-mcp-env/bin/activate

# Install dependencies
pip install -e .
```

## ⚙️ Configuration

### 🎛️ **Multi-Layer Configuration System**

The server supports flexible configuration with priority: **CLI args > Environment variables > Config file > Defaults**

#### **YAML Configuration** 📄

Create `config.yaml` for persistent settings:

```yaml
# cBioPortal MCP Server Configuration
server:
  base_url: "https://www.cbioportal.org/api"
  transport: "stdio"
  client_timeout: 480.0

logging:
  level: "INFO"
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

api:
  rate_limit:
    enabled: false
    requests_per_second: 10
  retry:
    enabled: true
    max_attempts: 3
    backoff_factor: 1.0
  cache:
    enabled: false
    ttl_seconds: 300
  batch_size:
    genes: 100  # Configurable gene batch size for concurrent operations
```

#### **Environment Variables** 🌐

```bash
export CBIOPORTAL_BASE_URL="https://custom-instance.org/api"
export CBIOPORTAL_LOG_LEVEL="DEBUG"
export CBIOPORTAL_CLIENT_TIMEOUT=600
export CBIOPORTAL_GENE_BATCH_SIZE=50  # Configure gene batch size
export CBIOPORTAL_RETRY_MAX_ATTEMPTS=5
```

#### **CLI Options** 💻

```bash
# Basic usage
uv run cbioportal-mcp

# Custom configuration
uv run cbioportal-mcp --config config.yaml --log-level DEBUG

# Custom API endpoint
uv run cbioportal-mcp --base-url https://custom-instance.org/api

# Generate example config
uv run cbioportal-mcp --create-example-config
```

## 🔌 Usage & Integration

### 🖥️ **Claude Desktop Integration**

Configure in your Claude Desktop MCP settings:

**Option 1: Direct Script Path (Recommended)**

```json
{
  "mcpServers": {
    "cbioportal": {
      "command": "/path/to/your/project/cbioportal_MCP/.venv/bin/cbioportal-mcp",
      "env": {
        "CBIOPORTAL_LOG_LEVEL": "INFO"
      }
    }
  }
}
```

**Option 2: uv run (Alternative)**

```json
{
  "mcpServers": {
    "cbioportal": {
      "command": "uv",
      "args": ["run", "cbioportal-mcp"],
      "cwd": "/path/to/your/project/cbioportal_MCP",
      "env": {
        "CBIOPORTAL_LOG_LEVEL": "INFO"
      }
    }
  }
}
```

**Important Setup Steps:**

1. Replace `/path/to/your/project/cbioportal_MCP` with your actual project path
2. Ensure the project is installed in editable mode: `uv pip install -e .`
3. Restart Claude Desktop after updating the configuration

### 🔧 **VS Code Integration**

Add to your workspace settings:

```json
{
  "mcp.servers": {
    "cbioportal": {
      "command": "uv",
      "args": ["run", "cbioportal-mcp"],
      "cwd": "/path/to/cbioportal-mcp"
    }
  }
}
```

### 🏃‍♂️ **Command Line Usage**

```bash
# Development server with debug logging
uv run cbioportal-mcp --log-level DEBUG

# Production server with custom config
uv run cbioportal-mcp --config production.yaml

# Using custom cBioPortal instance
uv run cbioportal-mcp --base-url https://private-instance.org/api
```

## 🏗️ Architecture

### 📁 **Modern Project Structure**

```
cbioportal-mcp/
├── 📁 cbioportal_mcp/              # Main package directory
│   ├── 📊 server.py                # Main MCP server implementation
│   ├── 🌐 api_client.py            # Dedicated HTTP client class
│   ├── ⚙️ config.py                # Multi-layer configuration system
│   ├── 📋 constants.py             # Centralized constants
│   ├── 📁 endpoints/               # Domain-specific API modules
│   │   ├── 🏗️ base.py              # BaseEndpoint pattern (60% duplication reduction)
│   │   ├── 🔬 studies.py           # Cancer studies & search
│   │   ├── 🧬 genes.py             # Gene operations & mutations
│   │   ├── 🧪 samples.py           # Sample data management
│   │   └── 📈 molecular_profiles.py  # Molecular & clinical data
│   └── 📁 utils/                   # Shared utilities
│       ├── 📄 pagination.py        # Efficient pagination logic
│       ├── ✅ validation.py        # Input validation
│       └── 📝 logging.py           # Logging configuration
├── 📁 tests/                       # Comprehensive test suite (93 tests)
├── 📁 docs/                        # Documentation
├── 📁 scripts/                     # Development utilities
└── 📄 pyproject.toml               # Modern Python project config
```

### 🎯 **Design Principles**

- **🔧 Modular**: Clear separation of concerns with domain-specific modules
- **⚡ Async-First**: Full asynchronous implementation for maximum performance
- **🏗️ BaseEndpoint Pattern**: Inheritance-based architecture eliminating 60% code duplication
- **🛡️ Robust**: Comprehensive input validation and error handling with decorators
- **🧪 Testable**: 93 tests ensuring reliability and preventing regressions
- **🔄 Maintainable**: Clean code architecture with a 71% reduction in code size
- **📝 Code Quality**: Ruff linting, formatting, and modern Python practices

## 🛠️ Available Tools

The server provides **12 high-performance tools** for AI assistants:

| 🔧 Tool | 📝 Description | ⚡ Features |
|---------|---------------|------------|
| `get_cancer_studies` | List all available cancer studies | 📄 Pagination, 🔍 Filtering |
| `search_studies` | Search studies by keyword | 🔎 Full-text search, 📊 Sorting |
| `get_study_details` | Detailed study information | 📈 Comprehensive metadata |
| `get_samples_in_study` | Samples for specific studies | 📄 Paginated results |
| `get_genes` | Gene information by ID/symbol | 🏷️ Flexible identifiers |
| `search_genes` | Search genes by keyword | 🔍 Symbol & name search |
| `get_mutations_in_gene` | Gene mutations in studies | 🧬 Mutation details |
| `get_clinical_data` | Patient clinical information | 👥 Patient-centric data |
| `get_molecular_profiles` | Study molecular profiles | 📊 Profile metadata |
| `get_multiple_studies` | **🚀 Concurrent study fetching** | ⚡ Bulk operations |
| `get_multiple_genes` | **🚀 Concurrent gene retrieval** | 📦 Automatic batching |
| `get_gene_panels_for_study` | Gene panels in studies | 🧬 Panel information |

### 🌟 **Performance Features**

- **⚡ Concurrent Operations**: `get_multiple_*` methods use `asyncio.gather` for parallel processing
- **📦 Smart Batching**: Automatic batching for large gene lists
- **📄 Efficient Pagination**: Async generators for memory-efficient data streaming
- **⏱️ Performance Metrics**: Execution timing and batch count reporting

## 🚀 Performance

### 📊 **Benchmark Results**

Our async implementation delivers significant performance improvements:

```
🏃‍♂️ Sequential Study Fetching: 1.31 seconds (10 studies)
⚡ Concurrent Study Fetching: 0.29 seconds (10 studies)
🎯 Performance Improvement: 4.57x faster!
```

### 🔥 **Async Benefits**

- **🚀 4.5x Faster**: Concurrent API requests vs sequential operations
- **📦 Bulk Processing**: Efficient batched operations for multiple entities
- **⏱️ Non-blocking**: Asynchronous I/O prevents request blocking
- **🧮 Smart Batching**: Automatic optimization for large datasets

### 💡 **Performance Tips**

- Use `get_multiple_studies` for fetching multiple studies concurrently
- Leverage `get_multiple_genes` with automatic batching for gene lists
- Configure `concurrent_batch_size` in config for optimal performance
- Monitor execution metrics included in response metadata

## 👨‍💻 Development

### 🔨 **Development Workflow**

```bash
# Setup development environment
uv sync

# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=.

# Run specific test file
uv run pytest tests/test_server_lifecycle.py

# Update snapshots
uv run pytest --snapshot-update

# Lint code
uv run ruff check .

# Format code
uv run ruff format .
```

### 🧪 **Testing**

Comprehensive test suite with **93 tests** across 8 categories:

- **🔄 `test_server_lifecycle.py`** - Server startup/shutdown & tool registration
- **📄 `test_pagination.py`** - Pagination logic & edge cases
- **🚀 `test_multiple_entity_apis.py`** - Concurrent operations & bulk fetching
- **✅ `test_input_validation.py`** - Parameter validation & error handling
- **📸 `test_snapshot_responses.py`** - API response consistency (syrupy)
- **💻 `test_cli.py`** - Command-line interface & argument parsing
- **🛡️ `test_error_handling.py`** - Error scenarios & network issues
- **⚙️ `test_configuration.py`** - Configuration system validation

### 🛠️ **Development Tools & Quality Infrastructure**

- **📦 uv**: Modern package management (10-100x faster than pip)
- **🧪 pytest**: Testing framework with async support and 93 comprehensive tests
- **📸 syrupy**: Snapshot testing for API response consistency
- **🔍 Ruff**: Lightning-fast linting, formatting, and code quality enforcement
- **📊 pytest-cov**: Code coverage reporting and quality metrics
- **🏗️ BaseEndpoint**: Inheritance pattern eliminating 60% code duplication
- **⚙️ Type Checking**: Comprehensive type annotations for better code safety
- **🛡️ Validation Decorators**: Automatic parameter validation and error handling

### 🤝 **Contributing**

1. **🍴 Fork** the repository
2. **🌿 Create** a feature branch (`git checkout -b feature/amazing-feature`)
3. **✅ Test** your changes (`uv run pytest`)
4. **📝 Commit** with clear messages (`git commit -m 'Add amazing feature'`)
5. **🚀 Push** to branch (`git push origin feature/amazing-feature`)
6. **🔄 Create** a Pull Request

## 🔧 Troubleshooting

### 🚨 **Common Issues**

#### **Server Fails to Start**

```bash
# Check Python version
python --version  # Should be 3.10+

# Verify dependencies
uv sync

# Check for conflicts
uv run python -c "import mcp, httpx, fastmcp; print('Dependencies OK')"
```

#### **Claude Desktop Connection Issues**

- ✅ **Use direct script path** (Option 1) for most reliable connection
- ✅ Verify paths in MCP configuration are absolute (no `~` or relative paths)
- ✅ **Install in editable mode**: Run `uv pip install -e .` in project directory
- ✅ Ensure the virtual environment `.venv/bin/cbioportal-mcp` script exists
- ✅ For Option 2: Check that `uv` is in your system PATH and `cwd` points to project directory
- ✅ Review Claude Desktop logs for detailed errors

#### **Performance Issues**

- 🔧 Increase `concurrent_batch_size` in config
- 🔧 Adjust `max_concurrent_requests` for your system
- 🔧 Use `get_multiple_*` methods for bulk operations
- 🔧 Monitor network latency to cBioPortal API

#### **Configuration Problems**

```bash
# Generate example config
uv run cbioportal-mcp --create-example-config

# Validate configuration
uv run cbioportal-mcp --config your-config.yaml --log-level DEBUG

# Check environment variables
env | grep CBIOPORTAL
```

### 🌐 **API Connectivity**

```bash
# Test cBioPortal API accessibility
curl https://www.cbioportal.org/api/cancer-types

# Test with custom instance
curl https://your-instance.org/api/studies
```

## 💡 Examples & Use Cases

### 🔍 **Research Queries**

```
"What cancer studies are available for breast cancer research?"
"Search for melanoma studies with genomic data"
"Get mutation data for TP53 in lung cancer studies"
"Find clinical data for patients in the TCGA-BRCA study"
"What molecular profiles are available for pediatric brain tumors?"
```

### 🧬 **Genomic Analysis**

```
"Compare mutation frequencies between two cancer studies"
"Get all genes in the DNA repair pathway for ovarian cancer"
"Find studies with both RNA-seq and mutation data"
"What are the most frequently mutated genes in glioblastoma?"
```

### 📊 **Bulk Operations**

```
"Fetch data for multiple cancer studies concurrently"
"Get information for a list of cancer genes efficiently"
"Compare clinical characteristics across multiple studies"
"Retrieve molecular profiles for several cancer types"
```

## 📜 License

This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- **🧬 [cBioPortal](https://www.cbioportal.org/)** - Open-access cancer genomics data platform
- **🔗 [Model Context Protocol](https://github.com/model-context-protocol/mcp)** - Enabling seamless AI-tool interactions
- **⚡ [FastMCP](https://github.com/jlowin/fastmcp)** - High-performance MCP server framework
- **📦 [uv](https://github.com/astral-sh/uv)** - Modern Python package management
- **🤖 AI Collaboration** - Demonstrating the power of human-AI partnership in scientific software development

---

**🌟 Production-ready bioinformatics platform built through innovative human-AI collaboration!** 🧬✨

*Demonstrating the power of domain expertise + AI-assisted development for enterprise-grade scientific software.*
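
The configuration priority stated in the Configuration section (CLI args > environment variables > config file > defaults) could be resolved along the following lines. This is a minimal sketch under stated assumptions, not the project's `config.py`: the flat setting keys and the helpers `env_layer` and `resolve_settings` are hypothetical. Only the `CBIOPORTAL_*` variable names and the default values are taken from the examples in this README.

```python
from typing import Any, Mapping

# Defaults copied from the YAML example in this README; flattening them
# into a single dict is an assumption made to keep the sketch short.
DEFAULTS: dict[str, Any] = {
    "base_url": "https://www.cbioportal.org/api",
    "client_timeout": 480.0,
    "log_level": "INFO",
}


def env_layer(environ: Mapping[str, str], prefix: str = "CBIOPORTAL_") -> dict[str, str]:
    """Map CBIOPORTAL_FOO_BAR style variables to foo_bar keys (hypothetical helper)."""
    return {
        name[len(prefix):].lower(): value
        for name, value in environ.items()
        if name.startswith(prefix)
    }


def resolve_settings(
    cli: Mapping[str, Any],
    env: Mapping[str, Any],
    file_cfg: Mapping[str, Any],
    defaults: Mapping[str, Any] = DEFAULTS,
) -> dict[str, Any]:
    """Pick each setting from the highest-priority layer that defines it."""
    resolved: dict[str, Any] = {}
    for key, default in defaults.items():
        for layer in (cli, env, file_cfg):  # highest priority first
            value = layer.get(key)
            if value is not None:
                # Environment values arrive as strings, so coerce every
                # value to the type of the default (e.g. "600" -> 600.0).
                resolved[key] = type(default)(value)
                break
        else:
            resolved[key] = default
    return resolved


if __name__ == "__main__":
    import os

    settings = resolve_settings(
        cli={"log_level": "DEBUG"},
        env=env_layer(os.environ),
        file_cfg={"client_timeout": 120.0},
    )
    print(settings)
```

Walking the layers in priority order and stopping at the first hit keeps the precedence rule in one place, so adding a new setting only requires a new entry in the defaults.
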
