# Contributing to ChatSpatial
Thank you for your interest in contributing to ChatSpatial! This document provides guidelines for contributing to the project.
## Getting Started
### Prerequisites
- Python 3.10 or higher (required for MCP support)
- Git
- Basic understanding of spatial transcriptomics
- Familiarity with Model Context Protocol (MCP) is helpful
- For visualization testing: Understanding of matplotlib and image processing
- For advanced methods: Knowledge of PyTorch, R (for R-based methods)
### Development Setup
1. **Fork and clone the repository**
```bash
git clone https://github.com/cafferychen777/ChatSpatial.git
cd ChatSpatial
```
2. **Create a virtual environment**
```bash
# Create virtual environment for development
python3 -m venv chatspatial_dev
source chatspatial_dev/bin/activate # macOS/Linux
# chatspatial_dev\Scripts\activate # Windows
```
3. **Install development dependencies**
```bash
# Install ChatSpatial with development dependencies
pip install -e ".[dev]"
# Install development tools
pip install pytest pytest-cov black isort mypy flake8
# Install MCP development tools
pip install "mcp[dev]"
# Optional: Install R dependencies for R-based methods
pip install rpy2 # Requires R to be installed separately
```
4. **Run tests to ensure everything works**
```bash
# Run basic test suite
pytest tests/
# Run with coverage reporting
pytest tests/ --cov=chatspatial --cov-report=html
# Run specific test categories
pytest tests/unit/ # Unit tests
pytest tests/integration/ # Integration tests
pytest tests/e2e/ # End-to-end MCP tests
# Test MCP server directly
python -m chatspatial --help
```
## Project Architecture
### Core Components
ChatSpatial follows a modular MCP server architecture:
```
chatspatial/
├── server.py                   # Main MCP server with tool definitions
├── spatial_mcp_adapter.py      # Spatial data adapter for MCP
├── tools/                      # Analysis tool implementations
│   ├── annotation.py           # Cell type annotation methods
│   ├── cell_communication.py   # LIANA+, CellPhoneDB, CellChat
│   ├── spatial_genes.py        # GASTON, SpatialDE, SPARK-X
│   ├── spatial_domains.py      # SpaGCN, STAGATE, Leiden/Louvain
│   ├── visualization.py        # All plot types and image generation
│   └── ...
├── models/                     # Pydantic data models
│   ├── data.py                 # Parameter models for each tool
│   └── analysis.py             # Result models for analysis outputs
├── utils/                      # Utility functions
│   ├── error_handling.py       # Data validation and business logic errors
│   ├── tool_error_handling.py  # MCP protocol error handling
│   ├── data_loader.py          # Data loading utilities
│   └── image_utils.py          # Image processing and conversion
└── mcp/                        # MCP-specific modules
    ├── errors.py               # MCP error definitions
    ├── resources.py            # MCP resource management
    └── ...
```
### Key Design Principles
1. **Separation of Analysis and Visualization**: Each analysis tool focuses on computation; visualization is handled separately
2. **Two-Layer Error Handling**: Data validation errors are kept distinct from MCP protocol errors (see the sketch after this list)
3. **Pydantic Parameter Validation**: All tool parameters are validated using Pydantic models
4. **Image Object Handling**: Critical handling of matplotlib images for MCP display
5. **Resource Management**: Automatic cleanup and memory management for large datasets
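The two error-handling layers compose as in the minimal sketch below. It reuses the helper names referenced throughout this guide (`validate_adata`, `ProcessingError`, `mcp_tool_error_handler`); treat the exact signatures as assumptions and confirm them against the codebase:
```python
# Minimal sketch of the two-layer pattern (helper signatures are assumptions)
from chatspatial.utils.error_handling import ProcessingError, validate_adata
from chatspatial.utils.tool_error_handling import mcp_tool_error_handler

@mcp.tool()                 # `mcp` is the server instance defined in server.py
@mcp_tool_error_handler()   # Layer 2: converts raised exceptions into MCP error responses
async def example_tool(data_id: str, context=None):
    adata = data_store[data_id]["adata"]  # `data_store` is the server's dataset registry
    # Layer 1: business-logic validation; raises ProcessingError on bad data
    validate_adata(adata, {"obsm": ["spatial"]}, check_spatial=True)
    ...
```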
## Development Workflow
### Branching Strategy
- `main`: Stable release branch (production-ready)
- `develop`: Development branch for new features (optional; merging features directly into `main` is acceptable)
- Feature branches: `feature/your-feature-name`
- Bug fixes: `bugfix/issue-description`
- Documentation: `docs/update-description`
### Making Changes
1. **Create a feature branch**
```bash
git checkout -b feature/your-feature-name
```
2. **Make your changes**
- Follow the existing code style
- Add tests for new functionality
- Update documentation as needed
3. **Run tests and linting**
```bash
# Run tests
pytest tests/
# Format code
black .
isort .
# Check linting
flake8 .
mypy chatspatial
```
4. **Commit your changes**
```bash
git add .
git commit -m "feat: add new spatial analysis method"
```
5. **Push and create a Pull Request**
```bash
git push origin feature/your-feature-name
```
## Code Style Guidelines
### Python Code Style
- Follow PEP 8
- Use Black for code formatting
- Use isort for import sorting
- Maximum line length: 88 characters (Black default)
- Use type hints where appropriate
### Commit Message Format
Follow conventional commits:
- `feat:` for new features
- `fix:` for bug fixes
- `docs:` for documentation changes
- `test:` for adding tests
- `refactor:` for code refactoring
- `perf:` for performance improvements
### Documentation
- Update README.md for new features
- Add docstrings to all public functions and classes
- Include examples in docstrings
- Update API documentation
## Testing Guidelines
### Test Structure
ChatSpatial uses a comprehensive testing strategy:
```
tests/
├── unit/                   # Unit tests for individual components
│   ├── algorithms/         # Algorithm-specific tests
│   ├── models/             # Pydantic model tests
│   └── utils/              # Utility function tests
├── integration/            # Integration tests for workflows
│   ├── test_enrichment_analysis.py
│   └── test_trajectory_analysis.py
├── e2e/                    # End-to-end MCP integration tests
│   └── test_mcp_integration.py
├── stress_tests/           # Performance and reliability tests
│   └── test_comprehensive_stress.py
├── visualization_tests/    # Image generation and display tests
│   └── test_visualization_comprehensive.py
└── fixtures/               # Shared test fixtures
    └── __init__.py
```
### Test Categories and When to Use
1. **Unit Tests**: Individual function/method testing
```python
# Test single function behavior
async def test_validate_adata():
    ...  # Test data validation logic
```
2. **Integration Tests**: Multi-component workflows
```python
# Test complete analysis workflows
async def test_gaston_spatial_genes_workflow():
    ...  # Test load → preprocess → analyze → visualize
```
3. **E2E Tests**: Full MCP protocol testing
```python
# Test MCP tool calls end-to-end
async def test_mcp_spatial_analysis_complete():
    ...  # Test through the MCP protocol
```
4. **Stress Tests**: Large dataset and performance testing
```python
# Test with realistic dataset sizes
async def test_large_dataset_performance():
    ...  # Test memory usage and timeout handling
```
5. **Visualization Tests**: Image generation testing
```python
# Test image creation and format
async def test_spatial_plot_generation():
    ...  # Test matplotlib → MCP Image object conversion
```
### Critical Testing Requirements
#### **Image Object Testing**
**Current Architecture (2024-12):**
All visualizations are saved to disk and returned as file paths (`DIRECT_EMBED_THRESHOLD = 0`), which avoids the MCP protocol token overhead of embedded images.
```python
# Test current image handling (file paths)
async def test_image_object_handling():
    """Test that visualizations return file paths correctly."""
    # Call the visualization tool
    result = await visualize_data("test_data", params)
    # Currently returns a file path string
    assert isinstance(result, str)  # File path to the saved image
    assert result.startswith("file://") or result.startswith("/")
    assert result.endswith(".png")
    # Future: when DIRECT_EMBED_THRESHOLD > 0, small images may return ImageContent
    # See docs/IMAGE_HANDLING_ARCHITECTURE.md for details
```
#### **MCP Error Handling Testing**
Test both error handling layers:
```python
import pytest

# Test the data validation layer
async def test_data_validation_errors():
    with pytest.raises(ProcessingError):
        await analyze_with_invalid_data()

# Test the MCP protocol layer
async def test_mcp_error_formatting():
    result = await mcp_tool_with_error()
    assert result["isError"] is True
    assert "Error:" in result["content"][0]["text"]
```
### Test Data Management
#### **Synthetic Data Creation**
```python
import scanpy as sc

# Create realistic synthetic data for testing
def create_synthetic_spatial_data(n_spots=200, n_genes=100):
    """Create synthetic spatial transcriptomics data."""
    # Create spatial coordinates (replace these helpers with your own generators)
    spatial_coords = generate_spatial_pattern()
    # Create expression data with spatial patterns
    X = create_spatial_expression(n_spots, n_genes, spatial_coords)
    # Create AnnData with proper structure
    adata = sc.AnnData(X)
    adata.obsm['spatial'] = spatial_coords
    adata.obs['cell_type'] = assign_cell_types(spatial_coords)
    return adata
```
#### **Test Data Guidelines**
- **Small datasets**: <1000 spots, <500 genes for unit tests
- **Medium datasets**: 1000-3000 spots for integration tests
- **Large datasets**: >3000 spots only for stress tests
- **Don't commit large files**: Use `.gitignore` for test data >10MB
- **Reproducible data**: Set random seeds for consistent test results (see the sketch below)
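A self-contained sketch of a seeded synthetic dataset sized for unit tests; the left-to-right expression gradient and the `type_A`/`type_B` labels are illustrative placeholders, not the project's actual generators:
```python
import numpy as np
import scanpy as sc

def make_seeded_test_data(n_spots=100, n_genes=50, seed=42):
    """Small, reproducible dataset sized for unit tests."""
    rng = np.random.default_rng(seed)  # Fixed seed => identical data on every run
    # Spatial coordinates on the unit square
    spatial_coords = rng.uniform(0.0, 1.0, size=(n_spots, 2))
    # Counts with a simple left-to-right expression gradient
    lam = 1.0 + spatial_coords[:, [0]]  # Shape (n_spots, 1), broadcasts over genes
    X = rng.poisson(lam=lam, size=(n_spots, n_genes)).astype(np.float32)
    adata = sc.AnnData(X)
    adata.obsm["spatial"] = spatial_coords
    adata.obs["cell_type"] = np.where(spatial_coords[:, 0] > 0.5, "type_A", "type_B")
    return adata
```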
### Performance Testing
#### **Memory Usage Testing**
```python
import psutil
import os

async def test_memory_usage():
    """Test memory usage during analysis."""
    # Get initial memory
    process = psutil.Process(os.getpid())
    initial_memory = process.memory_info().rss
    # Run analysis
    result = await analyze_large_dataset()
    # Check that the memory increase is reasonable
    final_memory = process.memory_info().rss
    memory_increase = (final_memory - initial_memory) / 1024 / 1024  # MB
    assert memory_increase < 500  # Less than 500MB increase
```
#### **Timeout Testing**
```python
import asyncio
import time

import pytest

async def test_analysis_timeout():
    """Test that long-running analysis completes within a reasonable time."""
    start_time = time.time()
    # Run the analysis with a timeout
    try:
        result = await asyncio.wait_for(
            analyze_complex_data(),
            timeout=300,  # 5 minutes max
        )
        assert result.success
    except asyncio.TimeoutError:
        pytest.fail("Analysis took too long (>5 minutes)")
    assert time.time() - start_time < 300
```
### Test Fixtures and Utilities
Create reusable test fixtures:
```python
# tests/fixtures/__init__.py
import pytest

@pytest.fixture
def sample_spatial_data():
    """Standard spatial transcriptomics test data."""
    return create_synthetic_spatial_data(n_spots=100, n_genes=50)

@pytest.fixture
def mock_context():
    """Mock MCP context for testing."""
    class MockContext:
        async def info(self, msg): pass
        async def warning(self, msg): pass
    return MockContext()

@pytest.fixture
def data_store(sample_spatial_data):
    """Standard data store for testing."""
    return {
        "test_data": {
            "adata": sample_spatial_data,
            "name": "Test Dataset",
            "type": "synthetic",
        }
    }
```
### Test Automation and CI
#### **Pre-commit Testing**
```bash
# Test before committing
python -m pytest tests/unit/ -x # Stop on first failure
python -m pytest tests/integration/ -v # Verbose integration tests
python -m pytest tests/e2e/ --timeout=300 # E2E with timeout (requires pytest-timeout)
```
#### **Coverage Requirements**
```bash
# Generate coverage report
pytest tests/ --cov=chatspatial --cov-report=html --cov-report=term
# Minimum coverage targets:
# - Unit tests: >90% coverage
# - Integration tests: >80% coverage
# - Overall: >85% coverage
```
#### **Test Performance Monitoring**
```bash
# Run with performance profiling
pytest tests/ --durations=10 # Show 10 slowest tests
# Memory profiling (requires the pytest-memray plugin)
pytest tests/ --memray
```
### Debugging Test Failures
#### **Common Test Failure Patterns**
1. **Image Display Issues**: Check the image path/object handling described above
2. **Memory Errors**: Reduce the test dataset size or increase available memory
3. **Timeout Errors**: Increase the client timeout (e.g., in your Cherry Studio configuration)
4. **Import Errors**: Check that optional dependencies are installed
5. **Data Format Errors**: Verify the AnnData structure matches expectations (quick checks sketched below)
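When a data-format failure appears, a few direct assertions on the AnnData object usually localize the problem faster than stepping through the tool. This is a generic sketch, not a project API:
```python
# Quick structural checks for an AnnData test object (generic sketch)
def check_adata_structure(adata):
    assert adata.n_obs > 0 and adata.n_vars > 0, "Empty AnnData"
    assert "spatial" in adata.obsm, "Missing spatial coordinates in adata.obsm"
    assert adata.obsm["spatial"].shape == (adata.n_obs, 2), "Bad coordinate shape"
    assert "cell_type" in adata.obs, "Missing expected obs column 'cell_type'"
```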
#### **Test Debugging Tools**
```python
# Add debug logging to tests
import logging
logging.basicConfig(level=logging.DEBUG)
```
```bash
# Use pytest debugging
pytest tests/test_your_tool.py -v -s --pdb  # Drop into the debugger on failure
# Don't capture stdout/stderr
pytest tests/test_your_tool.py -s
```
## Adding New Features
### Adding New MCP Tools
Follow this pattern for adding new analysis tools:
#### 1. Create Parameter Model (`chatspatial/models/data.py`)
```python
from typing import List, Optional

from pydantic import BaseModel, Field

class YourAnalysisParameters(BaseModel):
    """Parameters for your new analysis method."""
    method: str = Field("default_method", description="Analysis method to use")
    parameter1: int = Field(10, description="Description of parameter1")
    parameter2: Optional[float] = Field(None, description="Optional parameter")
    genes: Optional[List[str]] = Field(None, description="Genes to analyze")
```
#### 2. Create Result Model (`chatspatial/models/analysis.py`)
```python
from typing import Any, Dict, Optional

from pydantic import BaseModel

class YourAnalysisResult(BaseModel):
    """Result from your analysis method."""
    method: str
    success: bool
    n_features_analyzed: int
    results_key: Optional[str] = None  # Key in adata.uns where results are stored
    statistics: Optional[Dict[str, Any]] = None
    message: str
```
#### 3. Implement Tool Logic (`chatspatial/tools/your_tool.py`)
```python
from chatspatial.models.analysis import YourAnalysisResult
from chatspatial.models.data import YourAnalysisParameters
from chatspatial.utils.error_handling import ProcessingError, validate_adata

async def your_analysis_function(
    data_id: str,
    data_store: dict,
    params: YourAnalysisParameters,
    context,
) -> YourAnalysisResult:
    """Implement your analysis method.

    Args:
        data_id: Dataset identifier
        data_store: Data storage dictionary
        params: Analysis parameters
        context: MCP context for logging

    Returns:
        YourAnalysisResult: Analysis results
    """
    # Validate input data
    adata = data_store[data_id]["adata"]
    validate_adata(adata, {
        'obs': ['required_column'],  # Add required columns
        'obsm': ['spatial'],         # Spatial coordinates usually required
    }, check_spatial=True)
    # Implement your analysis logic
    try:
        # Your analysis code here
        results = run_your_analysis(adata, params)
        # Store results in adata
        adata.uns['your_analysis_results'] = results
        return YourAnalysisResult(
            method=params.method,
            success=True,
            n_features_analyzed=len(results),
            results_key='your_analysis_results',
            message=f"Analysis completed successfully with {len(results)} features",
        )
    except Exception as e:
        raise ProcessingError(f"Analysis failed: {e}") from e
```
#### 4. Register MCP Tool (`chatspatial/server.py`)
```python
from chatspatial.tools.your_tool import your_analysis_function

@mcp.tool()
@mcp_tool_error_handler()
async def your_analysis_tool(
    data_id: str,
    params: Optional[Dict[str, Any]] = None,
    context: Context = None,
) -> YourAnalysisResult:
    """Brief description of your analysis tool.

    Args:
        data_id: Dataset identifier from load_data
        params: Analysis parameters (optional)

    Returns:
        Analysis results
    """
    # Parse and validate parameters
    analysis_params = YourAnalysisParameters(**(params or {}))
    # Run the analysis
    return await your_analysis_function(
        data_id, data_store, analysis_params, context
    )
```
#### 5. Add Tests
Create comprehensive tests in `tests/`:
```python
# tests/test_your_tool.py
import pytest

from chatspatial.models.data import YourAnalysisParameters
from chatspatial.tools.your_tool import your_analysis_function
from chatspatial.utils.error_handling import ProcessingError

@pytest.fixture
def sample_data():
    # Create or load sample data for testing
    pass

async def test_your_analysis_basic(sample_data):
    """Test basic functionality."""
    params = YourAnalysisParameters(method="basic")
    result = await your_analysis_function("test_data", sample_data, params, None)
    assert result.success
    assert result.n_features_analyzed > 0
    assert result.results_key in sample_data["test_data"]["adata"].uns

async def test_your_analysis_error_handling(sample_data):
    """Test error handling."""
    # Test with an invalid parameter value
    params = YourAnalysisParameters(parameter1=-1)  # Invalid value
    with pytest.raises(ProcessingError):
        await your_analysis_function("test_data", sample_data, params, None)
```
### Adding New Visualization Types
To add support for visualizing your analysis results:
#### 1. Add Plot Type to Visualization Tool (`chatspatial/tools/visualization.py`)
```python
import matplotlib.pyplot as plt

def create_your_analysis_plot(adata, params):
    """Create visualization for your analysis results."""
    # Get your results from adata.uns
    results = adata.uns.get(params.results_key, None)
    if results is None:
        raise ValueError(f"Results not found: {params.results_key}")
    # Create your plot
    fig, ax = plt.subplots(figsize=params.figure_size)
    # Your plotting logic here
    return fig
```
#### 2. Register Plot Type in the Main Visualization Function
Add your plot type to the main `visualize_data` function's plot type mapping, as sketched below.
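The actual dispatch mechanism lives in `chatspatial/tools/visualization.py`; the mapping name below (`PLOT_CREATORS`) and the other entries are hypothetical illustrations of the pattern, so match whatever structure the file actually uses:
```python
# Hypothetical registration sketch -- follow the real structure in visualization.py
PLOT_CREATORS = {
    "spatial_scatter": create_spatial_scatter_plot,  # existing entry (hypothetical name)
    "heatmap": create_heatmap_plot,                  # existing entry (hypothetical name)
    "your_analysis": create_your_analysis_plot,      # add your new plot type here
}

def create_plot(plot_type, adata, params):
    try:
        creator = PLOT_CREATORS[plot_type]
    except KeyError:
        raise ValueError(f"Unknown plot type: {plot_type}")
    return creator(adata, params)
```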
### Spatial Analysis Methods
For new spatial analysis methods (for example, adding support for a new spatially variable genes method):
1. **Study Existing Patterns**: Look at `spatial_genes.py` for GASTON, SpatialDE, SPARK-X implementations
2. **Follow Error Handling**: Use both data validation and MCP error handling layers
3. **Parameter Validation**: Create comprehensive Pydantic models
4. **Result Standardization**: Follow existing result model patterns
5. **Testing**: Add unit tests, integration tests, and real data tests
### Method Integration Checklist
When adding a new method:
- [ ] Parameter model created with proper validation
- [ ] Result model follows existing patterns
- [ ] Implementation handles all error cases
- [ ] MCP tool registered with proper decorators
- [ ] Comprehensive tests added
- [ ] Documentation updated
- [ ] Example usage provided
- [ ] Integration with visualization tool (if applicable)
- [ ] Memory usage optimized for large datasets
- [ ] Optional dependencies properly handled
### Dependencies
- Add new dependencies to `pyproject.toml`
- Use optional dependencies for specialized features
- Document installation requirements
## Documentation
### API Documentation
- Use Google-style docstrings (example below)
- Include parameter types and descriptions
- Provide usage examples
- Document return values
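A short Google-style docstring example; the function itself is illustrative, not part of the current API:
```python
import numpy as np

def subsample_spots(adata, n_spots: int, seed: int = 0):
    """Randomly subsample spots from a spatial dataset.

    Args:
        adata: AnnData with spatial coordinates in ``obsm['spatial']``.
        n_spots: Number of spots to keep.
        seed: Random seed for reproducible subsampling.

    Returns:
        A new AnnData object containing ``n_spots`` randomly selected spots.

    Example:
        >>> small = subsample_spots(adata, n_spots=500)
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(adata.n_obs, size=n_spots, replace=False)
    return adata[idx].copy()
```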
### User Documentation
- Update README.md for new features
- Add examples to `examples/` directory
- Create tutorials for complex workflows
## Issue Reporting
### Bug Reports
- Use the bug report template
- Include minimal reproducible example
- Provide environment information
- Include error messages and stack traces
### Feature Requests
- Use the feature request template
- Describe the use case clearly
- Suggest implementation approach
- Consider backwards compatibility
## Review Process
### Pull Request Requirements
- All tests must pass
- Code coverage should not decrease
- Documentation must be updated
- Follow the PR template
- Get approval from maintainers
### Review Criteria
- Code quality and style
- Test coverage and quality
- Documentation completeness
- Backwards compatibility
- Performance impact
## Community Guidelines
### Code of Conduct
- Be respectful and inclusive
- Focus on constructive feedback
- Help newcomers learn
- Acknowledge contributions
- Provide clear, reproducible examples when reporting issues
- Respect the time constraints of maintainers
### Communication Channels
1. **GitHub Issues**: Bug reports, feature requests, method proposals
2. **GitHub Discussions**: General questions, usage help, method comparisons
3. **Pull Request Reviews**: Code-specific feedback and suggestions
4. **Documentation**: Update guides and examples to help future contributors
### When Contributing
#### **Before Starting Work**
1. **Check existing issues** to avoid duplication
2. **Discuss major changes** in GitHub issues first
3. **Review the USER_GUIDE.md** to understand current capabilities
4. **Check method compatibility** with existing analysis workflows
#### **Quality Standards**
- **Code must be production-ready**: All methods should be thoroughly tested
- **Follow existing patterns**: Study current implementations before adding new ones
- **Document everything**: Include comprehensive docstrings and usage examples
- **Test comprehensively**: Unit tests, integration tests, and real-data validation
### Common Pitfalls and How to Avoid Them
#### **1. Image Display Issues**
**Problem**: Modifying image handling can affect visualization delivery
**Solution**:
- Understand the current architecture (DIRECT_EMBED_THRESHOLD = 0)
- Always test image display end-to-end with MCP tools
- Read `/docs/IMAGE_HANDLING_ARCHITECTURE.md` before touching visualization code
- Don't modify the type-aware error handling in `tool_error_handling.py` without testing
#### **2. Memory Issues with Large Datasets**
**Problem**: New methods consume too much memory
**Solution**:
- Test with datasets >3000 spots during development
- Implement subsampling options for large datasets
- Use memory profiling during testing
- Provide clear memory requirements in documentation
#### **3. Optional Dependency Hell**
**Problem**: New methods introduce complex dependency conflicts
**Solution**:
- Use optional dependencies in `pyproject.toml`
- Provide clear installation instructions
- Test installation in clean environments
- Handle import errors gracefully with informative messages (see the sketch after this list)
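A common pattern for guarding an optional dependency; `rpy2` is used as the example because it is already an optional install in this guide, and the flag name `HAS_RPY2` is just a convention:
```python
# Guard an optional dependency at import time
try:
    import rpy2  # noqa: F401 -- only needed for R-based methods
    HAS_RPY2 = True
except ImportError:
    HAS_RPY2 = False

def run_r_based_method(adata, params):
    if not HAS_RPY2:
        raise ImportError(
            "This method requires rpy2 and a working R installation. "
            "Install with: pip install rpy2"
        )
    ...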
#### **4. Breaking Existing Workflows**
**Problem**: Changes break existing analysis patterns
**Solution**:
- Maintain backward compatibility for parameter models
- Test integration with existing workflows
- Follow the two-step analysis pattern (analyze → visualize)
- Update examples and documentation when changing APIs
#### **5. Inadequate Error Handling**
**Problem**: New tools don't follow the two-layer error handling pattern
**Solution**:
- Use `validate_adata()` for data validation (business logic layer)
- Use `@mcp_tool_error_handler()` for MCP tools (protocol layer)
- Provide informative error messages
- Test error conditions comprehensively
### Method Integration Best Practices
#### **When Adding New Analysis Methods**
1. **Research Integration**: Study how similar tools (GASTON, LIANA+, SpaGCN) are integrated
2. **Parameter Design**: Create comprehensive Pydantic models with validation
3. **Result Standardization**: Follow existing result model patterns
4. **Visualization Support**: Ensure results can be visualized using existing plot types
5. **Documentation**: Provide usage examples and method comparison guidance
#### **Performance Considerations**
- **Fast methods** (<2 min): Can be used freely in interactive workflows
- **Moderate methods** (2-10 min): Should provide progress feedback (sketched after this list)
- **Slow methods** (>10 min): Must offer speed-oriented parameter options (e.g., subsampling)
- **Memory-intensive methods**: Must handle large datasets gracefully
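A minimal sketch of progress feedback through the MCP context; it assumes the `context.info()` coroutine shown in the mock fixture above, and the batch helpers are hypothetical:
```python
# Progress feedback sketch using the MCP context logger
async def run_moderate_method(adata, params, context):
    batches = list(split_into_batches(adata, n_batches=10))  # hypothetical helper
    results = []
    for i, batch in enumerate(batches, start=1):
        results.append(process_batch(batch, params))  # hypothetical helper
        if context is not None:
            await context.info(f"Processed batch {i}/{len(batches)}")
    return merge_results(results)  # hypothetical helper
```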
#### **Testing New Methods**
1. **Synthetic data testing**: Create controlled test cases
2. **Real data validation**: Test with actual spatial transcriptomics datasets
3. **Comparison testing**: Compare results with reference implementations (see the sketch after this list)
4. **Performance testing**: Measure memory usage and execution time
5. **Integration testing**: Test full workflows including visualization
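For clustering-style outputs, agreement with a reference implementation can be quantified with a label-invariant score such as the adjusted Rand index. A sketch assuming scikit-learn is available; both wrapper functions are hypothetical stand-ins for your method and the reference:
```python
from sklearn.metrics import adjusted_rand_score

def test_domains_match_reference(sample_spatial_data):
    """Compare spatial domain labels against a reference run (sketch)."""
    ours = run_our_domain_method(sample_spatial_data)  # hypothetical wrapper
    reference = run_reference_implementation(sample_spatial_data)  # hypothetical wrapper
    # ARI is invariant to label permutation; 1.0 means identical partitions
    ari = adjusted_rand_score(reference, ours)
    assert ari > 0.8, f"Low agreement with reference (ARI={ari:.2f})"
```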
### Contributing Workflow Checklist
Before submitting a pull request:
- [ ] **Code follows existing patterns** and uses established error handling
- [ ] **All tests pass** including unit, integration, and visualization tests
- [ ] **Documentation is complete** with usage examples and parameter descriptions
- [ ] **Method comparison guide updated** if adding new analysis capability
- [ ] **Memory usage tested** with realistic dataset sizes
- [ ] **Optional dependencies handled** correctly with clear installation instructions
- [ ] **Backward compatibility maintained** for existing parameter models
- [ ] **Image handling tested** if visualization is involved
- [ ] **Performance benchmarked** and optimization options provided
- [ ] **Real data validation** completed for new analysis methods
### Common Review Comments and How to Address Them
#### **"This breaks the image display"**
- **Cause**: The image return path was modified incorrectly
- **Fix**: Follow the current file-path return convention (see `docs/IMAGE_HANDLING_ARCHITECTURE.md`) and never wrap image return values in extra structures
#### **"Memory usage is too high"**
- **Cause**: New method doesn't handle large datasets efficiently
- **Fix**: Add subsampling options, implement memory-efficient algorithms
#### **"Error handling doesn't follow project patterns"**
- **Cause**: Not using the two-layer error handling system
- **Fix**: Use `validate_adata()` and `@mcp_tool_error_handler()` appropriately
#### **"Tests are insufficient"**
- **Cause**: Missing critical test scenarios
- **Fix**: Add unit tests, integration tests, error condition tests, and performance tests
#### **"Documentation is incomplete"**
- **Cause**: Missing usage examples or parameter descriptions
- **Fix**: Add comprehensive docstrings, usage examples, and update user guides
## Release Process
### Version Numbering
- Follow semantic versioning (SemVer)
- Major: Breaking changes
- Minor: New features, backwards compatible
- Patch: Bug fixes, backwards compatible
### Release Checklist
- Update version in `pyproject.toml`
- Update CHANGELOG.md
- Create release notes
- Tag release in Git
- Publish to PyPI (when ready)
## Getting Help
- Check existing issues and documentation
- Ask questions in GitHub discussions
- Contact maintainers for complex issues
Thank you for contributing to ChatSpatial!