Session Buddy

Overview Schema Related Servers Score Discussions

session-buddy
docs
archive
weekly-progress

WEEK8_PLANNING.md•18.4 KiB

# Week 8 - Test Coverage & Quality Improvement **Status:** Planning **Start Date:** 2025-10-29 **Estimated Duration:** 5-7 days **Objective:** Increase test coverage from 14.4% to 80%+ and fix test isolation issues ______________________________________________________________________ ## Executive Summary Following the successful Week 7 ACB DI refactoring (production-ready, 99.6% test pass rate), Week 8 focuses on addressing the critical gap identified in the checkpoint: **test coverage at 14.4% (target: 80%+)**. **Key Challenge:** While 954/978 tests pass (99.6%), overall code coverage is only 14.4%, indicating large portions of the codebase lack comprehensive tests. ______________________________________________________________________ ## Current State Assessment ### Test Suite Status (from Week 7 completion) **Unit Tests:** - **Total tests:** 978 unit tests - **Passing:** 954 (97.5%) - **Failing:** 4-8 (test isolation issues) - **Skipped:** 20 - **Coverage:** 14.4% (critical gap) **DI Infrastructure (Week 7):** - ✅ 25/25 DI tests passing (100%) - ✅ SessionPaths fully tested - ✅ Direct container access pattern validated **Known Failing Tests:** ``` FAILED tests/unit/test_di_container.py::test_configure_registers_singletons FAILED tests/unit/test_instance_managers.py::test_get_app_monitor_registers_singleton FAILED tests/unit/test_instance_managers.py::test_get_llm_manager_uses_di_cache FAILED tests/unit/test_instance_managers.py::test_serverless_manager_uses_config FAILED tests/unit/test_server.py::TestServerQualityScoring::test_calculate_quality_score_with_no_args FAILED tests/unit/test_server.py::TestServerQualityScoring::test_quality_score_returns_numeric FAILED tests/unit/test_server.py::TestServerConcurrency::test_concurrent_quality_scoring FAILED tests/unit/test_session_manager.py::TestSessionLifecycleManagerHandoffDocumentation::test_read_previous_session_info ``` **Root Cause:** Test isolation issues (tests pass individually but fail in suite due to shared DI state). ______________________________________________________________________ ## Week 8 Objectives ### Primary Goal **Increase test coverage from 14.4% to 80%+** through systematic test expansion ### Secondary Goals 1. Fix 4-8 test isolation issues 1. Establish test coverage baselines by module 1. Create test coverage improvement plan 1. Document testing patterns and best practices ### Success Criteria - ✅ Test coverage ≥80% (target: 85%+) - ✅ All test isolation issues resolved - ✅ Test pass rate maintained at ≥99% - ✅ No regressions introduced - ✅ Comprehensive testing documentation created ______________________________________________________________________ ## Coverage Gap Analysis ### Modules with Low Coverage (\<50%) Based on checkpoint analysis, priority modules needing coverage: **High Priority (Core Functionality):** 1. `server.py` (~3,500+ lines) - **Critical gap** - MCP tool registration - Session lifecycle management - Quality scoring algorithms 1. `reflection_tools.py` - **Critical gap** - DuckDB database operations - Embedding generation - Search functionality 1. `crackerjack_integration.py` - **Critical gap** - Command parsing - Quality metric aggregation - Test result analysis **Medium Priority (Advanced Features):** 4\. `tools/` modules - **Moderate gap** - session_tools.py - memory_tools.py - crackerjack_tools.py - llm_tools.py - team_tools.py 5. `core/session_manager.py` - **Moderate gap** - Session state management - Lifecycle coordination **Lower Priority (Extended Features):** 6\. `multi_project_coordinator.py` 7\. `token_optimizer.py` 8\. `search_enhanced.py` 9\. `interruption_manager.py` 10\. `serverless_mode.py` 11\. `app_monitor.py` 12\. `natural_scheduler.py` 13\. `worktree_manager.py` ______________________________________________________________________ ## Week 8 Implementation Plan ### Day 0: Analysis & Planning (2 hours) **Objectives:** - Generate detailed coverage report by module - Identify specific untested code paths - Create prioritized test implementation plan - Set up coverage tracking infrastructure **Tasks:** 1. Run comprehensive coverage analysis ```bash coverage run -m pytest tests/unit/ coverage report --sort=cover coverage html # For detailed inspection ``` 1. Document coverage gaps by module - Create `docs/WEEK8_COVERAGE_BASELINE.md` - List specific functions/classes needing tests - Prioritize by criticality 1. Set up coverage tracking - Configure pytest-cov with fail-under threshold - Create coverage improvement tracking spreadsheet **Deliverables:** - `docs/WEEK8_PLANNING.md` (this document) - `docs/WEEK8_COVERAGE_BASELINE.md` - Coverage HTML report (`htmlcov/index.html`) ______________________________________________________________________ ### Day 1: Fix Test Isolation Issues (4 hours) **Objective:** Resolve 4-8 failing tests caused by DI state pollution **Root Cause Analysis:** - Tests share global bevy DI container state - Previous tests leave registered instances - SessionPaths and singleton services not properly cleaned up **Solution Pattern:** ```text @pytest.fixture(autouse=True) def reset_di_container(): """Reset DI state after each test.""" yield container = get_container() # Clean up all registered instances for cls in [ SessionPaths, ApplicationMonitor, LLMManager, ServerlessSessionManager, # ... other singletons ]: with suppress(KeyError): container.instances.pop(cls, None) ``` **Tasks:** 1. Add autouse fixture for DI cleanup in `tests/conftest.py` 1. Update failing tests to use fixture 1. Verify all 978 tests pass in full suite 1. Document DI test isolation pattern **Expected Outcome:** 978/978 tests passing (100%) ______________________________________________________________________ ### Day 2: Server.py Core Coverage (6-8 hours) **Objective:** Test core MCP server functionality (highest priority) **Target Modules:** 1. FastMCP tool registration 1. Session lifecycle (start, checkpoint, end, status) 1. Quality scoring algorithm 1. Git integration **Testing Strategy:** - Use `MockMCP` server pattern from existing tests - Mock external dependencies (git, filesystem, crackerjack) - Test both success and error paths - Include edge cases (empty projects, missing dependencies) **Example Test Structure:** ```python class TestServerSessionLifecycle: """Test complete session lifecycle workflows.""" async def test_start_initializes_session(self, mock_mcp_server): result = await mock_mcp_server.call_tool("start") assert result["success"] is True assert "session_id" in result async def test_checkpoint_creates_git_commit(self, mock_mcp_server, tmp_git_repo): await mock_mcp_server.call_tool("start", working_directory=tmp_git_repo) result = await mock_mcp_server.call_tool("checkpoint") # Verify git commit created assert result["success"] is True assert "commit_sha" in result ``` **Target Coverage:** 60%+ for server.py ______________________________________________________________________ ### Day 3: Reflection Tools Coverage (6-8 hours) **Objective:** Test memory and search functionality **Target Modules:** 1. `ReflectionDatabase` class 1. Embedding generation and fallback 1. Semantic search with vector similarity 1. Text search fallback 1. Database schema creation **Testing Strategy:** - Use in-memory DuckDB for fast tests - Mock ONNX model for embedding tests - Test both embedding and fallback modes - Verify search ranking and relevance **Example Test Structure:** ```python class TestReflectionDatabaseSearch: """Test semantic and text search functionality.""" async def test_semantic_search_with_embeddings(self, reflection_db): # Store conversations with embeddings await reflection_db.store_conversation("Python async patterns") await reflection_db.store_conversation("JavaScript promises") # Search for related content results = await reflection_db.search_reflections("async programming") assert len(results) > 0 assert "Python" in results[0]["content"] async def test_text_search_fallback_when_no_embeddings(self, reflection_db_no_onnx): # Verify fallback to text search await reflection_db_no_onnx.store_conversation("Python async patterns") results = await reflection_db_no_onnx.search_reflections("async") assert len(results) > 0 ``` **Target Coverage:** 70%+ for reflection_tools.py ______________________________________________________________________ ### Day 4: Crackerjack Integration Coverage (5-6 hours) **Objective:** Test code quality integration layer **Target Modules:** 1. Command parsing and validation 1. Quality metric aggregation 1. Test result analysis 1. Command history and learning **Testing Strategy:** - Mock crackerjack command output - Test various output formats - Verify metric calculation accuracy - Test error handling **Example Test Structure:** ```python class TestCrackerjackIntegration: """Test crackerjack command integration.""" def test_parse_quality_output(self, mock_crackerjack_output): integration = CrackerjackIntegration() result = integration.parse_output(mock_crackerjack_output) assert result["quality_score"] == 85 assert result["tests_passed"] == 42 assert result["coverage"] == 90.5 def test_command_execution_with_timeout(self): integration = CrackerjackIntegration() result = integration.execute_command("test", timeout=30) assert result["success"] is True assert result["execution_time"] < 30.0 ``` **Target Coverage:** 75%+ for crackerjack_integration.py ______________________________________________________________________ ### Day 5: Tools Module Coverage (6-8 hours) **Objective:** Test MCP tool implementations **Target Modules:** - `tools/session_tools.py` - `tools/memory_tools.py` - `tools/crackerjack_tools.py` - `tools/llm_tools.py` - `tools/team_tools.py` **Testing Strategy:** - Test each tool's happy path - Test error handling and validation - Test parameter combinations - Verify return value structures **Target Coverage:** 70%+ for all tools/ modules ______________________________________________________________________ ### Day 6: Session Manager & Utils Coverage (5-6 hours) **Objective:** Test session state management and utilities **Target Modules:** 1. `core/session_manager.py` 1. `utils/git_operations.py` 1. `utils/logging.py` 1. `utils/quality_utils_v2.py` **Testing Strategy:** - Test session lifecycle state transitions - Mock git operations - Verify logging output - Test quality scoring V2 algorithm **Target Coverage:** 75%+ for core/ and utils/ modules ______________________________________________________________________ ### Day 7: Documentation & Verification (3-4 hours) **Objective:** Final verification and documentation **Tasks:** 1. Run full test suite with coverage 1. Generate final coverage report 1. Create Week 8 completion document 1. Update testing documentation **Deliverables:** - `docs/WEEK8_COMPLETION.md` - `docs/developer/TESTING_BEST_PRACTICES.md` - Updated `docs/developer/ARCHITECTURE.md` (testing section) - Coverage report showing ≥80% coverage ______________________________________________________________________ ## Testing Patterns & Best Practices ### Pattern 1: DI Test Isolation ```text @pytest.fixture(autouse=True) def reset_di_container(): """Ensure clean DI state for each test.""" yield container = get_container() for cls in ALL_SINGLETON_CLASSES: with suppress(KeyError): container.instances.pop(cls, None) ``` ### Pattern 2: Mock MCP Server ```text @pytest.fixture async def mock_mcp_server(tmp_path): """Create isolated MCP server for testing.""" from session_buddy.server import mcp # Configure with test environment os.environ["HOME"] = str(tmp_path) yield mcp # Cleanup del os.environ["HOME"] ``` ### Pattern 3: Async Database Testing ```text @pytest.fixture async def reflection_db(tmp_path): """Create in-memory reflection database.""" db_path = tmp_path / "test_reflections.duckdb" async with ReflectionDatabase(db_path) as db: await db._ensure_tables() yield db ``` ### Pattern 4: Mock External Dependencies ```python @pytest.fixture def mock_crackerjack_command(monkeypatch): """Mock crackerjack command execution.""" def mock_run(cmd, *args, **kwargs): return CompletedProcess( args=cmd, returncode=0, stdout="Quality: 85/100\nTests: 42 passed", stderr="", ) monkeypatch.setattr("subprocess.run", mock_run) ``` ______________________________________________________________________ ## Risk Mitigation ### Risk 1: Test Suite Timeout **Mitigation:** Use pytest-xdist for parallel execution, mark slow tests ### Risk 2: Coverage Plateau **Mitigation:** Focus on critical paths first, accept lower coverage for rarely-used features ### Risk 3: Test Brittleness **Mitigation:** Use fixtures for setup, avoid hardcoded paths, mock external dependencies ### Risk 4: Time Overrun **Mitigation:** Prioritize high-impact modules, defer lower-priority modules if needed ______________________________________________________________________ ## Success Metrics ### Quantitative - **Coverage:** 14.4% → 80%+ (target: 85%) - **Test Pass Rate:** ≥99% maintained - **Tests Added:** ~200-300 new tests - **Code Lines Tested:** ~15,000+ additional lines ### Qualitative - All critical paths have test coverage - Test isolation issues resolved - Testing patterns documented - Developer confidence improved ______________________________________________________________________ ## Time Estimate | Day | Focus | Estimated Hours | Notes | |-----|-------|----------------|-------| | Day 0 | Analysis & Planning | 2h | Coverage baseline | | Day 1 | Test Isolation Fixes | 4h | DI cleanup | | Day 2 | Server.py Coverage | 6-8h | Core MCP server | | Day 3 | Reflection Tools | 6-8h | Memory & search | | Day 4 | Crackerjack Integration | 5-6h | Quality metrics | | Day 5 | Tools Module Coverage | 6-8h | MCP tools | | Day 6 | Session Manager & Utils | 5-6h | State management | | Day 7 | Documentation & Verification | 3-4h | Final docs | | **Total** | | **37-48h** | **~5-7 days** | ______________________________________________________________________ ## Dependencies & Prerequisites **From Week 7:** - ✅ SessionPaths frozen dataclass - ✅ Direct container access pattern - ✅ DI test fixtures established - ✅ All DI tests passing (25/25) **Required Tools:** - pytest ≥7.0 - pytest-asyncio ≥0.21 - pytest-cov ≥4.0 - coverage ≥7.0 - hypothesis ≥6.70 (property-based testing) ______________________________________________________________________ ## Deliverables Checklist ### Documentation - [ ] `docs/WEEK8_PLANNING.md` (this document) - [ ] `docs/WEEK8_COVERAGE_BASELINE.md` (Day 0) - [ ] `docs/WEEK8_DAY1_PROGRESS.md` (test isolation fixes) - [ ] `docs/WEEK8_DAY2_PROGRESS.md` (server.py coverage) - [ ] `docs/WEEK8_DAY3_PROGRESS.md` (reflection tools) - [ ] `docs/WEEK8_DAY4_PROGRESS.md` (crackerjack integration) - [ ] `docs/WEEK8_DAY5_PROGRESS.md` (tools modules) - [ ] `docs/WEEK8_DAY6_PROGRESS.md` (session manager & utils) - [ ] `docs/WEEK8_COMPLETION.md` (final summary) - [ ] `docs/developer/TESTING_BEST_PRACTICES.md` (patterns guide) ### Code Artifacts - [ ] Test isolation fixture in `tests/conftest.py` - [ ] ~200-300 new unit tests across modules - [ ] Mock fixtures for external dependencies - [ ] Test data factories (if needed) ### Reports - [ ] Coverage baseline report (Day 0) - [ ] Daily coverage progress reports - [ ] Final coverage report (≥80%) - [ ] Test execution time report ______________________________________________________________________ ## Integration with Previous Work **Week 5 (Testing Phase Complete):** - Established test infrastructure - Created comprehensive test suite baseline - Set testing patterns **Week 6 (DI Environment Handling):** - Fixed environment variable handling - Improved test isolation patterns **Week 7 (ACB DI Refactoring):** - Type-safe SessionPaths configuration - Direct container access pattern - 25 comprehensive DI tests **Week 8 builds on this foundation** by expanding coverage to all core modules. ______________________________________________________________________ ## Future Work (Post-Week 8) **Week 9 Candidates:** 1. **Performance Optimization** (if coverage ≥80% achieved) - Profile hot paths - Optimize database queries - Reduce memory usage 1. **Integration Testing** (if time permits) - End-to-end workflow tests - Multi-component integration tests - Performance benchmarking 1. **Advanced Features Testing** (lower priority) - Multi-project coordination - Serverless mode - App monitoring - Natural scheduler - Worktree manager ______________________________________________________________________ ## Notes & Considerations **Coverage Philosophy:** - Focus on critical paths and core functionality first - Accept lower coverage for rarely-used features - Prioritize test quality over raw coverage numbers - Aim for meaningful tests, not just line coverage **Test Development Approach:** - Write tests in parallel with coverage analysis - Test behavior, not implementation details - Use property-based testing (hypothesis) for complex logic - Mock external dependencies consistently **Performance Considerations:** - Use pytest-xdist for parallel execution - Mark slow tests with `@pytest.mark.slow` - Use in-memory databases for fast tests - Cache expensive fixtures ______________________________________________________________________ ## Conclusion Week 8 represents a critical quality improvement initiative to address the test coverage gap (14.4% → 80%+). By systematically expanding test coverage across all core modules and resolving test isolation issues, we'll establish a robust testing foundation for future development. **Key Success Factors:** 1. Prioritized approach (critical modules first) 1. Clear testing patterns established 1. Comprehensive documentation 1. Maintained test pass rate ≥99% **Estimated Completion:** 5-7 days (~37-48 hours) ______________________________________________________________________ **Created:** 2025-10-29 **Author:** Claude Code + Les **Project:** session-buddy **Phase:** Week 8 Planning - Test Coverage Improvement

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/lesleslie/session-buddy'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

WEEK8_PLANNING.md•18.4 KiB