Session Buddy



# Week 5 Testing Implementation Review

**Pytest Best Practices Assessment**
**Review Date:** 2025-10-29
**Modules Reviewed:** 79 tests across 4 modules
**Test Status:** ✅ All 79 tests passing

______________________________________________________________________

## Executive Summary

**Overall Score: 7.5/10** (Good with significant improvement opportunities)

Week 5 testing demonstrates solid foundational patterns with consistent test organization and proper async/await handling. However, there are substantial opportunities to elevate test quality through advanced pytest features, particularly parametrization, property-based testing with Hypothesis, and more sophisticated fixture usage.

______________________________________________________________________

## Detailed Assessment

### 1. Pytest Patterns Score: 7/10

**Strengths:**

- ✅ **Excellent test organization** with descriptive class grouping
- ✅ **Proper async/await patterns** - all `@pytest.mark.asyncio` decorators applied correctly
- ✅ **Clear test names** following "should_describe_behavior" convention
- ✅ **Consistent structure** across all four test modules

**Weaknesses:**

- ❌ **Zero parametrization** - significant duplication in loop-based tests
- ❌ **Limited fixture reuse** - heavy mock setup duplication
- ❌ **No fixture factories** - missing powerful pattern for test data generation
- ❌ **Weak assertion patterns** - many assertions test implementation details

**Examples:**

**Good Pattern (Clear Organization):**

```python
class TestConversationSummarizer:
    """Test conversation summarization strategies."""

    def test_extractive_summarization(self) -> None:
        """Should extract important sentences from conversation."""
```

**Anti-Pattern (Missing Parametrization):**

```python
# ❌ Current approach - loop in test
def test_project_dependency_types(self) -> None:
    """Should validate ProjectDependency types."""
    for dep_type in ["uses", "extends", "references", "shares_code"]:
        dep = ProjectDependency(
            id=f"dep-{dep_type}",
            source_project="project-a",
            target_project="project-b",
            dependency_type=dep_type,
            description=f"Test {dep_type} dependency",
        )
        assert dep.dependency_type == dep_type


# ✅ Better approach - pytest parametrization
@pytest.mark.parametrize("dep_type", ["uses", "extends", "references", "shares_code"])
def test_project_dependency_types(dep_type: str) -> None:
    """Should validate ProjectDependency types."""
    dep = ProjectDependency(
        id=f"dep-{dep_type}",
        source_project="project-a",
        target_project="project-b",
        dependency_type=dep_type,
        description=f"Test {dep_type} dependency",
    )
    assert dep.dependency_type == dep_type
```

______________________________________________________________________

### 2. Test Quality Score: 6.5/10

**Strengths:**

- ✅ **Good coverage breadth** - 79 tests covering core functionality
- ✅ **Edge case awareness** - tests for empty data, missing fields
- ✅ **Error path testing** - validates error handling scenarios

**Weaknesses:**

- ❌ **Mock over-reliance** - 95%+ of tests use heavy mocking
- ❌ **Weak assertions** - testing implementation details instead of behavior
- ❌ **Missing integration depth** - tests isolated units but not interactions
- ❌ **Limited boundary testing** - few tests explore data boundary conditions

**Examples:**

**Anti-Pattern (Testing Implementation Details):**

```python
# ❌ Brittle - tests database call count instead of behavior
assert mock_db.conn.execute.call_count >= 1

# ✅ Better - test actual outcome
groups = await coordinator.get_project_groups()
assert len(groups) == 1
assert groups[0].name == "Test Group"
```

**Anti-Pattern (Over-Mocking):**

```python
# ❌ Current - mocks hide actual behavior
mock_db = MagicMock()
mock_db.conn = MagicMock()
mock_db.conn.execute = MagicMock(
    return_value=MagicMock(fetchall=MagicMock(return_value=[]))
)


# ✅ Better - use real database with temp fixture
@pytest.mark.asyncio
async def test_get_project_groups_empty(reflection_db: ReflectionDatabase):
    """Should return empty list when no groups exist."""
    coordinator = MultiProjectCoordinator(reflection_db)
    groups = await coordinator.get_project_groups()
    assert groups == []
```

______________________________________________________________________

### 3. Improvement Opportunities (Prioritized)

#### **Priority 1: Critical Improvements**

##### A. Add Parametrization (Estimated Impact: 40% code reduction)

**Current Problem:** Test loops and duplication

```python
# ❌ test_memory_optimizer.py lines 127-131
for strategy in ["extractive", "template_based", "keyword_based"]:
    summary = summarizer.summarize_conversation(content, strategy)
    assert isinstance(summary, str)
    assert len(summary) > 0
```

**Solution:**

```python
@pytest.mark.parametrize(
    "strategy",
    [
        pytest.param("extractive", id="extractive-strategy"),
        pytest.param("template_based", id="template-strategy"),
        pytest.param("keyword_based", id="keyword-strategy"),
    ],
)
def test_summarize_conversation_with_strategy(strategy: str) -> None:
    """Should use specified summarization strategy."""
    summarizer = ConversationSummarizer()
    content = "Test conversation with function implementation and error handling."

    summary = summarizer.summarize_conversation(content, strategy)

    assert isinstance(summary, str)
    assert len(summary) > 0
```

**Files to Update:**

- `test_multi_project_coordinator.py`: Lines 43-51, 58-66 (dependency types, link types)
- `test_memory_optimizer.py`: Lines 127-131 (summarization strategies)
- `test_app_monitor.py`: Lines 138-145 (multiple file events)
- `test_serverless_mode.py`: Lines 198-202 (session filtering)

##### B. Create Fixture Factories (Estimated Impact: 60% setup reduction)

**Current Problem:** Repeated mock setup

```python
# ❌ Repeated in every test
mock_db = MagicMock()
mock_db.conn = MagicMock()
mock_db.conn.execute = MagicMock()
coordinator = MultiProjectCoordinator(mock_db)
```

**Solution:**

```python
# conftest.py
@pytest.fixture
def mock_coordinator_db():
    """Factory for coordinator database with configurable responses."""

    def _create(fetchall_return=None, fetchone_return=None):
        mock_db = MagicMock()
        mock_db.conn = MagicMock()

        if fetchall_return is not None:
            mock_db.conn.execute = MagicMock(
                return_value=MagicMock(fetchall=MagicMock(return_value=fetchall_return))
            )
        elif fetchone_return is not None:
            mock_db.conn.execute = MagicMock(
                return_value=MagicMock(fetchone=MagicMock(return_value=fetchone_return))
            )
        else:
            mock_db.conn.execute = MagicMock()

        return mock_db

    return _create


# Usage in test (async, because the coordinator call is awaited)
@pytest.mark.asyncio
async def test_get_project_groups_empty(mock_coordinator_db) -> None:
    """Should return empty list when no groups exist."""
    db = mock_coordinator_db(fetchall_return=[])
    coordinator = MultiProjectCoordinator(db)

    groups = await coordinator.get_project_groups()
    assert groups == []
```

##### C. Reduce Mock Over-Reliance (Estimated Impact: 2x confidence increase)

**Current Problem:** Heavy mocking hides integration bugs

```python
# ❌ test_serverless_mode.py - entirely mocked
mock_cache = AsyncMock()
mock_cache.set = AsyncMock()
mock_cache.get = AsyncMock(return_value=None)
storage = ACBCacheStorage(mock_cache, namespace="test")
```

**Solution:**

```python
# ✅ Use real ACBCacheStorage with in-memory cache
import pytest_asyncio


# pytest-asyncio's own decorator works for async fixtures in both strict and auto mode
@pytest_asyncio.fixture
async def memory_cache_storage():
    """Provide real ACBCacheStorage with in-memory backend."""
    from aiocache import Cache

    cache = Cache(Cache.MEMORY)
    storage = ACBCacheStorage(cache, namespace="test")
    yield storage
    await cache.clear()


@pytest.mark.asyncio
async def test_store_and_retrieve_session(memory_cache_storage):
    """Should store and retrieve session through real cache."""
    session = SessionState(
        session_id="test-123",
        user_id="user-1",
        project_id="project-1",
        created_at="2025-01-01T12:00:00",
        last_activity="2025-01-01T12:00:00",
    )

    # Test real storage behavior
    stored = await memory_cache_storage.store_session(session, ttl_seconds=60)
    assert stored is True

    # Test real retrieval behavior
    retrieved = await memory_cache_storage.retrieve_session("test-123")
    assert retrieved is not None
    assert retrieved.session_id == "test-123"
```

#### **Priority 2: High-Value Additions**

##### D. Add Hypothesis Property-Based Testing

**Use Case 1: Multi-Project Coordinator**

```python
from unittest.mock import MagicMock

import pytest
from hypothesis import given, strategies as st


# Property: Project groups should always maintain unique project IDs
@pytest.mark.asyncio
@given(
    st.lists(
        st.tuples(st.text(min_size=1), st.lists(st.text(min_size=1), min_size=1)),
        min_size=1,
        max_size=10,
    )
)
async def test_project_groups_maintain_uniqueness(group_data):
    """Property: All projects in a group should be unique."""
    mock_db = MagicMock()
    mock_db.conn = MagicMock()
    mock_db.conn.execute = MagicMock()

    coordinator = MultiProjectCoordinator(mock_db)

    for name, projects in group_data:
        group = await coordinator.create_project_group(
            name=name, projects=projects, description="Test"
        )

        # Property: No duplicate projects in group
        assert len(group.projects) == len(set(group.projects))


# Property: Session links should be bidirectional
@pytest.mark.asyncio
@given(
    st.text(min_size=1, max_size=50),
    st.text(min_size=1, max_size=50),
    st.sampled_from(["related", "continuation", "reference", "dependency"]),
)
async def test_session_links_are_queryable(session_a, session_b, link_type):
    """Property: Links created between sessions should be queryable from both directions."""
    mock_db = MagicMock()
    mock_db.conn = MagicMock()
    mock_db.conn.execute = MagicMock()

    coordinator = MultiProjectCoordinator(mock_db)

    # Create link
    link = await coordinator.link_sessions(
        source_session_id=session_a,
        target_session_id=session_b,
        link_type=link_type,
        context="Test link",
    )

    # Property: Link should have both source and target
    assert link.source_session_id == session_a
    assert link.target_session_id == session_b
```

**Use Case 2: Memory Optimizer**

```python
import uuid
from datetime import datetime
from unittest.mock import MagicMock

import pytest
from hypothesis import given, strategies as st, assume


# Custom strategy for conversation data
@st.composite
def conversation_data(draw):
    """Generate realistic conversation data."""
    content = draw(st.text(min_size=10, max_size=1000))
    timestamp = draw(
        st.datetimes(min_value=datetime(2020, 1, 1), max_value=datetime(2025, 12, 31))
    )
    project = draw(st.text(min_size=1, max_size=50))

    assume(content.strip())  # Ensure non-empty content

    return {
        "id": str(uuid.uuid4()),
        "content": content,
        "project": project,
        "timestamp": timestamp.isoformat(),
        "metadata": {},
    }


# Property: Importance score should always be between 0 and 1
@given(conversation_data())
def test_importance_score_bounds(conversation):
    """Property: Importance score should always be normalized between 0 and 1."""
    manager = RetentionPolicyManager()
    score = manager.calculate_importance_score(conversation)

    assert 0.0 <= score <= 1.0


# Property: Clustering should preserve conversation count
@given(st.lists(conversation_data(), min_size=2, max_size=20))
def test_clustering_preserves_conversations(conversations):
    """Property: Clustering should not lose any conversations."""
    clusterer = ConversationClusterer()
    clusters = clusterer.cluster_conversations(conversations)

    # Count total conversations in clusters
    total_clustered = sum(len(cluster) for cluster in clusters)

    # Property: All conversations should be in exactly one cluster
    assert total_clustered == len(conversations)


# Property: Compression should reduce size
@pytest.mark.asyncio
@given(st.lists(conversation_data(), min_size=10, max_size=100))
async def test_compression_reduces_size(conversations):
    """Property: Memory compression should reduce total storage size."""
    mock_db = MagicMock()
    mock_db.conn = MagicMock()

    # Mock fetchall to return generated conversations
    mock_data = [
        (c["id"], c["content"], c["project"], c["timestamp"], "{}")
        for c in conversations
    ]
    mock_db.conn.execute = MagicMock(
        return_value=MagicMock(fetchall=MagicMock(return_value=mock_data))
    )

    optimizer = MemoryOptimizer(mock_db)
    result = await optimizer.compress_memory(dry_run=True)

    if result["status"] == "success":
        # Property: Compression should save space
        assert result.get("space_saved_estimate", 0) >= 0
        assert 0.0 <= result.get("compression_ratio", 1.0) <= 1.0
```

**Use Case 3: App Monitor**

```python
from datetime import datetime, timedelta

from hypothesis import given, strategies as st


# Property: Activity buffer should never exceed max size
@given(st.lists(st.text(min_size=1), min_size=1, max_size=2000))
def test_activity_buffer_bounded(file_paths):
    """Property: Activity buffer should never exceed 1000 events."""
    from session_buddy.app_monitor import ActivityEvent, ProjectActivityMonitor

    monitor = ProjectActivityMonitor()

    for path in file_paths:
        event = ActivityEvent(
            timestamp=datetime.now().isoformat(),
            event_type="file_change",
            application="VSCode",
            details={"file_path": path},
        )
        monitor.add_activity(event)

    # Property: Buffer should be trimmed to 500 when it exceeds 1000
    assert len(monitor.activity_buffer) <= 1000


# Property: Recent activity filter should respect time boundaries
@given(st.integers(min_value=1, max_value=120), st.integers(min_value=1, max_value=10))
def test_recent_activity_time_boundary(minutes_ago, num_events):
    """Property: Recent activity should only include events within time window."""
    from session_buddy.app_monitor import ActivityEvent, ProjectActivityMonitor

    monitor = ProjectActivityMonitor()
    now = datetime.now()

    # Add events at various times
    for i in range(num_events):
        time_offset = timedelta(minutes=i * 5)
        event = ActivityEvent(
            timestamp=(now - time_offset).isoformat(),
            event_type="file_change",
            application="VSCode",
            details={"file_path": f"/test/file{i}.py"},
        )
        monitor.add_activity(event)

    # Get recent activity
    recent = monitor.get_recent_activity(minutes=minutes_ago)

    # Property: All returned events should be within time window
    cutoff_time = now - timedelta(minutes=minutes_ago)
    for event in recent:
        event_time = datetime.fromisoformat(event.timestamp)
        assert event_time >= cutoff_time
```

**Use Case 4: Serverless Mode - Stateful Testing**

```python
import asyncio
from datetime import datetime

from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, initialize, invariant


class ServerlessSessionMachine(RuleBasedStateMachine):
    """Stateful testing of serverless session management."""

    def __init__(self):
        super().__init__()
        # Hypothesis calls rules synchronously, so each coroutine is driven
        # to completion on an event loop owned by the machine.
        self.loop = asyncio.new_event_loop()
        self.storage = None
        self.sessions = {}  # Track expected state
        self.session_ids = []

    def teardown(self):
        """Close the event loop when the run ends."""
        self.loop.close()

    def _run(self, coro):
        return self.loop.run_until_complete(coro)

    @initialize()
    def setup_storage(self):
        """Initialize storage backend."""
        from aiocache import Cache

        cache = Cache(Cache.MEMORY)
        self.storage = ACBCacheStorage(cache, namespace="test")

    @rule(
        user_id=st.text(min_size=1, max_size=20),
        project_id=st.text(min_size=1, max_size=20),
    )
    def create_session(self, user_id, project_id):
        """Create a new session."""
        session_id = f"session-{len(self.sessions)}"
        session = SessionState(
            session_id=session_id,
            user_id=user_id,
            project_id=project_id,
            created_at=datetime.now().isoformat(),
            last_activity=datetime.now().isoformat(),
        )

        result = self._run(self.storage.store_session(session, ttl_seconds=300))

        if result:
            self.sessions[session_id] = session
            self.session_ids.append(session_id)

    # `target=` is reserved for Bundles; st.data() is passed as a normal argument
    @rule(data=st.data())
    def retrieve_session(self, data):
        """Retrieve an existing session."""
        if not self.session_ids:
            return

        session_id = data.draw(st.sampled_from(self.session_ids))
        retrieved = self._run(self.storage.retrieve_session(session_id))

        # Should match expected state
        if session_id in self.sessions:
            assert retrieved is not None
            assert retrieved.session_id == session_id

    @rule(data=st.data())
    def delete_session(self, data):
        """Delete a session."""
        if not self.session_ids:
            return

        session_id = data.draw(st.sampled_from(self.session_ids))
        result = self._run(self.storage.delete_session(session_id))

        if result and session_id in self.sessions:
            del self.sessions[session_id]
            self.session_ids.remove(session_id)

    @invariant()
    def session_consistency(self):
        """Invariant: All tracked sessions should be retrievable."""
        for session_id in self.sessions:
            retrieved = self._run(self.storage.retrieve_session(session_id))
            assert retrieved is not None
            assert retrieved.session_id == session_id


# Run the state machine
TestServerlessSession = ServerlessSessionMachine.TestCase
```

##### E. Add Parametrized Fixtures

**Current Problem:** Fixture duplication for different scenarios

**Solution:**

```python
# conftest.py additions
@pytest.fixture(params=["memory", "redis", "local"])
def cache_backend(request):
    """Parametrized fixture for different cache backends."""
    backend_type = request.param

    if backend_type == "memory":
        from aiocache import Cache

        return Cache(Cache.MEMORY)
    elif backend_type == "redis":
        pytest.skip("Redis backend requires running Redis server")
    elif backend_type == "local":
        import tempfile

        tmpdir = tempfile.mkdtemp()
        return LocalCacheBackend(tmpdir)


@pytest.fixture(params=[10, 100, 1000])
def conversation_dataset_size(request):
    """Parametrized fixture for different dataset sizes."""
    return request.param


@pytest.fixture
def conversation_dataset(conversation_dataset_size):
    """Generate conversation dataset of specified size."""
    return [
        {
            "id": f"conv-{i}",
            "content": f"Test conversation {i}",
            "project": "test-project",
            "timestamp": datetime.now().isoformat(),
        }
        for i in range(conversation_dataset_size)
    ]
```

#### **Priority 3: Nice-to-Have Improvements**

##### F. Add Test Markers and Custom Markers

```text
# pytest.ini additions (the section is [pytest] in pytest.ini; [tool:pytest] applies to setup.cfg)
[pytest]
markers =
    slow: marks tests as slow (deselect with '-m "not slow"')
    integration: marks tests requiring integration
    unit: marks tests as unit tests
    property: marks property-based tests with Hypothesis
    stateful: marks stateful tests with Hypothesis
    cache_backend: marks tests that require specific cache backend
    heavy_mock: marks tests with heavy mocking that should be refactored

# Usage in tests
@pytest.mark.slow
@pytest.mark.cache_backend("redis")
async def test_redis_storage_integration(...):
    """Integration test with real Redis backend."""

@pytest.mark.property
@given(...)
def test_property_based(...):
    """Property-based test using Hypothesis."""
```

##### G. Add Benchmark Tests

```python
# test_app_monitor.py additions
from datetime import datetime


def test_activity_buffer_performance(benchmark):
    """Benchmark activity buffer operations."""
    from session_buddy.app_monitor import ActivityEvent, ProjectActivityMonitor

    monitor = ProjectActivityMonitor()

    def add_1000_events():
        for i in range(1000):
            event = ActivityEvent(
                timestamp=datetime.now().isoformat(),
                event_type="file_change",
                application="VSCode",
                details={"file_path": f"/test/file{i}.py"},
            )
            monitor.add_activity(event)

    benchmark(add_1000_events)

    # Performance assertion
    assert benchmark.stats.stats.mean < 0.1  # Should complete in < 100ms
```

______________________________________________________________________

### 4. Hypothesis Integration Strategy

#### Phase 1: Quick Wins (1-2 hours)

**Target:** Add 10 property tests for existing functionality

1. **Data Model Properties** (test_multi_project_coordinator.py)
   - Property: ProjectGroup projects should be unique
   - Property: SessionLink source and target should be different
   - Property: ProjectDependency should not create cycles
1. **Boundary Testing** (test_memory_optimizer.py)
   - Property: Importance scores should be [0, 1]
   - Property: Compression should preserve conversation count
   - Property: Clustering should not lose conversations
1. **Buffer Management** (test_app_monitor.py)
   - Property: Activity buffer should never exceed max size
   - Property: Time-based filtering should respect boundaries

#### Phase 2: Stateful Testing (3-4 hours)

**Target:** Add 3 state machines for complex workflows

1. **ServerlessSessionMachine** (test_serverless_mode.py)
   - State: Session CRUD operations
   - Invariant: All created sessions should be retrievable
   - Invariant: Deleted sessions should not be retrievable
1. **MultiProjectCoordinatorMachine** (test_multi_project_coordinator.py)
   - State: Project groups, dependencies, and links
   - Invariant: Dependencies should form a DAG
   - Invariant: Session links should be bidirectional
1. **MemoryOptimizerMachine** (test_memory_optimizer.py)
   - State: Compress, restore, update conversations
   - Invariant: Compression should be idempotent
   - Invariant: Total conversation count should be consistent

#### Phase 3: Custom Strategies (2-3 hours)

**Target:** Create reusable strategies for domain objects

```python
# tests/strategies.py (new file)
from datetime import datetime

from hypothesis import strategies as st

from session_buddy.app_monitor import ActivityEvent

# SessionState must also be imported from the module under test.


@st.composite
def project_groups(draw):
    """Generate valid ProjectGroup data."""
    name = draw(st.text(min_size=1, max_size=50))
    projects = draw(
        st.lists(st.text(min_size=1, max_size=20), min_size=1, max_size=10, unique=True)
    )
    description = draw(st.text(max_size=200))

    return {
        "name": name,
        "projects": projects,
        "description": description,
    }


@st.composite
def session_states(draw):
    """Generate valid SessionState data."""
    session_id = draw(st.uuids()).hex
    user_id = draw(st.text(min_size=1, max_size=50))
    project_id = draw(st.text(min_size=1, max_size=50))
    created_at = draw(
        st.datetimes(min_value=datetime(2020, 1, 1), max_value=datetime(2025, 12, 31))
    )

    return SessionState(
        session_id=session_id,
        user_id=user_id,
        project_id=project_id,
        created_at=created_at.isoformat(),
        last_activity=created_at.isoformat(),
    )


@st.composite
def activity_events(draw):
    """Generate valid ActivityEvent data."""
    timestamp = draw(st.datetimes()).isoformat()
    event_type = draw(st.sampled_from(["file_change", "browser_nav", "app_focus"]))
    application = draw(st.sampled_from(["VSCode", "Chrome", "Terminal", "Finder"]))

    details = {}
    if event_type == "file_change":
        details["file_path"] = draw(st.text(min_size=1))
    elif event_type == "browser_nav":
        details["url"] = draw(st.from_regex(r"https?://[a-z\.]+/.*", fullmatch=True))

    return ActivityEvent(
        timestamp=timestamp,
        event_type=event_type,
        application=application,
        details=details,
    )
```

______________________________________________________________________

### 5. Specific Code Examples of Improvements

#### Example 1: test_multi_project_coordinator.py

**Before (Lines 43-51):**

```python
def test_project_dependency_types(self) -> None:
    """Should validate ProjectDependency types."""
    # Valid dependency types
    for dep_type in ["uses", "extends", "references", "shares_code"]:
        dep = ProjectDependency(
            id=f"dep-{dep_type}",
            source_project="project-a",
            target_project="project-b",
            dependency_type=dep_type,
            description=f"Test {dep_type} dependency",
        )
        assert dep.dependency_type == dep_type
```

**After:**

```python
@pytest.mark.parametrize(
    "dep_type,expected_description",
    [
        pytest.param("uses", "Service depends on another service", id="uses"),
        pytest.param("extends", "Service extends base functionality", id="extends"),
        pytest.param("references", "Service references shared code", id="references"),
        pytest.param("shares_code", "Service shares code library", id="shares-code"),
    ],
)
def test_project_dependency_types(dep_type: str, expected_description: str) -> None:
    """Should validate ProjectDependency types with semantic context."""
    dep = ProjectDependency(
        id=f"dep-{dep_type}",
        source_project="project-a",
        target_project="project-b",
        dependency_type=dep_type,
        description=expected_description,
    )

    assert dep.dependency_type == dep_type
    assert dep.source_project == "project-a"
    assert dep.target_project == "project-b"
```

**Benefits:**

- 4 separate test cases with clear IDs
- Each failure is isolated and reportable
- Can run specific test case: `pytest -k "test_project_dependency_types[uses]"`
- Adds semantic context with descriptions

______________________________________________________________________

#### Example 2: test_memory_optimizer.py

**Before (Lines 229-242):**

````python
def test_calculate_importance_score_with_code(self) -> None:
    """Should give higher importance to conversations with code."""
    manager = RetentionPolicyManager()

    conversation = {
        "content": "Here's the implementation:\n```python\ndef example():\n return True\n```",
        "timestamp": datetime.now().isoformat(),
    }

    score = manager.calculate_importance_score(conversation)
    assert score > 0.3  # Should get has_code bonus
````

**After (with Hypothesis):**

````python
from datetime import datetime

from hypothesis import given, strategies as st, assume


@given(
    code_block=st.text(min_size=10, max_size=500),
    language=st.sampled_from(["python", "javascript", "sql", "rust", "go"]),
)
def test_calculate_importance_score_with_code_property(
    code_block: str, language: str
) -> None:
    """Property: Conversations with code blocks should have higher importance scores."""
    manager = RetentionPolicyManager()

    assume(code_block.strip())  # Ensure non-empty code

    # Conversation with code
    conv_with_code = {
        "content": f"Here's the solution:\n```{language}\n{code_block}\n```",
        "timestamp": datetime.now().isoformat(),
    }

    # Conversation without code
    conv_without_code = {
        "content": "General discussion about the problem.",
        "timestamp": datetime.now().isoformat(),
    }

    score_with_code = manager.calculate_importance_score(conv_with_code)
    score_without_code = manager.calculate_importance_score(conv_without_code)

    # Property: Code conversations should score higher
    assert score_with_code > score_without_code
    assert 0.0 <= score_with_code <= 1.0
    assert 0.0 <= score_without_code <= 1.0
````

**Benefits:**

- Tests property across many code examples automatically
- Finds edge cases (empty code, weird languages)
- Validates score boundaries
- Tests relative importance (code vs no code)

______________________________________________________________________

#### Example 3: test_app_monitor.py

**Before (Lines 99-129):**

```python
def test_get_recent_activity(self) -> None:
    """Should retrieve recent activity within time window."""
    monitor = ProjectActivityMonitor()

    # Add recent event
    recent_event = ActivityEvent(
        timestamp=datetime.now().isoformat(),
        event_type="file_change",
        application="VSCode",
        details={"file_path": "/test/recent.py"},
    )
    monitor.add_activity(recent_event)

    # Add old event (2 hours ago)
    old_time = (datetime.now() - timedelta(hours=2)).isoformat()
    old_event = ActivityEvent(
        timestamp=old_time,
        event_type="file_change",
        application="VSCode",
        details={"file_path": "/test/old.py"},
    )
    monitor.add_activity(old_event)

    # Get recent activity (last 30 minutes)
    recent = monitor.get_recent_activity(minutes=30)

    # Should only include recent event
    assert len(recent) == 1
    assert recent[0] == recent_event
```

**After (with parametrization and property testing):**

```python
from datetime import datetime, timedelta

import pytest
from hypothesis import given, strategies as st


@pytest.mark.parametrize(
    "window_minutes,num_recent,num_old",
    [
        pytest.param(30, 5, 0, id="only-recent"),
        pytest.param(30, 5, 10, id="mixed"),
        pytest.param(120, 10, 5, id="large-window"),
        pytest.param(5, 1, 20, id="small-window"),
    ],
)
def test_get_recent_activity_time_filtering(
    window_minutes: int, num_recent: int, num_old: int
) -> None:
    """Should correctly filter activity based on time window."""
    monitor = ProjectActivityMonitor()
    now = datetime.now()

    # Add recent events (within window)
    for i in range(num_recent):
        event = ActivityEvent(
            timestamp=(now - timedelta(minutes=i)).isoformat(),
            event_type="file_change",
            application="VSCode",
            details={"file_path": f"/test/recent{i}.py"},
        )
        monitor.add_activity(event)

    # Add old events (outside window)
    for i in range(num_old):
        event = ActivityEvent(
            timestamp=(now - timedelta(hours=3) - timedelta(minutes=i)).isoformat(),
            event_type="file_change",
            application="VSCode",
            details={"file_path": f"/test/old{i}.py"},
        )
        monitor.add_activity(event)

    # Get recent activity
    recent = monitor.get_recent_activity(minutes=window_minutes)

    # Assertions
    assert len(recent) == num_recent

    # All returned events should be within window
    cutoff_time = now - timedelta(minutes=window_minutes)
    for event in recent:
        event_time = datetime.fromisoformat(event.timestamp)
        assert event_time >= cutoff_time


@given(
    window_minutes=st.integers(min_value=1, max_value=240),
    event_times=st.lists(
        st.integers(min_value=-300, max_value=0),  # minutes relative to now
        min_size=1,
        max_size=50,
    ),
)
def test_recent_activity_property(window_minutes: int, event_times: list[int]) -> None:
    """Property: Recent activity should only include events within time window."""
    monitor = ProjectActivityMonitor()
    now = datetime.now()

    # Add events at various times
    for i, minutes_ago in enumerate(event_times):
        event = ActivityEvent(
            timestamp=(now + timedelta(minutes=minutes_ago)).isoformat(),
            event_type="file_change",
            application="VSCode",
            details={"file_path": f"/test/file{i}.py"},
        )
        monitor.add_activity(event)

    # Get recent activity
    recent = monitor.get_recent_activity(minutes=window_minutes)

    # Property: All returned events should be within window
    cutoff_time = now - timedelta(minutes=window_minutes)
    for event in recent:
        event_time = datetime.fromisoformat(event.timestamp)
        assert event_time >= cutoff_time

    # Property: Should not include events outside window.
    # Strict comparison, assuming the monitor reads the clock after `now`:
    # an event exactly on the boundary then falls just outside its window.
    expected_count = sum(1 for t in event_times if t > -window_minutes)
    assert len(recent) == expected_count
```

**Benefits:**

- Tests multiple scenarios with parametrization
- Property testing finds edge cases automatically
- Validates time boundary logic thoroughly
- Clear test IDs for debugging failures

______________________________________________________________________

#### Example 4: test_serverless_mode.py

**Before (Lines 183-203):**

```python
@pytest.mark.asyncio
async def test_list_sessions_with_filter(self) -> None:
    """Should filter sessions by user_id and project_id."""
    mock_cache = AsyncMock()
    mock_cache.get = AsyncMock(
        return_value={
            "session-1": {"user_id": "user-1", "project_id": "project-1"},
            "session-2": {"user_id": "user-2", "project_id": "project-1"},
            "session-3": {"user_id": "user-1", "project_id": "project-2"},
        }
    )

    storage = ACBCacheStorage(mock_cache, namespace="test")

    # Filter by user_id
    sessions = await storage.list_sessions(user_id="user-1")

    assert len(sessions) == 2
    assert "session-1" in sessions
    assert "session-3" in sessions
```

**After (with stateful
testing):**

```python
import asyncio
from datetime import datetime

from aiocache import Cache
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, initialize, invariant, rule

# ACBCacheStorage and SessionState are the project's own classes; import them
# from the modules under test.


class SessionStorageMachine(RuleBasedStateMachine):
    """Stateful testing of session storage operations."""

    def __init__(self):
        super().__init__()
        self.storage = None
        self.sessions = {}  # Track expected state
        # Hypothesis calls rules synchronously, so the machine owns an event
        # loop and drives each storage coroutine to completion itself.
        self.loop = asyncio.new_event_loop()

    def teardown(self):
        """Close the event loop once the run finishes."""
        self.loop.close()

    def _run(self, coro):
        """Run a storage coroutine to completion and return its result."""
        return self.loop.run_until_complete(coro)

    @initialize()
    def setup_storage(self):
        """Initialize real in-memory storage."""
        cache = Cache(Cache.MEMORY)
        self._run(cache.clear())  # Start every example from an empty cache
        self.storage = ACBCacheStorage(cache, namespace="test")

    @rule(
        user_id=st.text(min_size=1, max_size=20),
        project_id=st.text(min_size=1, max_size=20),
    )
    def create_session(self, user_id, project_id):
        """Create a new session."""
        session_id = f"session-{len(self.sessions)}"
        now = datetime.now().isoformat()
        session = SessionState(
            session_id=session_id,
            user_id=user_id,
            project_id=project_id,
            created_at=now,
            last_activity=now,
        )

        result = self._run(self.storage.store_session(session, ttl_seconds=300))
        if result:
            self.sessions[session_id] = session

    # Note: `target=` is reserved for Bundles, so the data strategy is passed
    # under its own keyword.
    @rule(data=st.data())
    def list_by_user(self, data):
        """List sessions by user ID."""
        if not self.sessions:
            return

        # Pick a user from existing sessions (sorted so draws are reproducible)
        user_ids = sorted({s.user_id for s in self.sessions.values()})
        user_id = data.draw(st.sampled_from(user_ids))

        # List sessions for this user
        session_ids = self._run(self.storage.list_sessions(user_id=user_id))

        # Verify results match expected state
        expected_sessions = [
            sid for sid, sess in self.sessions.items() if sess.user_id == user_id
        ]
        assert len(session_ids) == len(expected_sessions)
        for sid in session_ids:
            assert sid in expected_sessions

    @rule(data=st.data())
    def list_by_project(self, data):
        """List sessions by project ID."""
        if not self.sessions:
            return

        # Pick a project from existing sessions (sorted so draws are reproducible)
        project_ids = sorted({s.project_id for s in self.sessions.values()})
        project_id = data.draw(st.sampled_from(project_ids))

        # List sessions for this project
        session_ids = self._run(self.storage.list_sessions(project_id=project_id))

        # Verify results match expected state
        expected_sessions = [
            sid
            for sid, sess in self.sessions.items()
            if sess.project_id == project_id
        ]
        assert len(session_ids) == len(expected_sessions)

    @invariant()
    def all_sessions_retrievable(self):
        """Invariant: All tracked sessions should be retrievable."""
        for session_id, expected_session in self.sessions.items():
            retrieved = self._run(self.storage.retrieve_session(session_id))
            assert retrieved is not None
            assert retrieved.session_id == session_id
            assert retrieved.user_id == expected_session.user_id


# Run the state machine
TestSessionStorage = SessionStorageMachine.TestCase
```

**Benefits:**

- Tests real storage behavior, not mocks
- Explores state space automatically
- Finds edge cases and operation-ordering bugs
- Maintains invariants throughout execution

______________________________________________________________________

### 6. Overall Testing Maturity Assessment

**Current Level:** **Intermediate** (Level 3 of 5)

**Progression Path:**

```
Level 1: Basic ────────────────────────────────────────────
  Simple assertions, no fixtures, manual setup

Level 2: Organized ────────────────────────────────────────
  Test classes, basic fixtures, clear names

Level 3: Intermediate ●────────────────────────────────── (YOU ARE HERE)
  Async/await, class organization, some edge cases
  - Missing: Parametrization, advanced fixtures
  - Missing: Property-based testing
  - Issue: Heavy mock over-reliance

Level 4: Advanced ─────────────────────────────────────────
  Parametrization, fixture factories, Hypothesis
  Property-based testing, stateful testing
  Minimal mocking, integration focus

Level 5: Expert ───────────────────────────────────────────
  Custom strategies, generative testing
  Mutation testing, fuzzing integration
  Comprehensive property testing suite
```

**To Reach Level 4 (Recommended):**

1. ✅ **Implement parametrization** (Priority 1A) - 2-3 hours
1. ✅ **Create fixture factories** (Priority 1B) - 2-3 hours
1. ✅ **Add 10+ property tests** (Priority 2D Phase 1) - 1-2 hours
1. ✅ **Reduce mocking by 50%** (Priority 1C) - 3-4 hours
1. ✅ **Add 2-3 state machines** (Priority 2D Phase 2) - 3-4 hours

**Total Time Investment:** 11-16 hours

**Expected Impact:** 2x test confidence, 40% code reduction, 10x edge case coverage

______________________________________________________________________

## Summary Recommendations

### Immediate Actions (This Week)

1. **Add parametrization to loop-based tests** (4 files, ~15 tests)

   - Estimated time: 2 hours
   - Impact: 30% code reduction, better test reporting

1. **Create mock fixture factories** (conftest.py)

   - Estimated time: 1 hour
   - Impact: Eliminate 60% of duplicate setup code

1. **Add 5 property tests for data models**

   - Estimated time: 1 hour
   - Impact: 100x more edge cases tested

### Short-Term Goals (Next 2 Weeks)

4. **Reduce mock over-reliance** (test_serverless_mode.py, test_memory_optimizer.py)

   - Estimated time: 4 hours
   - Impact: 2x confidence in integration behavior

1. **Add stateful testing** (1-2 state machines)

   - Estimated time: 3 hours
   - Impact: Find complex interaction bugs

1. **Create custom Hypothesis strategies**

   - Estimated time: 2 hours
   - Impact: Reusable test data generation

### Long-Term Excellence (1 Month)

7. **Comprehensive property test suite** (20+ properties)
1. **Complete fixture refactoring** (eliminate all duplicate setup)
1. **Benchmark tests for performance baselines**
1. **Integration tests with real backends** (reduce mocks to \<20%)

______________________________________________________________________

## Conclusion

The Week 5 testing implementation demonstrates **solid foundational practices** with excellent async/await handling and clear organization. However, significant opportunities exist to elevate test quality through:

1. **Parametrization** - Eliminate loops and duplication
1. **Hypothesis Integration** - Property-based and stateful testing
1. **Reduced Mocking** - More integration, less isolation
1. **Advanced Fixtures** - Factories and parametrized fixtures

**Recommended Priority:** Implement Priority 1A-C (parametrization, fixtures, reduce mocks) within 1 week for maximum impact with minimal time investment.

**Score Summary:**

- Pytest Patterns: **7/10** (Good foundations, missing advanced features)
- Test Quality: **6.5/10** (Adequate coverage, weak integration)
- Overall: **7.5/10** (Good with clear improvement path to 9/10)

The test suite is **production-ready** but would benefit significantly from the improvements outlined above to reach **industry-leading quality** (9/10+).
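
As a starting point for the first two immediate actions (parametrization and fixture factories), the following is a minimal sketch rather than the project's actual code. `SessionRecord`, `build_session_factory`, and the `make_session` fixture are hypothetical names introduced for illustration; a real `conftest.py` would build the project's own models instead.

```python
from dataclasses import dataclass
from typing import Callable

import pytest


@dataclass
class SessionRecord:
    """Hypothetical stand-in for the project's session model."""

    session_id: str
    user_id: str
    project_id: str


def build_session_factory() -> Callable[..., SessionRecord]:
    """Return a factory that fills in defaults and numbers each session."""
    counter = 0

    def make(**overrides: str) -> SessionRecord:
        nonlocal counter
        counter += 1
        fields = {
            "session_id": f"session-{counter}",
            "user_id": "user-1",
            "project_id": "project-1",
        }
        fields.update(overrides)  # Tests override only what they care about
        return SessionRecord(**fields)

    return make


@pytest.fixture
def make_session() -> Callable[..., SessionRecord]:
    """Fixture factory: every test receives a fresh, independent counter."""
    return build_session_factory()


@pytest.mark.parametrize("user_id", ["alice", "bob", "user-with-dash"])
def test_factory_applies_user_override(make_session, user_id: str) -> None:
    """Should keep defaults while applying the overridden field."""
    session = make_session(user_id=user_id)
    assert session.user_id == user_id
    assert session.project_id == "project-1"
```

Keeping the factory as a plain function and wrapping it in a fixture means the same builder can be reused outside pytest, for example inside a Hypothesis strategy, while each parametrized case is still reported as a separate test.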
