# Week 5 Testing Implementation Review
**Pytest Best Practices Assessment**
**Review Date:** 2025-10-29
**Modules Reviewed:** 79 tests across 4 modules
**Test Status:** ✅ All 79 tests passing
______________________________________________________________________
## Executive Summary
**Overall Score: 7.5/10** (Good with significant improvement opportunities)
Week 5 testing demonstrates solid foundational patterns with consistent test organization and proper async/await handling. However, there are substantial opportunities to elevate test quality through advanced pytest features, particularly parametrization, property-based testing with Hypothesis, and more sophisticated fixture usage.
______________________________________________________________________
## Detailed Assessment
### 1. Pytest Patterns Score: 7/10
**Strengths:**
- ✅ **Excellent test organization** with descriptive class grouping
- ✅ **Proper async/await patterns** - all `@pytest.mark.asyncio` decorators applied correctly
- ✅ **Clear test names** following "should_describe_behavior" convention
- ✅ **Consistent structure** across all four test modules
**Weaknesses:**
- ❌ **Zero parametrization** - significant duplication in loop-based tests
- ❌ **Limited fixture reuse** - heavy mock setup duplication
- ❌ **No fixture factories** - missing powerful pattern for test data generation
- ❌ **Weak assertion patterns** - many assertions test implementation details
**Examples:**
**Good Pattern (Clear Organization):**
```python
class TestConversationSummarizer:
"""Test conversation summarization strategies."""
def test_extractive_summarization(self) -> None:
"""Should extract important sentences from conversation."""
```
**Anti-Pattern (Missing Parametrization):**
```python
# ❌ Current approach - loop in test
def test_project_dependency_types(self) -> None:
"""Should validate ProjectDependency types."""
for dep_type in ["uses", "extends", "references", "shares_code"]:
dep = ProjectDependency(
id=f"dep-{dep_type}",
source_project="project-a",
target_project="project-b",
dependency_type=dep_type,
description=f"Test {dep_type} dependency",
)
assert dep.dependency_type == dep_type
# ✅ Better approach - pytest parametrization
@pytest.mark.parametrize("dep_type", ["uses", "extends", "references", "shares_code"])
def test_project_dependency_types(dep_type: str) -> None:
"""Should validate ProjectDependency types."""
dep = ProjectDependency(
id=f"dep-{dep_type}",
source_project="project-a",
target_project="project-b",
dependency_type=dep_type,
description=f"Test {dep_type} dependency",
)
assert dep.dependency_type == dep_type
```
______________________________________________________________________
### 2. Test Quality Score: 6.5/10
**Strengths:**
- ✅ **Good coverage breadth** - 79 tests covering core functionality
- ✅ **Edge case awareness** - tests for empty data, missing fields
- ✅ **Error path testing** - validates error handling scenarios
**Weaknesses:**
- ❌ **Mock over-reliance** - 95%+ tests use heavy mocking
- ❌ **Weak assertions** - testing implementation details instead of behavior
- ❌ **Missing integration depth** - tests isolated units but not interactions
- ❌ **Limited boundary testing** - few tests explore data boundary conditions
**Examples:**
**Anti-Pattern (Testing Implementation Details):**
```python
# ❌ Brittle - tests database call count instead of behavior
assert mock_db.conn.execute.call_count >= 1
# ✅ Better - test actual outcome
groups = await coordinator.get_project_groups()
assert len(groups) == 1
assert groups[0].name == "Test Group"
```
**Anti-Pattern (Over-Mocking):**
```python
# ❌ Current - mocks hide actual behavior
mock_db = MagicMock()
mock_db.conn = MagicMock()
mock_db.conn.execute = MagicMock(
return_value=MagicMock(fetchall=MagicMock(return_value=[]))
)
# ✅ Better - use real database with temp fixture
async def test_get_project_groups_empty(reflection_db: ReflectionDatabase):
"""Should return empty list when no groups exist."""
coordinator = MultiProjectCoordinator(reflection_db)
groups = await coordinator.get_project_groups()
assert groups == []
```
______________________________________________________________________
### 3. Improvement Opportunities (Prioritized)
#### **Priority 1: Critical Improvements**
##### A. Add Parametrization (Estimated Impact: 40% code reduction)
**Current Problem:** Test loops and duplication
```python
# ❌ test_memory_optimizer.py line 127-131
for strategy in ["extractive", "template_based", "keyword_based"]:
summary = summarizer.summarize_conversation(content, strategy)
assert isinstance(summary, str)
assert len(summary) > 0
```
**Solution:**
```python
@pytest.mark.parametrize(
"strategy",
[
pytest.param("extractive", id="extractive-strategy"),
pytest.param("template_based", id="template-strategy"),
pytest.param("keyword_based", id="keyword-strategy"),
],
)
def test_summarize_conversation_with_strategy(strategy: str) -> None:
"""Should use specified summarization strategy."""
summarizer = ConversationSummarizer()
content = "Test conversation with function implementation and error handling."
summary = summarizer.summarize_conversation(content, strategy)
assert isinstance(summary, str)
assert len(summary) > 0
```
**Files to Update:**
- `test_multi_project_coordinator.py`: Lines 43-51, 58-66 (dependency types, link types)
- `test_memory_optimizer.py`: Lines 127-131 (summarization strategies)
- `test_app_monitor.py`: Lines 138-145 (multiple file events)
- `test_serverless_mode.py`: Lines 198-202 (session filtering)
##### B. Create Fixture Factories (Estimated Impact: 60% setup reduction)
**Current Problem:** Repeated mock setup
```python
# ❌ Repeated in every test
mock_db = MagicMock()
mock_db.conn = MagicMock()
mock_db.conn.execute = MagicMock()
coordinator = MultiProjectCoordinator(mock_db)
```
**Solution:**
```python
# conftest.py
@pytest.fixture
def mock_coordinator_db():
"""Factory for coordinator database with configurable responses."""
def _create(fetchall_return=None, fetchone_return=None):
mock_db = MagicMock()
mock_db.conn = MagicMock()
if fetchall_return is not None:
mock_db.conn.execute = MagicMock(
return_value=MagicMock(fetchall=MagicMock(return_value=fetchall_return))
)
elif fetchone_return is not None:
mock_db.conn.execute = MagicMock(
return_value=MagicMock(fetchone=MagicMock(return_value=fetchone_return))
)
else:
mock_db.conn.execute = MagicMock()
return mock_db
return _create
# Usage in test
def test_get_project_groups_empty(mock_coordinator_db) -> None:
"""Should return empty list when no groups exist."""
db = mock_coordinator_db(fetchall_return=[])
coordinator = MultiProjectCoordinator(db)
groups = await coordinator.get_project_groups()
assert groups == []
```
##### C. Reduce Mock Over-Reliance (Estimated Impact: 2x confidence increase)
**Current Problem:** Heavy mocking hides integration bugs
```python
# ❌ test_serverless_mode.py - entirely mocked
mock_cache = AsyncMock()
mock_cache.set = AsyncMock()
mock_cache.get = AsyncMock(return_value=None)
storage = ACBCacheStorage(mock_cache, namespace="test")
```
**Solution:**
```python
# ✅ Use real ACBCacheStorage with in-memory cache
@pytest.fixture
async def memory_cache_storage():
"""Provide real ACBCacheStorage with in-memory backend."""
from aiocache import Cache
cache = Cache(Cache.MEMORY)
storage = ACBCacheStorage(cache, namespace="test")
yield storage
await cache.clear()
async def test_store_and_retrieve_session(memory_cache_storage):
"""Should store and retrieve session through real cache."""
session = SessionState(
session_id="test-123",
user_id="user-1",
project_id="project-1",
created_at="2025-01-01T12:00:00",
last_activity="2025-01-01T12:00:00",
)
# Test real storage behavior
stored = await memory_cache_storage.store_session(session, ttl_seconds=60)
assert stored is True
# Test real retrieval behavior
retrieved = await memory_cache_storage.retrieve_session("test-123")
assert retrieved is not None
assert retrieved.session_id == "test-123"
```
#### **Priority 2: High-Value Additions**
##### D. Add Hypothesis Property-Based Testing
**Use Case 1: Multi-Project Coordinator**
```python
from hypothesis import given, strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, initialize, invariant
# Property: Project groups should always maintain unique project IDs
@given(
st.lists(
st.tuples(st.text(min_size=1), st.lists(st.text(min_size=1), min_size=1)),
min_size=1,
max_size=10,
)
)
async def test_project_groups_maintain_uniqueness(group_data):
"""Property: All projects in a group should be unique."""
mock_db = MagicMock()
mock_db.conn = MagicMock()
mock_db.conn.execute = MagicMock()
coordinator = MultiProjectCoordinator(mock_db)
for name, projects in group_data:
group = await coordinator.create_project_group(
name=name, projects=projects, description="Test"
)
# Property: No duplicate projects in group
assert len(group.projects) == len(set(group.projects))
# Property: Session links should be bidirectional
@given(
st.text(min_size=1, max_size=50),
st.text(min_size=1, max_size=50),
st.sampled_from(["related", "continuation", "reference", "dependency"]),
)
async def test_session_links_are_queryable(session_a, session_b, link_type):
"""Property: Links created between sessions should be queryable from both directions."""
mock_db = MagicMock()
mock_db.conn = MagicMock()
mock_db.conn.execute = MagicMock()
coordinator = MultiProjectCoordinator(mock_db)
# Create link
link = await coordinator.link_sessions(
source_session_id=session_a,
target_session_id=session_b,
link_type=link_type,
context="Test link",
)
# Property: Link should have both source and target
assert link.source_session_id == session_a
assert link.target_session_id == session_b
```
**Use Case 2: Memory Optimizer**
```python
from hypothesis import given, strategies as st, assume
# Custom strategy for conversation data
@st.composite
def conversation_data(draw):
"""Generate realistic conversation data."""
content = draw(st.text(min_size=10, max_size=1000))
timestamp = draw(
st.datetimes(min_value=datetime(2020, 1, 1), max_value=datetime(2025, 12, 31))
)
project = draw(st.text(min_size=1, max_size=50))
assume(content.strip()) # Ensure non-empty content
return {
"id": str(uuid.uuid4()),
"content": content,
"project": project,
"timestamp": timestamp.isoformat(),
"metadata": {},
}
# Property: Importance score should always be between 0 and 1
@given(conversation_data())
def test_importance_score_bounds(conversation):
"""Property: Importance score should always be normalized between 0 and 1."""
manager = RetentionPolicyManager()
score = manager.calculate_importance_score(conversation)
assert 0.0 <= score <= 1.0
# Property: Clustering should preserve conversation count
@given(st.lists(conversation_data(), min_size=2, max_size=20))
def test_clustering_preserves_conversations(conversations):
"""Property: Clustering should not lose any conversations."""
clusterer = ConversationClusterer()
clusters = clusterer.cluster_conversations(conversations)
# Count total conversations in clusters
total_clustered = sum(len(cluster) for cluster in clusters)
# Property: All conversations should be in exactly one cluster
assert total_clustered == len(conversations)
# Property: Compression should reduce size
@given(st.lists(conversation_data(), min_size=10, max_size=100))
async def test_compression_reduces_size(conversations):
"""Property: Memory compression should reduce total storage size."""
mock_db = MagicMock()
mock_db.conn = MagicMock()
# Mock fetchall to return generated conversations
mock_data = [
(c["id"], c["content"], c["project"], c["timestamp"], "{}")
for c in conversations
]
mock_db.conn.execute = MagicMock(
return_value=MagicMock(fetchall=MagicMock(return_value=mock_data))
)
optimizer = MemoryOptimizer(mock_db)
result = await optimizer.compress_memory(dry_run=True)
if result["status"] == "success":
# Property: Compression should save space
assert result.get("space_saved_estimate", 0) >= 0
assert 0.0 <= result.get("compression_ratio", 1.0) <= 1.0
```
**Use Case 3: App Monitor**
```python
from hypothesis import given, strategies as st
# Property: Activity buffer should never exceed max size
@given(st.lists(st.text(min_size=1), min_size=1, max_size=2000))
def test_activity_buffer_bounded(file_paths):
"""Property: Activity buffer should never exceed 1000 events."""
from session_buddy.app_monitor import ActivityEvent, ProjectActivityMonitor
monitor = ProjectActivityMonitor()
for path in file_paths:
event = ActivityEvent(
timestamp=datetime.now().isoformat(),
event_type="file_change",
application="VSCode",
details={"file_path": path},
)
monitor.add_activity(event)
# Property: Buffer should be trimmed to 500 when it exceeds 1000
assert len(monitor.activity_buffer) <= 1000
# Property: Recent activity filter should respect time boundaries
@given(st.integers(min_value=1, max_value=120), st.integers(min_value=1, max_value=10))
def test_recent_activity_time_boundary(minutes_ago, num_events):
"""Property: Recent activity should only include events within time window."""
from session_buddy.app_monitor import ActivityEvent, ProjectActivityMonitor
monitor = ProjectActivityMonitor()
now = datetime.now()
# Add events at various times
for i in range(num_events):
time_offset = timedelta(minutes=i * 5)
event = ActivityEvent(
timestamp=(now - time_offset).isoformat(),
event_type="file_change",
application="VSCode",
details={"file_path": f"/test/file{i}.py"},
)
monitor.add_activity(event)
# Get recent activity
recent = monitor.get_recent_activity(minutes=minutes_ago)
# Property: All returned events should be within time window
cutoff_time = now - timedelta(minutes=minutes_ago)
for event in recent:
event_time = datetime.fromisoformat(event.timestamp)
assert event_time >= cutoff_time
```
**Use Case 4: Serverless Mode - Stateful Testing**
```python
from hypothesis.stateful import RuleBasedStateMachine, rule, initialize, invariant
class ServerlessSessionMachine(RuleBasedStateMachine):
"""Stateful testing of serverless session management."""
def __init__(self):
super().__init__()
self.storage = None
self.sessions = {} # Track expected state
self.session_ids = []
@initialize()
async def setup_storage(self):
"""Initialize storage backend."""
from aiocache import Cache
cache = Cache(Cache.MEMORY)
self.storage = ACBCacheStorage(cache, namespace="test")
@rule(
user_id=st.text(min_size=1, max_size=20),
project_id=st.text(min_size=1, max_size=20),
)
async def create_session(self, user_id, project_id):
"""Create a new session."""
session_id = f"session-{len(self.sessions)}"
session = SessionState(
session_id=session_id,
user_id=user_id,
project_id=project_id,
created_at=datetime.now().isoformat(),
last_activity=datetime.now().isoformat(),
)
result = await self.storage.store_session(session, ttl_seconds=300)
if result:
self.sessions[session_id] = session
self.session_ids.append(session_id)
@rule(target=st.data())
async def retrieve_session(self, target):
"""Retrieve an existing session."""
if not self.session_ids:
return
session_id = target.draw(st.sampled_from(self.session_ids))
retrieved = await self.storage.retrieve_session(session_id)
# Should match expected state
if session_id in self.sessions:
assert retrieved is not None
assert retrieved.session_id == session_id
@rule(target=st.data())
async def delete_session(self, target):
"""Delete a session."""
if not self.session_ids:
return
session_id = target.draw(st.sampled_from(self.session_ids))
result = await self.storage.delete_session(session_id)
if result and session_id in self.sessions:
del self.sessions[session_id]
self.session_ids.remove(session_id)
@invariant()
async def session_consistency(self):
"""Invariant: All tracked sessions should be retrievable."""
for session_id in self.sessions:
retrieved = await self.storage.retrieve_session(session_id)
assert retrieved is not None
assert retrieved.session_id == session_id
# Run the state machine
TestServerlessSession = ServerlessSessionMachine.TestCase
```
##### E. Add Parametrized Fixtures
**Current Problem:** Fixture duplication for different scenarios
```python
# conftest.py additions
@pytest.fixture(params=["memory", "redis", "local"])
def cache_backend(request):
"""Parametrized fixture for different cache backends."""
backend_type = request.param
if backend_type == "memory":
from aiocache import Cache
return Cache(Cache.MEMORY)
elif backend_type == "redis":
pytest.skip("Redis backend requires running Redis server")
elif backend_type == "local":
import tempfile
tmpdir = tempfile.mkdtemp()
return LocalCacheBackend(tmpdir)
@pytest.fixture(params=[10, 100, 1000])
def conversation_dataset_size(request):
"""Parametrized fixture for different dataset sizes."""
return request.param
@pytest.fixture
def conversation_dataset(conversation_dataset_size):
"""Generate conversation dataset of specified size."""
return [
{
"id": f"conv-{i}",
"content": f"Test conversation {i}",
"project": "test-project",
"timestamp": datetime.now().isoformat(),
}
for i in range(conversation_dataset_size)
]
```
#### **Priority 3: Nice-to-Have Improvements**
##### F. Add Test Markers and Custom Markers
```text
# pytest.ini additions
[tool:pytest]
markers =
slow: marks tests as slow (deselect with '-m "not slow"')
integration: marks tests requiring integration
unit: marks tests as unit tests
property: marks property-based tests with Hypothesis
stateful: marks stateful tests with Hypothesis
cache_backend: marks tests that require specific cache backend
heavy_mock: marks tests with heavy mocking that should be refactored
# Usage in tests
@pytest.mark.slow
@pytest.mark.cache_backend("redis")
async def test_redis_storage_integration(...):
"""Integration test with real Redis backend."""
@pytest.mark.property
@given(...)
def test_property_based(...):
"""Property-based test using Hypothesis."""
```
##### G. Add Benchmark Tests
```python
# test_app_monitor.py additions
def test_activity_buffer_performance(benchmark):
"""Benchmark activity buffer operations."""
from session_buddy.app_monitor import ActivityEvent, ProjectActivityMonitor
monitor = ProjectActivityMonitor()
def add_1000_events():
for i in range(1000):
event = ActivityEvent(
timestamp=datetime.now().isoformat(),
event_type="file_change",
application="VSCode",
details={"file_path": f"/test/file{i}.py"},
)
monitor.add_activity(event)
benchmark(add_1000_events)
# Performance assertion
assert benchmark.stats.stats.mean < 0.1 # Should complete in < 100ms
```
______________________________________________________________________
### 4. Hypothesis Integration Strategy
#### Phase 1: Quick Wins (1-2 hours)
**Target:** Add 10 property tests for existing functionality
1. **Data Model Properties** (test_multi_project_coordinator.py)
- Property: ProjectGroup projects should be unique
- Property: SessionLink source and target should be different
- Property: ProjectDependency should not create cycles
1. **Boundary Testing** (test_memory_optimizer.py)
- Property: Importance scores should be [0, 1]
- Property: Compression should preserve conversation count
- Property: Clustering should not lose conversations
1. **Buffer Management** (test_app_monitor.py)
- Property: Activity buffer should never exceed max size
- Property: Time-based filtering should respect boundaries
#### Phase 2: Stateful Testing (3-4 hours)
**Target:** Add 3 state machines for complex workflows
1. **ServerlessSessionMachine** (test_serverless_mode.py)
- State: Session CRUD operations
- Invariant: All created sessions should be retrievable
- Invariant: Deleted sessions should not be retrievable
1. **MultiProjectCoordinatorMachine** (test_multi_project_coordinator.py)
- State: Project groups, dependencies, and links
- Invariant: Dependencies should form a DAG
- Invariant: Session links should be bidirectional
1. **MemoryOptimizerMachine** (test_memory_optimizer.py)
- State: Compress, restore, update conversations
- Invariant: Compression should be idempotent
- Invariant: Total conversation count should be consistent
#### Phase 3: Custom Strategies (2-3 hours)
**Target:** Create reusable strategies for domain objects
```python
# tests/strategies.py (new file)
from hypothesis import strategies as st
from datetime import datetime
@st.composite
def project_groups(draw):
"""Generate valid ProjectGroup data."""
name = draw(st.text(min_size=1, max_size=50))
projects = draw(
st.lists(st.text(min_size=1, max_size=20), min_size=1, max_size=10, unique=True)
)
description = draw(st.text(max_size=200))
return {
"name": name,
"projects": projects,
"description": description,
}
@st.composite
def session_states(draw):
"""Generate valid SessionState data."""
session_id = draw(st.uuids()).hex
user_id = draw(st.text(min_size=1, max_size=50))
project_id = draw(st.text(min_size=1, max_size=50))
created_at = draw(
st.datetimes(min_value=datetime(2020, 1, 1), max_value=datetime(2025, 12, 31))
)
return SessionState(
session_id=session_id,
user_id=user_id,
project_id=project_id,
created_at=created_at.isoformat(),
last_activity=created_at.isoformat(),
)
@st.composite
def activity_events(draw):
"""Generate valid ActivityEvent data."""
timestamp = draw(st.datetimes()).isoformat()
event_type = draw(st.sampled_from(["file_change", "browser_nav", "app_focus"]))
application = draw(st.sampled_from(["VSCode", "Chrome", "Terminal", "Finder"]))
details = {}
if event_type == "file_change":
details["file_path"] = draw(st.text(min_size=1))
elif event_type == "browser_nav":
details["url"] = draw(st.from_regex(r"https?://[a-z\.]+/.*", fullmatch=True))
return ActivityEvent(
timestamp=timestamp,
event_type=event_type,
application=application,
details=details,
)
```
______________________________________________________________________
### 5. Specific Code Examples of Improvements
#### Example 1: test_multi_project_coordinator.py
**Before (Lines 43-51):**
```python
def test_project_dependency_types(self) -> None:
"""Should validate ProjectDependency types."""
# Valid dependency types
for dep_type in ["uses", "extends", "references", "shares_code"]:
dep = ProjectDependency(
id=f"dep-{dep_type}",
source_project="project-a",
target_project="project-b",
dependency_type=dep_type,
description=f"Test {dep_type} dependency",
)
assert dep.dependency_type == dep_type
```
**After:**
```python
@pytest.mark.parametrize(
"dep_type,expected_description",
[
pytest.param("uses", "Service depends on another service", id="uses"),
pytest.param("extends", "Service extends base functionality", id="extends"),
pytest.param("references", "Service references shared code", id="references"),
pytest.param("shares_code", "Service shares code library", id="shares-code"),
],
)
def test_project_dependency_types(dep_type: str, expected_description: str) -> None:
"""Should validate ProjectDependency types with semantic context."""
dep = ProjectDependency(
id=f"dep-{dep_type}",
source_project="project-a",
target_project="project-b",
dependency_type=dep_type,
description=expected_description,
)
assert dep.dependency_type == dep_type
assert dep.source_project == "project-a"
assert dep.target_project == "project-b"
```
**Benefits:**
- 4 separate test cases with clear IDs
- Each failure is isolated and reportable
- Can run specific test case: `pytest -k "test_project_dependency_types[uses]"`
- Adds semantic context with descriptions
______________________________________________________________________
#### Example 2: test_memory_optimizer.py
**Before (Lines 229-242):**
````python
def test_calculate_importance_score_with_code(self) -> None:
"""Should give higher importance to conversations with code."""
manager = RetentionPolicyManager()
conversation = {
"content": "Here's the implementation:\n```python\ndef example():\n return True\n```",
"timestamp": datetime.now().isoformat(),
}
score = manager.calculate_importance_score(conversation)
assert score > 0.3 # Should get has_code bonus
````
**After (with Hypothesis):**
````python
from hypothesis import given, strategies as st, assume
@given(
code_block=st.text(min_size=10, max_size=500),
language=st.sampled_from(["python", "javascript", "sql", "rust", "go"]),
)
def test_calculate_importance_score_with_code_property(
code_block: str, language: str
) -> None:
"""Property: Conversations with code blocks should have higher importance scores."""
manager = RetentionPolicyManager()
assume(code_block.strip()) # Ensure non-empty code
# Conversation with code
conv_with_code = {
"content": f"Here's the solution:\n```{language}\n{code_block}\n```",
"timestamp": datetime.now().isoformat(),
}
# Conversation without code
conv_without_code = {
"content": "General discussion about the problem.",
"timestamp": datetime.now().isoformat(),
}
score_with_code = manager.calculate_importance_score(conv_with_code)
score_without_code = manager.calculate_importance_score(conv_without_code)
# Property: Code conversations should score higher
assert score_with_code > score_without_code
assert 0.0 <= score_with_code <= 1.0
assert 0.0 <= score_without_code <= 1.0
````
**Benefits:**
- Tests property across many code examples automatically
- Finds edge cases (empty code, weird languages)
- Validates score boundaries
- Tests relative importance (code vs no code)
______________________________________________________________________
#### Example 3: test_app_monitor.py
**Before (Lines 99-129):**
```python
def test_get_recent_activity(self) -> None:
"""Should retrieve recent activity within time window."""
monitor = ProjectActivityMonitor()
# Add recent event
recent_event = ActivityEvent(
timestamp=datetime.now().isoformat(),
event_type="file_change",
application="VSCode",
details={"file_path": "/test/recent.py"},
)
monitor.add_activity(recent_event)
# Add old event (2 hours ago)
old_time = (datetime.now() - timedelta(hours=2)).isoformat()
old_event = ActivityEvent(
timestamp=old_time,
event_type="file_change",
application="VSCode",
details={"file_path": "/test/old.py"},
)
monitor.add_activity(old_event)
# Get recent activity (last 30 minutes)
recent = monitor.get_recent_activity(minutes=30)
# Should only include recent event
assert len(recent) == 1
assert recent[0] == recent_event
```
**After (with parametrization and property testing):**
```python
from hypothesis import given, strategies as st
@pytest.mark.parametrize(
"window_minutes,num_recent,num_old",
[
pytest.param(30, 5, 0, id="only-recent"),
pytest.param(30, 5, 10, id="mixed"),
pytest.param(120, 10, 5, id="large-window"),
pytest.param(5, 1, 20, id="small-window"),
],
)
def test_get_recent_activity_time_filtering(
window_minutes: int, num_recent: int, num_old: int
) -> None:
"""Should correctly filter activity based on time window."""
monitor = ProjectActivityMonitor()
now = datetime.now()
# Add recent events (within window)
for i in range(num_recent):
event = ActivityEvent(
timestamp=(now - timedelta(minutes=i)).isoformat(),
event_type="file_change",
application="VSCode",
details={"file_path": f"/test/recent{i}.py"},
)
monitor.add_activity(event)
# Add old events (outside window)
for i in range(num_old):
event = ActivityEvent(
timestamp=(now - timedelta(hours=3) - timedelta(minutes=i)).isoformat(),
event_type="file_change",
application="VSCode",
details={"file_path": f"/test/old{i}.py"},
)
monitor.add_activity(event)
# Get recent activity
recent = monitor.get_recent_activity(minutes=window_minutes)
# Assertions
assert len(recent) == num_recent
# All returned events should be within window
cutoff_time = now - timedelta(minutes=window_minutes)
for event in recent:
event_time = datetime.fromisoformat(event.timestamp)
assert event_time >= cutoff_time
@given(
window_minutes=st.integers(min_value=1, max_value=240),
event_times=st.lists(
st.integers(min_value=-300, max_value=0), # minutes relative to now
min_size=1,
max_size=50,
),
)
def test_recent_activity_property(window_minutes: int, event_times: list[int]) -> None:
"""Property: Recent activity should only include events within time window."""
monitor = ProjectActivityMonitor()
now = datetime.now()
# Add events at various times
for i, minutes_ago in enumerate(event_times):
event = ActivityEvent(
timestamp=(now + timedelta(minutes=minutes_ago)).isoformat(),
event_type="file_change",
application="VSCode",
details={"file_path": f"/test/file{i}.py"},
)
monitor.add_activity(event)
# Get recent activity
recent = monitor.get_recent_activity(minutes=window_minutes)
# Property: All returned events should be within window
cutoff_time = now - timedelta(minutes=window_minutes)
for event in recent:
event_time = datetime.fromisoformat(event.timestamp)
assert event_time >= cutoff_time
# Property: Should not include events outside window
expected_count = sum(1 for t in event_times if t >= -window_minutes)
assert len(recent) == expected_count
```
**Benefits:**
- Tests multiple scenarios with parametrization
- Property testing finds edge cases automatically
- Validates time boundary logic thoroughly
- Clear test IDs for debugging failures
______________________________________________________________________
#### Example 4: test_serverless_mode.py
**Before (Lines 183-203):**
```python
@pytest.mark.asyncio
async def test_list_sessions_with_filter(self) -> None:
"""Should filter sessions by user_id and project_id."""
mock_cache = AsyncMock()
mock_cache.get = AsyncMock(
return_value={
"session-1": {"user_id": "user-1", "project_id": "project-1"},
"session-2": {"user_id": "user-2", "project_id": "project-1"},
"session-3": {"user_id": "user-1", "project_id": "project-2"},
}
)
storage = ACBCacheStorage(mock_cache, namespace="test")
# Filter by user_id
sessions = await storage.list_sessions(user_id="user-1")
assert len(sessions) == 2
assert "session-1" in sessions
assert "session-3" in sessions
```
**After (with stateful testing):**
```python
from hypothesis.stateful import RuleBasedStateMachine, rule, initialize, invariant
from hypothesis import strategies as st
class SessionStorageMachine(RuleBasedStateMachine):
"""Stateful testing of session storage operations."""
def __init__(self):
super().__init__()
self.storage = None
self.sessions = {} # Track expected state
@initialize()
async def setup_storage(self):
"""Initialize real in-memory storage."""
from aiocache import Cache
cache = Cache(Cache.MEMORY)
self.storage = ACBCacheStorage(cache, namespace="test")
@rule(
user_id=st.text(min_size=1, max_size=20),
project_id=st.text(min_size=1, max_size=20),
)
async def create_session(self, user_id, project_id):
"""Create a new session."""
session_id = f"session-{len(self.sessions)}"
session = SessionState(
session_id=session_id,
user_id=user_id,
project_id=project_id,
created_at=datetime.now().isoformat(),
last_activity=datetime.now().isoformat(),
)
result = await self.storage.store_session(session, ttl_seconds=300)
if result:
self.sessions[session_id] = session
@rule(target=st.data())
async def list_by_user(self, target):
"""List sessions by user ID."""
if not self.sessions:
return
# Pick a random user from existing sessions
user_ids = {s.user_id for s in self.sessions.values()}
user_id = target.draw(st.sampled_from(list(user_ids)))
# List sessions for this user
session_ids = await self.storage.list_sessions(user_id=user_id)
# Verify results match expected state
expected_sessions = [
sid for sid, sess in self.sessions.items() if sess.user_id == user_id
]
assert len(session_ids) == len(expected_sessions)
for sid in session_ids:
assert sid in expected_sessions
@rule(target=st.data())
async def list_by_project(self, target):
"""List sessions by project ID."""
if not self.sessions:
return
# Pick a random project from existing sessions
project_ids = {s.project_id for s in self.sessions.values()}
project_id = target.draw(st.sampled_from(list(project_ids)))
# List sessions for this project
session_ids = await self.storage.list_sessions(project_id=project_id)
# Verify results match expected state
expected_sessions = [
sid for sid, sess in self.sessions.items() if sess.project_id == project_id
]
assert len(session_ids) == len(expected_sessions)
@invariant()
async def all_sessions_retrievable(self):
"""Invariant: All tracked sessions should be retrievable."""
for session_id, expected_session in self.sessions.items():
retrieved = await self.storage.retrieve_session(session_id)
assert retrieved is not None
assert retrieved.session_id == session_id
assert retrieved.user_id == expected_session.user_id
# Run the state machine
TestSessionStorage = SessionStorageMachine.TestCase
```
**Benefits:**
- Tests real storage behavior, not mocks
- Explores state space automatically
- Finds race conditions and edge cases
- Maintains invariants throughout execution
______________________________________________________________________
### 6. Overall Testing Maturity Assessment
**Current Level:** **Intermediate** (Level 3 of 5)
**Progression Path:**
```
Level 1: Basic ────────────────────────────────────────────
Simple assertions, no fixtures, manual setup
Level 2: Organized ────────────────────────────────────────
Test classes, basic fixtures, clear names
Level 3: Intermediate ●────────────────────────────────── (YOU ARE HERE)
Async/await, class organization, some edge cases
- Missing: Parametrization, advanced fixtures
- Missing: Property-based testing
- Issue: Heavy mock over-reliance
Level 4: Advanced ─────────────────────────────────────────
Parametrization, fixture factories, Hypothesis
Property-based testing, stateful testing
Minimal mocking, integration focus
Level 5: Expert ───────────────────────────────────────────
Custom strategies, generative testing
Mutation testing, fuzzing integration
Comprehensive property testing suite
```
**To Reach Level 4 (Recommended):**
1. ✅ **Implement parametrization** (Priority 1A) - 2-3 hours
1. ✅ **Create fixture factories** (Priority 1B) - 2-3 hours
1. ✅ **Add 10+ property tests** (Priority 2D Phase 1) - 1-2 hours
1. ✅ **Reduce mocking by 50%** (Priority 1C) - 3-4 hours
1. ✅ **Add 2-3 state machines** (Priority 2D Phase 2) - 3-4 hours
**Total Time Investment:** 11-16 hours
**Expected Impact:** 2x test confidence, 40% code reduction, 10x edge case coverage
______________________________________________________________________
## Summary Recommendations
### Immediate Actions (This Week)
1. **Add parametrization to loop-based tests** (4 files, ~15 tests)
- Estimated time: 2 hours
- Impact: 30% code reduction, better test reporting
1. **Create mock fixture factories** (conftest.py)
- Estimated time: 1 hour
- Impact: Eliminate 60% of duplicate setup code
1. **Add 5 property tests for data models**
- Estimated time: 1 hour
- Impact: 100x more edge cases tested
### Short-Term Goals (Next 2 Weeks)
4. **Reduce mock over-reliance** (test_serverless_mode.py, test_memory_optimizer.py)
- Estimated time: 4 hours
- Impact: 2x confidence in integration behavior
1. **Add stateful testing** (1-2 state machines)
- Estimated time: 3 hours
- Impact: Find complex interaction bugs
1. **Create custom Hypothesis strategies**
- Estimated time: 2 hours
- Impact: Reusable test data generation
### Long-Term Excellence (1 Month)
7. **Comprehensive property test suite** (20+ properties)
1. **Complete fixture refactoring** (eliminate all duplicate setup)
1. **Benchmark tests for performance baselines**
1. **Integration tests with real backends** (reduce mocks to \<20%)
______________________________________________________________________
## Conclusion
The Week 5 testing implementation demonstrates **solid foundational practices** with excellent async/await handling and clear organization. However, significant opportunities exist to elevate test quality through:
1. **Parametrization** - Eliminate loops and duplication
1. **Hypothesis Integration** - Property-based and stateful testing
1. **Reduced Mocking** - More integration, less isolation
1. **Advanced Fixtures** - Factories and parametrized fixtures
**Recommended Priority:** Implement Priority 1A-C (parametrization, fixtures, reduce mocks) within 1 week for maximum impact with minimal time investment.
**Score Summary:**
- Pytest Patterns: **7/10** (Good foundations, missing advanced features)
- Test Quality: **6.5/10** (Adequate coverage, weak integration)
- Overall: **7.5/10** (Good with clear improvement path to 9/10)
The test suite is **production-ready** but would benefit significantly from the improvements outlined above to reach **industry-leading quality** (9/10+).