# Implementation Tasks: MCP Server Context Window Protection

**Feature Branch**: `005-project-brownfield-hardening`
**Specification**: [spec.md](./spec.md)
**Implementation Plan**: [plan.md](./plan.md)
**Generated**: 2025-10-15

## Overview

This document breaks down the implementation into actionable tasks organized by user story. Each phase represents an independently testable increment delivering specific value.

**Total Tasks**: 85 tasks
**Estimated Duration**: 15-16 days (with parallelization)
**MVP Scope**: User Story 1 (Paginated List Results) - 25 tasks

---

## Task Organization Strategy

Tasks are organized to enable **incremental delivery per user story**:

1. **Phase 1: Setup** (infrastructure shared across all stories)
2. **Phase 2: Foundational Prerequisites** (must complete before any story)
3. **Phase 3: US1 - Paginated List Results (P1)** - MVP
4. **Phase 4: US2 - Automatic Response Summarization (P1)**
5. **Phase 5: US3 - Configurable Token Budgets (P2)**
6. **Phase 6: US4 - Response Chunking (P2)**
7. **Phase 7: US5 - Telemetry and Monitoring (P3)**
8. **Phase 8: Polish & Cross-Cutting Concerns**

**Parallelization**: Tasks marked with `[P]` can run in parallel within their phase.

---

## Phase 1: Setup (Infrastructure)

**Goal**: Initialize project structure and shared utilities needed by all user stories
**Duration**: 1-2 days | **Parallelizable**: All tasks [P]

### T001: [Setup][P] Install watchdog dependency
**File**: `pyproject.toml`
**Description**: Add `watchdog>=3.0.0` to dependencies for the file watcher (config hot-reload)
**Acceptance**: Dependency added, `uv sync` or `pip install` succeeds
**Story**: Shared Infrastructure

### T002: [Setup][P] Create cursor codec utility
**File**: `src/utils/cursor_codec.py`
**Description**: Implement Base64+HMAC-SHA256 cursor encoding/decoding per research.md R002

```python
def encode_cursor(payload: dict, secret: str) -> str
def decode_cursor(cursor: str, secret: str) -> dict  # raises ValueError on invalid or tampered cursors
```

**Acceptance**: Cursors are opaque, signed, decodable, tamper-proof
**Story**: Shared Infrastructure

### T003: [Setup][P] Create field projector utility
**File**: `src/utils/field_projector.py`
**Description**: Implement a JSON-path field projection utility

```python
def project_fields(obj: dict, field_paths: list[str]) -> dict
```

**Acceptance**: Can select nested fields like `"price.basePrice"`, returns a partial object
**Story**: Shared Infrastructure

### T004: [Setup][P] Create in-memory cursor storage
**File**: `src/services/cursor_storage.py`
**Description**: Implement thread-safe in-memory cursor storage with a 10-minute TTL

```python
class InMemoryCursorStorage:
    def store(cursor_id: str, payload: dict) -> None
    def retrieve(cursor_id: str) -> Optional[dict]
    def cleanup_expired() -> int
```

**Acceptance**: TTL enforced, expired cursors cleaned up, thread-safe
**Story**: Shared Infrastructure
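For orientation, here is a minimal standard-library sketch of the T002 codec. The token layout (a JSON payload with a trailing 32-byte digest) and the payload keys are assumptions; the real module may also embed an expiry claim to match the 10-minute TTL handled by T004.

```python
import base64
import hashlib
import hmac
import json


def encode_cursor(payload: dict, secret: str) -> str:
    """Serialize the payload, append an HMAC-SHA256 digest, and Base64-encode the result."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(body + digest).decode("ascii")


def decode_cursor(cursor: str, secret: str) -> dict:
    """Verify the signature and return the payload; raise ValueError on tampering."""
    try:
        raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    except Exception as exc:
        raise ValueError("malformed cursor") from exc
    if len(raw) <= 32:  # SHA-256 digest is always 32 bytes
        raise ValueError("malformed cursor")
    body, digest = raw[:-32], raw[-32:]
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).digest()
    if not hmac.compare_digest(digest, expected):
        raise ValueError("cursor signature mismatch")
    return json.loads(body)
```

Because the signature covers the serialized payload, flipping any character of the cursor makes `decode_cursor` raise `ValueError`, which is what the tamper-proof acceptance criterion asks for.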
---

## Phase 2: Foundational Prerequisites

**Goal**: Core infrastructure that blocks all user stories - must complete first
**Duration**: 2-3 days | **Parallelizable**: T005-T008 [P], then T009-T010 sequentially

### T005: [Foundation][P] Write unit tests for cursor codec
**File**: `tests/unit/test_cursor_codec.py`
**Description**: Test encoding, decoding, signature validation, expiry, tampering
**Acceptance**: 80%+ coverage, signature validation prevents tampering
**Story**: Foundation - TDD

### T006: [Foundation][P] Write unit tests for field projector
**File**: `tests/unit/test_field_projector.py`
**Description**: Test field selection, nested paths, missing fields, edge cases
**Acceptance**: 80%+ coverage, handles nested objects correctly
**Story**: Foundation - TDD

### T007: [Foundation][P] Write unit tests for cursor storage
**File**: `tests/unit/test_cursor_storage.py`
**Description**: Test store, retrieve, TTL expiration, cleanup, thread safety
**Acceptance**: 80%+ coverage, TTL works correctly
**Story**: Foundation - TDD

### T008: [Foundation][P] Extend HostawayConfig base
**File**: `src/mcp/config.py`
**Description**: Add base configuration fields for context protection (without story-specific fields yet)

```python
class ContextProtectionConfig(BaseSettings):
    enabled: bool = True
    # Story-specific fields added in their phases
```

**Acceptance**: Pydantic validation works, environment variable overrides work
**Story**: Foundation

### T009: [Foundation] Update main.py to initialize cursor storage
**File**: `src/api/main.py`
**Description**: Initialize InMemoryCursorStorage in lifespan, add cleanup background task
**Acceptance**: Storage available globally, cleanup runs every 60 seconds
**Story**: Foundation
**Depends On**: T004, T008

### T010: [Foundation] Create projection maps skeleton
**File**: `src/mcp/schemas/projection_maps.py`
**Description**: Create empty projection map structure (populated per story)

```python
PROJECTION_MAPS: dict[str, list[str]] = {}
```

**Acceptance**: Module exists, ready for story-specific maps
**Story**: Foundation

**Checkpoint**: ✅ Foundation complete. All user stories can now proceed in parallel.

---

## Phase 3: US1 - Paginated List Results (P1) **MVP**

**User Story**: Claude requests lists and receives manageable pages with cursors
**Goal**: Enable cursor-based pagination on high-volume list endpoints
**Duration**: 3-4 days | **Independent Test**: Call a list endpoint with 500 items, verify the first 50 are returned with a cursor

### Models & Services (Parallelizable)

### T011: [US1][P] Write unit tests for pagination models
**File**: `tests/unit/test_pagination_models.py`
**Description**: Test PaginatedResponse, PageMetadata, validation rules
**Acceptance**: 80%+ coverage, generic types work, validation catches errors
**Story**: US1 - TDD

### T012: [US1][P] Create pagination models
**File**: `src/models/pagination.py`
**Description**: Define Pydantic models per data-model.md

```python
class PageMetadata(BaseModel): ...
class PaginatedResponse(BaseModel, Generic[T]): ...
```

**Acceptance**: Type-safe, generic, passes unit tests
**Story**: US1
**Depends On**: T011
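A possible shape for the T012 models, assuming Pydantic v2 (where a generic model can inherit from `BaseModel` directly, matching the stub above). Field names are illustrative; the wire format would expose them as `nextCursor`/`totalCount` per the US1 acceptance criteria.

```python
from typing import Generic, Optional, TypeVar

from pydantic import BaseModel, Field

T = TypeVar("T")


class PageMetadata(BaseModel):
    """Bookkeeping returned alongside each page of results."""
    total_count: int = Field(ge=0)   # exposed as totalCount on the wire
    page_size: int = Field(ge=1)
    has_more: bool


class PaginatedResponse(BaseModel, Generic[T]):
    """Envelope wrapping one page of items plus an opaque continuation cursor."""
    items: list[T]
    next_cursor: Optional[str] = None  # omitted on the final page (nextCursor on the wire)
    meta: PageMetadata
```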
### T013: [US1][P] Write unit tests for pagination service
**File**: `tests/unit/test_pagination_service.py`
**Description**: Test cursor generation, page extraction, ordering, final page detection
**Acceptance**: 80%+ coverage, cursors are stable and consistent
**Story**: US1 - TDD

### T014: [US1][P] Create pagination service
**File**: `src/services/pagination_service.py`
**Description**: Implement pagination logic using the cursor codec and storage

```python
class PaginationService:
    def paginate(items: list[T], limit: int, cursor: str | None) -> PaginatedResponse[T]
    def generate_cursor(offset: int, order_by: str) -> str
```

**Acceptance**: Generates signed cursors, handles final page, passes unit tests
**Story**: US1
**Depends On**: T002, T004, T013
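To make T014 concrete, here is a sketch of how `paginate` might slice an already-ordered result list and mint a continuation cursor. It imports the T002 codec from the path listed in Phase 1 (assuming `src` is importable as a package) and returns a plain dict for brevity; the real service would build the `PaginatedResponse` model from T012 and persist cursor payloads in the T004 storage so they expire with the TTL.

```python
from typing import Optional, Sequence, TypeVar

from src.utils.cursor_codec import decode_cursor, encode_cursor  # utility from T002

T = TypeVar("T")


class PaginationService:
    """Slices ordered result sets into pages and issues signed continuation cursors."""

    def __init__(self, secret: str, default_page_size: int = 50, max_page_size: int = 200) -> None:
        self._secret = secret
        self._default_page_size = default_page_size
        self._max_page_size = max_page_size

    def generate_cursor(self, offset: int, order_by: str) -> str:
        return encode_cursor({"offset": offset, "order_by": order_by}, self._secret)

    def paginate(
        self,
        items: Sequence[T],
        limit: Optional[int] = None,
        cursor: Optional[str] = None,
        order_by: str = "id",
    ) -> dict:
        limit = min(limit or self._default_page_size, self._max_page_size)
        offset = decode_cursor(cursor, self._secret)["offset"] if cursor else 0
        page = list(items[offset:offset + limit])
        next_offset = offset + len(page)
        has_more = next_offset < len(items)
        return {
            "items": page,
            "nextCursor": self.generate_cursor(next_offset, order_by) if has_more else None,
            "meta": {"totalCount": len(items), "pageSize": limit, "hasMore": has_more},
        }
```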
### Middleware (Sequential - depends on models/services)

### T015: [US1] Write unit tests for pagination middleware
**File**: `tests/unit/test_pagination_middleware.py`
**Description**: Test cursor extraction, envelope injection, backwards compatibility
**Acceptance**: 80%+ coverage, non-paginated clients get first page
**Story**: US1 - TDD
**Depends On**: T012, T014

### T016: [US1] Create pagination middleware
**File**: `src/api/middleware/pagination_middleware.py`
**Description**: Middleware to extract the cursor param and inject the pagination envelope
**Acceptance**: Intercepts list responses, adds nextCursor/meta, passes tests
**Story**: US1
**Depends On**: T012, T014, T015

### T017: [US1] Register pagination middleware in main.py
**File**: `src/api/main.py`
**Description**: Add PaginationMiddleware to the middleware stack (after auth, before response)
**Acceptance**: Middleware active, logs show pagination decisions
**Story**: US1
**Depends On**: T016

### Configuration

### T018: [US1] Add pagination config to HostawayConfig
**File**: `src/mcp/config.py`
**Description**: Add pagination-specific config fields

```python
default_page_size: int = 50
max_page_size: int = 200
pagination_enabled: bool = True
```

**Acceptance**: Environment variables override defaults, validation enforces limits
**Story**: US1
**Depends On**: T008

### Endpoint Integration (Parallelizable)

### T019: [US1][P] Write integration tests for listings pagination
**File**: `tests/integration/test_pagination_endpoints.py`
**Description**: Test GET /api/v1/listings with cursor navigation, final page
**Acceptance**: Multi-page retrieval works, cursor expiry handled, no cursor = first page
**Story**: US1 - TDD

### T020: [US1][P] Add pagination to listings endpoint
**File**: `src/api/routes/listings.py`
**Description**: Modify `get_listings` to use PaginationService, return PaginatedResponse
**Acceptance**: Returns paginated results, passes integration tests
**Story**: US1
**Depends On**: T012, T014, T017, T019

### T021: [US1][P] Add pagination to bookings endpoint
**File**: `src/api/routes/bookings.py`
**Description**: Modify `get_bookings` to use PaginationService
**Acceptance**: Returns paginated results, consistent with listings
**Story**: US1
**Depends On**: T012, T014, T017

### T022: [US1][P] Add pagination to financial endpoint
**File**: `src/api/routes/financial.py`
**Description**: Modify `get_transactions` to use PaginationService
**Acceptance**: Returns paginated results, consistent with listings
**Story**: US1
**Depends On**: T012, T014, T017

### Contract & E2E Tests

### T023: [US1] Write contract tests for backwards compatibility
**File**: `tests/contract/test_backwards_compatibility.py`
**Description**: Validate non-paginated clients still work, golden response comparison
**Acceptance**: Old clients unaffected, response schemas unchanged (additive only)
**Story**: US1 - TDD
**Depends On**: T020, T021, T022

### T024: [US1] Write E2E test for multi-turn pagination
**File**: `tests/e2e/test_multi_turn_pagination.py`
**Description**: Simulate Claude Desktop fetching multiple pages sequentially
**Acceptance**: Can navigate all pages, final page detected, cursor expiry handled
**Story**: US1 - TDD
**Depends On**: T020

### T025: [US1] US1 acceptance validation
**Description**: Run all US1 tests, verify acceptance scenarios from spec
**Acceptance**:
- ✅ 500 bookings → first 50 + cursor + totalCount
- ✅ Cursor navigation returns next batch with updated cursor
- ✅ 10 items → all returned without pagination overhead
- ✅ Final page → no nextCursor provided

**Story**: US1
**Depends On**: T011-T024

**Checkpoint**: ✅ US1 complete. MVP ready for deployment. Paginated lists prevent context overflow.

---

## Phase 4: US2 - Automatic Response Summarization (P1)

**User Story**: Verbose responses are automatically summarized with drill-down instructions
**Goal**: Prevent single-item context bloat through field projection and summarization
**Duration**: 3-4 days | **Independent Test**: Request a verbose booking, verify a summary with drill-down instructions

### Models & Services (Parallelizable)

### T026: [US2][P] Write unit tests for token estimator
**File**: `tests/unit/test_token_estimator.py`
**Description**: Test character-based estimation, accuracy tracking, safety margin
**Acceptance**: 80%+ coverage, estimates within 20% for test cases
**Story**: US2 - TDD

### T027: [US2][P] Create token estimator service
**File**: `src/services/token_estimator.py`
**Description**: Implement character-based estimation per research.md R001

```python
def estimate_tokens(text: str) -> int  # 4 chars/token + 20% margin
def track_accuracy(estimated: int, actual: int) -> None
```

**Acceptance**: <20ms for 100KB, accuracy tracked, passes tests
**Story**: US2
**Depends On**: T026
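T027's heuristic is small enough to show in full. A sketch of both functions, with a module-level list standing in for wherever accuracy samples are actually recorded:

```python
import math

CHARS_PER_TOKEN = 4   # heuristic from research.md R001: roughly 4 characters per token
SAFETY_MARGIN = 1.20  # overestimate by 20% so borderline responses still trigger summarization

_accuracy_samples: list[tuple[int, int]] = []


def estimate_tokens(text: str) -> int:
    """Cheap character-count estimate; len() keeps this well under 20ms even for 100KB payloads."""
    return math.ceil(len(text) / CHARS_PER_TOKEN * SAFETY_MARGIN)


def track_accuracy(estimated: int, actual: int) -> None:
    """Record an (estimated, actual) pair so the heuristic can be tuned later (see T069)."""
    _accuracy_samples.append((estimated, actual))
```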
### T028: [US2][P] Write unit tests for token budget models
**File**: `tests/unit/test_token_budget_models.py`
**Description**: Test TokenBudget, BudgetMetadata, threshold comparison
**Acceptance**: 80%+ coverage, budget_used calculation correct
**Story**: US2 - TDD

### T029: [US2][P] Create token budget models
**File**: `src/models/token_budget.py`
**Description**: Define Pydantic models per data-model.md

```python
class TokenBudget(BaseModel):
    threshold: int = 4000
    estimated_tokens: int

    @property
    def budget_used(self) -> float: ...
    @property
    def summary_mode(self) -> bool: ...
```

**Acceptance**: Type-safe, passes unit tests
**Story**: US2
**Depends On**: T028

### T030: [US2][P] Write unit tests for summarization models
**File**: `tests/unit/test_summarization_models.py`
**Description**: Test SummaryResponse, SummaryMetadata, DetailsFetchInfo
**Acceptance**: 80%+ coverage, generic types work
**Story**: US2 - TDD

### T031: [US2][P] Create summarization models
**File**: `src/models/summarization.py`
**Description**: Define Pydantic models per data-model.md

```python
class SummaryResponse(BaseModel, Generic[T]): ...
class SummaryMetadata(BaseModel): ...
```

**Acceptance**: Type-safe, passes unit tests
**Story**: US2
**Depends On**: T030

### T032: [US2][P] Write unit tests for summarization service
**File**: `tests/unit/test_summarization_service.py`
**Description**: Test field projection, text truncation, summary generation
**Acceptance**: 80%+ coverage, retains identifying fields, provides drill-down instructions
**Story**: US2 - TDD

### T033: [US2][P] Create summarization service
**File**: `src/services/summarization_service.py`
**Description**: Implement field projection + extractive summarization per research.md R003

```python
def summarize(obj: dict, projection_map: list[str]) -> SummaryResponse
def truncate_text(text: str, max_length: int) -> str
```

**Acceptance**: Uses field projector, semantic truncation, passes tests
**Story**: US2
**Depends On**: T003, T032
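A sketch of the T033 flow, reusing the T003 projector from the path listed in Phase 1 (again assuming `src` is a package). The `detail_url` parameter and the dict return value are placeholders for the real `SummaryResponse` model and its `DetailsFetchInfo` drill-down instructions.

```python
from src.utils.field_projector import project_fields  # utility from T003


def truncate_text(text: str, max_length: int) -> str:
    """Cut long free-text fields at a sentence boundary where possible, marking the cut."""
    if len(text) <= max_length:
        return text
    cut = text.rfind(". ", 0, max_length)
    end = cut + 1 if cut > 0 else max_length
    return text[:end].rstrip() + " [truncated]"


def summarize(obj: dict, projection_map: list[str], detail_url: str) -> dict:
    """Keep only the projection-map fields and tell the caller how to fetch the rest."""
    return {
        "summary": project_fields(obj, projection_map),
        "meta": {
            "summarized": True,
            "fieldsIncluded": projection_map,
            # Drill-down hint; T042/T043 add the matching ?fields= parameter on detail routes.
            "detailsAvailable": f"GET {detail_url}?fields=<comma-separated paths>",
        },
    }
```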
### Middleware (Sequential)

### T034: [US2] Write unit tests for token-aware middleware
**File**: `tests/unit/test_token_aware_middleware.py`
**Description**: Test token estimation, threshold comparison, preview mode activation
**Acceptance**: 80%+ coverage, oversized responses trigger summarization
**Story**: US2 - TDD
**Depends On**: T027, T029, T033

### T035: [US2] Create token-aware middleware
**File**: `src/api/middleware/token_aware_middleware.py`
**Description**: Middleware to estimate tokens, switch to preview mode if over threshold
**Acceptance**: Intercepts responses, estimates tokens, applies summarization, passes tests
**Story**: US2
**Depends On**: T027, T029, T033, T034

### T036: [US2] Register token-aware middleware in main.py
**File**: `src/api/main.py`
**Description**: Add TokenAwareMiddleware before pagination (runs after, sees full response)
**Acceptance**: Middleware active, logs show summarization decisions
**Story**: US2
**Depends On**: T035

### Configuration & Projection Maps

### T037: [US2] Add token budget config to HostawayConfig
**File**: `src/mcp/config.py`
**Description**: Add token budget config fields

```python
output_token_threshold: int = 4000
hard_output_token_cap: int = 12000
summarization_enabled: bool = True
```

**Acceptance**: Environment overrides work, validation enforces limits
**Story**: US2
**Depends On**: T018

### T038: [US2][P] Define projection map for bookings
**File**: `src/mcp/schemas/projection_maps.py`
**Description**: Add booking summary field list (id, status, guest, dates, price, property)
**Acceptance**: 7 essential fields, ~70% size reduction
**Story**: US2
**Depends On**: T010

### T039: [US2][P] Define projection map for listings
**File**: `src/mcp/schemas/projection_maps.py`
**Description**: Add listing summary field list (id, name, status, bedrooms, price, location)
**Acceptance**: Essential fields, ~70% size reduction
**Story**: US2
**Depends On**: T010

### T040: [US2][P] Define projection map for financial records
**File**: `src/mcp/schemas/projection_maps.py`
**Description**: Add financial summary field list (id, type, amount, date, status)
**Acceptance**: Essential fields, ~70% size reduction
**Story**: US2
**Depends On**: T010

### Endpoint Integration (Parallelizable)

### T041: [US2][P] Write integration tests for summarization
**File**: `tests/integration/test_summarization_endpoints.py`
**Description**: Test oversized booking → summary, small booking → full, drill-down
**Acceptance**: Threshold triggers summarization, metadata includes drill-down instructions
**Story**: US2 - TDD

### T042: [US2][P] Apply summarization to booking details endpoint
**File**: `src/api/routes/bookings.py`
**Description**: Modify `get_booking_by_id` to support field projection via `?fields=` param
**Acceptance**: Returns summary if oversized, accepts fields param for drill-down
**Story**: US2
**Depends On**: T033, T036, T038, T041

### T043: [US2][P] Apply summarization to listing details endpoint
**File**: `src/api/routes/listings.py`
**Description**: Modify `get_listing_by_id` to support field projection
**Acceptance**: Returns summary if oversized, accepts fields param
**Story**: US2
**Depends On**: T033, T036, T039

### T044: [US2] US2 acceptance validation
**Description**: Run all US2 tests, verify acceptance scenarios from spec
**Acceptance**:
- ✅ 2000+ token booking → summary with essential fields + drill-down instructions
- ✅ Request full financial section → only that section returned
- ✅ <500 token resource → full details without summarization
- ✅ Summary metadata shows estimated tokens, budget used, budget remaining

**Story**: US2
**Depends On**: T026-T043

**Checkpoint**: ✅ US2 complete. Verbose responses no longer cause context overflow. US1+US2 deployed together.

---

## Phase 5: US3 - Configurable Token Budgets (P2)

**User Story**: Operations teams can configure limits via environment/files without code changes
**Goal**: Runtime configuration with hot-reload and per-endpoint overrides
**Duration**: 2-3 days | **Independent Test**: Update config, verify reload without restart

### Configuration System (Parallelizable)

### T045: [US3][P] Write unit tests for extended config
**File**: `tests/unit/test_context_protection_config.py`
**Description**: Test all config fields, validation, defaults, environment overrides
**Acceptance**: 80%+ coverage, invalid values rejected
**Story**: US3 - TDD

### T046: [US3][P] Extend HostawayConfig with all context protection fields
**File**: `src/mcp/config.py`
**Description**: Consolidate all config from US1-US4 into the complete ContextProtectionConfig

```python
class ContextProtectionConfig(BaseSettings):
    # Pagination (from US1)
    default_page_size: int = 50
    max_page_size: int = 200
    # Token Budget (from US2)
    output_token_threshold: int = 4000
    hard_output_token_cap: int = 12000
    # Chunking (for US4)
    chunk_size_bytes: int = 8192
    # Feature Flags
    endpoint_overrides: dict[str, dict] = {}
```

**Acceptance**: All fields present, Pydantic validation works, passes tests
**Story**: US3
**Depends On**: T045

### T047: [US3][P] Write unit tests for config reloader
**File**: `tests/unit/test_config_reloader.py`
**Description**: Test file watching, hot-reload, validation before swap, fail-safe
**Acceptance**: 80%+ coverage, invalid config keeps previous, reload <100ms
**Story**: US3 - TDD

### T048: [US3][P] Create config reloader service
**File**: `src/services/config_reloader.py`
**Description**: Implement watchdog-based config reloader per research.md R004

```python
class ConfigReloader:
    def reload() -> None  # atomic read-validate-swap
    def get_config() -> ContextProtectionConfig
```

**Acceptance**: Thread-safe, validates before applying, passes tests
**Story**: US3
**Depends On**: T001, T046, T047
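Since T048 names watchdog explicitly, a minimal read-validate-swap sketch follows. The `load_and_validate` callable stands in for however the real service parses `ContextProtectionConfig`, and error logging is omitted; on invalid input the previous config simply stays active, per the US3 acceptance criteria.

```python
import threading
from pathlib import Path
from typing import Callable

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class ConfigReloader(FileSystemEventHandler):
    """Watches the config file and atomically swaps in a validated copy on change."""

    def __init__(self, config_path: Path, load_and_validate: Callable[[Path], object]) -> None:
        self._path = config_path
        self._load = load_and_validate
        self._lock = threading.Lock()
        self._current = self._load(config_path)  # fail fast on startup
        self._observer = Observer()

    def start(self) -> None:
        self._observer.schedule(self, str(self._path.parent), recursive=False)
        self._observer.start()

    def stop(self) -> None:
        self._observer.stop()
        self._observer.join()

    def on_modified(self, event) -> None:
        if Path(event.src_path).name != self._path.name:
            return
        try:
            candidate = self._load(self._path)  # validate before swapping
        except Exception:
            return                              # keep the previous config; real code logs the error
        with self._lock:
            self._current = candidate

    def get_config(self):
        with self._lock:
            return self._current
```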
### T049: [US3] Wire config reloader into main.py lifespan
**File**: `src/api/main.py`
**Description**: Start the ConfigReloader observer in lifespan startup, stop it in shutdown
**Acceptance**: Config changes detected within 100ms, no request drops
**Story**: US3
**Depends On**: T048

### T050: [US3] Write integration test for config hot-reload
**File**: `tests/integration/test_config_reload.py`
**Description**: Modify config file, verify middleware respects new limits without restart
**Acceptance**: Threshold change reflected in behavior, in-flight requests unaffected
**Story**: US3
**Depends On**: T049

### Feature Flags (Parallelizable)

### T051: [US3][P] Implement per-endpoint feature flag logic
**File**: `src/api/middleware/pagination_middleware.py` and `token_aware_middleware.py`
**Description**: Check `endpoint_overrides` config before applying pagination/summarization
**Acceptance**: Can disable pagination for a specific endpoint, others unaffected
**Story**: US3
**Depends On**: T046
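T051's check can be as small as a lookup helper shared by both middlewares. The shape of `endpoint_overrides` shown in the docstring is an assumption; the config only declares it as `dict[str, dict]`.

```python
def feature_enabled(config, endpoint_path: str, flag: str, default: bool = True) -> bool:
    """Per-endpoint override first, then the global flag on the config object.

    Assumed override shape:
        endpoint_overrides = {"/api/v1/listings": {"pagination_enabled": False}}
    """
    override = config.endpoint_overrides.get(endpoint_path, {})
    if flag in override:
        return bool(override[flag])
    return bool(getattr(config, flag, default))


# In the pagination middleware, for example:
#     if not feature_enabled(config, request.url.path, "pagination_enabled"):
#         return await call_next(request)  # pass the response through untouched
```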
### T052: [US3] Write integration test for feature flags
**File**: `tests/integration/test_feature_flags.py`
**Description**: Disable pagination for one endpoint, verify others still paginated
**Acceptance**: Granular control per endpoint works
**Story**: US3
**Depends On**: T051

### T053: [US3] US3 acceptance validation
**Description**: Run all US3 tests, verify acceptance scenarios from spec
**Acceptance**:
- ✅ Update threshold 4000→8000, subsequent responses respect 8000
- ✅ Feature flag enables pagination for a specific endpoint only
- ✅ New deployment uses documented safe defaults (4000 tokens, 50 items/page)
- ✅ Invalid config → error logged, current config remains active

**Story**: US3
**Depends On**: T045-T052

**Checkpoint**: ✅ US3 complete. Operations teams can tune limits without code deployments.

---

## Phase 6: US4 - Response Chunking (P2)

**User Story**: Large text content is chunked with semantic boundaries and continuation cursors
**Goal**: Progressive access to logs, descriptions, documents
**Duration**: 2-3 days | **Independent Test**: Request a 50KB log file, verify it is chunked at semantic boundaries

### Models & Services (Parallelizable)

### T054: [US4][P] Write unit tests for content chunk models
**File**: `tests/unit/test_content_chunk_models.py`
**Description**: Test ContentChunk, ChunkMetadata, validation rules
**Acceptance**: 80%+ coverage, chunk index validation works
**Story**: US4 - TDD

### T055: [US4][P] Create content chunk models
**File**: `src/models/content_chunk.py`
**Description**: Define Pydantic models per data-model.md

```python
class ChunkMetadata(BaseModel): ...
class ContentChunk(BaseModel): ...
```

**Acceptance**: Type-safe, passes unit tests
**Story**: US4
**Depends On**: T054

### T056: [US4][P] Write unit tests for chunking service
**File**: `tests/unit/test_chunking_service.py`
**Description**: Test semantic boundary detection, range requests, chunk cursor generation
**Acceptance**: 80%+ coverage, no mid-sentence splits, respects line/paragraph boundaries
**Story**: US4 - TDD

### T057: [US4][P] Create chunking service
**File**: `src/services/chunking_service.py`
**Description**: Implement chunking with semantic boundaries per research.md

```python
def chunk_content(content: str, chunk_size: int) -> ContentChunk
def chunk_by_range(content: str, start_line: int, end_line: int) -> ContentChunk
```

**Acceptance**: Semantic boundaries respected, passes tests
**Story**: US4
**Depends On**: T002, T056
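For T057 the interesting part is boundary selection. The sketch below prefers paragraph, then line, then sentence breaks inside the size window; it splits everything eagerly into plain dicts, whereas the real service returns one `ContentChunk` at a time with a continuation cursor.

```python
def find_chunk_end(content: str, start: int, chunk_size: int) -> int:
    """Pick a cut point at or before start+chunk_size that lands on a semantic boundary."""
    hard_limit = min(start + chunk_size, len(content))
    if hard_limit == len(content):
        return hard_limit
    window = content[start:hard_limit]
    for boundary in ("\n\n", "\n", ". "):        # paragraph, line, sentence
        cut = window.rfind(boundary)
        if cut > 0:
            return start + cut + len(boundary)
    return hard_limit                            # no boundary in the window: hard cut


def chunk_content(content: str, chunk_size: int) -> list[dict]:
    """Split content into boundary-respecting chunks (illustrative dicts, not ContentChunk)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks, start, index = [], 0, 0
    while start < len(content):
        end = find_chunk_end(content, start, chunk_size)
        chunks.append({"index": index, "startOffset": start, "endOffset": end,
                       "text": content[start:end]})
        start, index = end, index + 1
    return chunks
```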
### Endpoint Integration

### T058: [US4] Add chunking config to HostawayConfig
**File**: `src/mcp/config.py`
**Description**: Add `chunk_size_bytes: int = 8192` to config
**Acceptance**: Environment override works
**Story**: US4
**Depends On**: T046

### T059: [US4] Write integration test for chunking
**File**: `tests/integration/test_content_chunking.py`
**Description**: Test large log file → chunks, range request, multi-chunk retrieval
**Acceptance**: Chunks respect semantic boundaries, continuation cursors work
**Story**: US4 - TDD

### T060: [US4] Add chunking to hypothetical logs endpoint (if exists)
**File**: `src/api/routes/analytics.py` (or create new logs route)
**Description**: Apply the chunking service to large text responses (logs, descriptions)
**Acceptance**: >2000 token content chunked, <1000 token content returned whole
**Story**: US4
**Depends On**: T057, T058, T059

### T061: [US4] US4 acceptance validation
**Description**: Run all US4 tests, verify acceptance scenarios from spec
**Acceptance**:
- ✅ 10,000 line log → first 200 lines + cursor + total count
- ✅ Chunks end at log entry/paragraph/sentence completion (no mid-splits)
- ✅ Range request (startLine=100, endLine=200) → only that range
- ✅ <1000 token content → full content without chunking

**Story**: US4
**Depends On**: T054-T060

**Checkpoint**: ✅ US4 complete. Large content accessible progressively without context overflow.

---

## Phase 7: US5 - Telemetry and Monitoring (P3)

**User Story**: Engineers observe token usage, pagination adoption, and overflow attempts via dashboards
**Goal**: Operational visibility for optimization and debugging
**Duration**: 2-3 days | **Independent Test**: Make 100 requests, verify metrics captured accurately

### Models & Services (Parallelizable)

### T062: [US5][P] Write unit tests for telemetry models
**File**: `tests/unit/test_telemetry_models.py`
**Description**: Test TelemetryRecord, computed properties, validation
**Acceptance**: 80%+ coverage, tokens_per_item calculation correct
**Story**: US5 - TDD

### T063: [US5][P] Create telemetry models
**File**: `src/models/telemetry.py`
**Description**: Define Pydantic models per data-model.md

```python
class TelemetryRecord(BaseModel):
    request_id: str
    endpoint: str
    estimated_tokens: int
    response_bytes: int
    pagination_used: bool
    summarization_used: bool
    # ... etc
```

**Acceptance**: Type-safe, passes unit tests
**Story**: US5
**Depends On**: T062

### Middleware

### T064: [US5] Write unit tests for telemetry middleware
**File**: `tests/unit/test_telemetry_middleware.py`
**Description**: Test metrics recording, overhead <10ms, fire-and-forget (no failures)
**Acceptance**: 80%+ coverage, telemetry failures don't affect requests
**Story**: US5 - TDD

### T065: [US5] Create telemetry middleware
**File**: `src/api/middleware/telemetry_middleware.py`
**Description**: Middleware to record per-request metrics
**Acceptance**: Captures all required fields, <10ms overhead, passes tests
**Story**: US5
**Depends On**: T063, T064

### T066: [US5] Register telemetry middleware in main.py
**File**: `src/api/main.py`
**Description**: Add TelemetryMiddleware (runs first, sees full request/response cycle)
**Acceptance**: Middleware active, metrics logged at INFO level
**Story**: US5
**Depends On**: T065

### Metrics & Logging

### T067: [US5][P] Implement metrics emission (Prometheus format)
**File**: `src/services/metrics_emitter.py`
**Description**: Emit metrics in Prometheus/StatsD format for external collection

```python
def emit_metric(record: TelemetryRecord) -> None
```

**Acceptance**: Metrics queryable by endpoint, time range, outcome
**Story**: US5
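T067 fixes only the signature and the exposition format. Assuming the `prometheus_client` library is the chosen backend (one common option; a StatsD emitter would look different), the emitter might look like this; metric names and buckets are illustrative.

```python
from prometheus_client import Counter, Histogram

RESPONSE_TOKENS = Histogram(
    "mcp_response_tokens", "Estimated tokens per response", ["endpoint"],
    buckets=(256, 512, 1024, 2048, 4096, 8192, 16384),
)
OVERSIZED_EVENTS = Counter(
    "mcp_oversized_responses_total", "Responses that exceeded the token threshold", ["endpoint"]
)
PAGINATED_REQUESTS = Counter(
    "mcp_paginated_requests_total", "Requests served through pagination", ["endpoint"]
)


def emit_metric(record) -> None:
    """Translate one TelemetryRecord into Prometheus series (fire-and-forget)."""
    RESPONSE_TOKENS.labels(endpoint=record.endpoint).observe(record.estimated_tokens)
    if record.summarization_used:
        OVERSIZED_EVENTS.labels(endpoint=record.endpoint).inc()
    if record.pagination_used:
        PAGINATED_REQUESTS.labels(endpoint=record.endpoint).inc()
```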
### T068: [US5][P] Add oversized event logging
**File**: `src/api/middleware/token_aware_middleware.py`
**Description**: Log oversized response attempts (original size, final size, reduction %)
**Acceptance**: INFO logs include context for debugging
**Story**: US5
**Depends On**: T035

### T069: [US5][P] Add accuracy sampling logic
**File**: `src/services/token_estimator.py`
**Description**: Sample 1% of responses, log estimated vs actual (if actual available)
**Acceptance**: Sampling rate configurable, logs enable accuracy tuning
**Story**: US5
**Depends On**: T027

### Health Endpoint Extensions

### T070: [US5] Extend /health endpoint with pagination metrics
**File**: `src/api/main.py`
**Description**: Add pagination adoption rate and avg response size to the health response

```python
{
    "status": "healthy",
    "metrics": {
        "pagination_adoption": 0.95,
        "avg_response_size": 2400,
        "oversized_events": 12
    }
}
```

**Acceptance**: Metrics updated in real-time from TelemetryRecords
**Story**: US5
**Depends On**: T065

### Integration Tests

### T071: [US5] Write integration test for telemetry
**File**: `tests/integration/test_telemetry_recording.py`
**Description**: Make 100 requests, verify all metrics captured accurately
**Acceptance**: Metrics match actual request characteristics
**Story**: US5 - TDD
**Depends On**: T066

### T072: [US5] Write load test for telemetry overhead
**File**: `tests/integration/test_telemetry_performance.py`
**Description**: Verify telemetry adds ≤10ms latency under 100 concurrent requests
**Acceptance**: p95 latency increase ≤10ms
**Story**: US5 - TDD
**Depends On**: T066

### T073: [US5] US5 acceptance validation
**Description**: Run all US5 tests, verify acceptance scenarios from spec
**Acceptance**:
- ✅ 100 requests → all record tokens, bytes, items, latency
- ✅ Oversized event → log includes original size, final size, reduction %
- ✅ Pagination usage tracked separately from non-paginated requests
- ✅ Alerts triggered when oversized attempts >5% of traffic

**Story**: US5
**Depends On**: T062-T072

**Checkpoint**: ✅ US5 complete. Full observability enables data-driven optimization.

---

## Phase 8: Polish & Cross-Cutting Concerns

**Goal**: Final integration, documentation, and deployment readiness
**Duration**: 2-3 days

### Documentation

### T074: [Polish] Update API documentation (OpenAPI)
**File**: FastAPI auto-generated docs
**Description**: Ensure pagination params and token budget metadata are documented in /docs
**Acceptance**: OpenAPI spec includes all new fields, backwards compatibility noted
**Story**: Polish

### T075: [Polish] Create operator runbook
**File**: `docs/CONTEXT_PROTECTION_RUNBOOK.md`
**Description**: Document tuning thresholds, reading dashboards, common errors, troubleshooting
**Acceptance**: Runbook covers all operational scenarios from quickstart.md
**Story**: Polish

### Contract & E2E Validation

### T076: [Polish] Write contract test for response schema additive-only
**File**: `tests/contract/test_response_schemas.py`
**Description**: Validate all enhanced responses are additive (no removed fields)
**Acceptance**: Golden response comparison passes, breaking changes detected
**Story**: Polish
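One way to implement T076's additive-only check, assuming golden responses are committed as JSON fixtures and a FastAPI `TestClient` fixture named `client` exists; the fixture path is illustrative.

```python
import json
from pathlib import Path


def assert_additive(golden: dict, current: dict, path: str = "") -> None:
    """Every key present in the golden response (recursively) must still exist."""
    for key, golden_value in golden.items():
        assert key in current, f"breaking change: field removed at {path}{key}"
        if isinstance(golden_value, dict) and isinstance(current[key], dict):
            assert_additive(golden_value, current[key], path=f"{path}{key}.")


def test_listings_response_is_additive(client):
    golden = json.loads(Path("tests/contract/golden/listings.json").read_text())
    current = client.get("/api/v1/listings").json()
    assert_additive(golden, current)
```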
### T077: [Polish] Write E2E test for summary → detail drill-down
**File**: `tests/e2e/test_summary_drilldown.py`
**Description**: Request summary, follow detailsAvailable instructions, verify full object
**Acceptance**: Multi-turn workflow works end-to-end
**Story**: Polish
**Depends On**: US1+US2 complete

### T078: [Polish] Write E2E test for token budget simulation
**File**: `tests/e2e/test_token_budget_simulation.py`
**Description**: Simulate context overflow scenarios, verify prevention
**Acceptance**: No requests exceed hard cap, summarization activates correctly
**Story**: Polish
**Depends On**: US2 complete

### Load & Performance Testing

### T079: [Polish] Write load test for pagination overhead
**File**: `tests/integration/test_pagination_performance.py`
**Description**: Verify pagination adds <50ms latency under 100 concurrent requests
**Acceptance**: p95 pagination overhead <50ms
**Story**: Polish
**Depends On**: US1 complete

### T080: [Polish] Write load test for token estimation performance
**File**: `tests/integration/test_token_estimation_performance.py`
**Description**: Verify estimation completes <20ms for 100KB responses
**Acceptance**: p95 estimation time <20ms
**Story**: Polish
**Depends On**: US2 complete

### T081: [Polish] Write load test for overall system performance
**File**: `tests/integration/test_system_performance.py`
**Description**: Verify overall p95 latency increase ≤10% with all features enabled
**Acceptance**: Baseline vs hardened comparison, <10% increase
**Story**: Polish
**Depends On**: All US complete

### Deployment Preparation

### T082: [Polish] Create feature flag configuration template
**File**: `config/context_protection.yaml.example`
**Description**: Document all config options with safe defaults for production
**Acceptance**: Example config covers all scenarios from quickstart.md
**Story**: Polish

### T083: [Polish] Create canary deployment checklist
**File**: `docs/CANARY_DEPLOYMENT.md`
**Description**: Document the 5% → 25% → 50% → 100% rollout process, metrics to monitor, rollback procedure
**Acceptance**: Checklist actionable for operations team
**Story**: Polish

### T084: [Polish] Write A/B testing mode toggle
**File**: `src/mcp/config.py`
**Description**: Add `ab_test_mode: bool` to compare responses with/without hardening
**Acceptance**: Can run dual mode for validation
**Story**: Polish
**Depends On**: T046

### Final Validation

### T085: [Polish] Full acceptance criteria validation
**Description**: Run all tests, verify all success criteria from spec.md
**Acceptance**:
- ✅ All 5 user story acceptance scenarios pass
- ✅ All 36 functional requirements (FR-001 to FR-036) satisfied
- ✅ All 8 success criteria (SC-001 to SC-008) achievable
- ✅ 80%+ code coverage across all modules
- ✅ No regressions in existing functionality

**Story**: Polish
**Depends On**: T001-T084

**Checkpoint**: ✅ All phases complete. Feature ready for canary deployment.

---

## Dependency Graph

```
Setup (T001-T004)        [all parallel]
        ↓
Foundation (T005-T010)   [T005-T008 parallel, then T009-T010]
        ↓
  ├─ US1 (T011-T025)  P1 - MVP   (must complete first)
  ├─ US2 (T026-T044)  P1         (depends on US1)
  ├─ US3 (T045-T053)  P2         (depends on US1+US2)
  ├─ US4 (T054-T061)  P2         (can run in parallel with US3)
  └─ US5 (T062-T073)  P3         (can run in parallel with US3+US4)
        ↓
Polish (T074-T085)       [final integration]
```

**Critical Path**:
1. Setup → Foundation → US1 (MVP)
2. US1 → US2 (builds on pagination)
3. US2 → US3 (needs token budget to configure)
4. US3 → US4, US5 can run in parallel
5. All → Polish & Final Validation

---

## Parallel Execution Examples

### Example 1: Phase 1 (Setup) - 4 parallel tasks

```bash
# All 4 tasks can run simultaneously (different files, no dependencies)
Developer 1: T001 (pyproject.toml)
Developer 2: T002 (cursor_codec.py)
Developer 3: T003 (field_projector.py)
Developer 4: T004 (cursor_storage.py)

Timeline: 1 day (vs 4 days sequential)
```

### Example 2: US1 Models & Services - 4 parallel tasks

```bash
# After T011-T013 tests are written, implementation can parallelize:
Developer 1: T012 (pagination models)
Developer 2: T014 (pagination service)
Developer 3: T019 (integration tests)
Developer 4: T020 (listings endpoint)

Timeline: 1 day (vs 4 days sequential)
```

### Example 3: US2 Projection Maps - 3 parallel tasks

```bash
# Different files, independent definitions:
Developer 1: T038 (bookings projection map)
Developer 2: T039 (listings projection map)
Developer 3: T040 (financial projection map)

Timeline: 0.5 days (vs 1.5 days sequential)
```

---

## Implementation Strategy

### MVP First (US1 Only)

**Recommended Approach**: Deploy US1 alone to production first

**Rationale**:
- Addresses 80% of context overflow issues (unbounded lists)
- Independently testable and valuable
- Lower risk for first deployment
- Validates infrastructure before adding complexity

**MVP Tasks**: T001-T025 (25 tasks, ~5 days with 3 developers)

**Validation**: After US1 is deployed, monitor for 1 week:
- Pagination adoption rate
- Context overflow reduction
- Performance impact
- Client compatibility

### Incremental Rollout

1. **Week 1**: US1 (MVP) - Paginated lists
2. **Week 2**: US2 - Summarization (builds on US1)
3. **Week 3**: US3 + US4 in parallel - Configuration + Chunking
4. **Week 4**: US5 + Polish - Telemetry + Final validation
5. **Week 5**: Canary deployment and GA
---

## Task Count Summary

| Phase | Tasks | Parallelizable | Estimated Days |
|-------|-------|----------------|----------------|
| Setup | 4 | 4 (100%) | 1-2 |
| Foundation | 6 | 4 (67%) | 2-3 |
| US1 (MVP) | 15 | 9 (60%) | 3-4 |
| US2 | 19 | 11 (58%) | 3-4 |
| US3 | 9 | 6 (67%) | 2-3 |
| US4 | 8 | 5 (63%) | 2-3 |
| US5 | 12 | 6 (50%) | 2-3 |
| Polish | 12 | 6 (50%) | 2-3 |
| **TOTAL** | **85** | **51 (60%)** | **15-16** |

---

## Success Metrics Validation

After T085 completion, verify these metrics are achievable:

- **SC-001**: 99.9% sessions without context overflow → Run E2E tests with 1000 multi-step workflows
- **SC-002**: 60% token reduction → Compare baseline vs hardened avg response size
- **SC-003**: 95% pagination adoption → Check health endpoint metrics after 2 weeks
- **SC-004**: 80% truncation ticket reduction → Monitor support tickets for 1 month
- **SC-005**: ≤10% latency increase → Load test results (T081)
- **SC-006**: 5-min diagnostics → Time troubleshooting scenarios with telemetry
- **SC-007**: 95% successful pagination → Contract tests + E2E tests pass rate
- **SC-008**: 90% estimation accuracy → Sampling logs from T069

---

**Status**: Tasks generated. Ready for implementation. Start with T001-T004 (Setup phase).

**Next Steps**:
1. Review task breakdown with team
2. Assign T001-T004 to developers (all parallelizable)
3. Begin TDD implementation
4. Track progress using task completion checkboxes
5. Deploy US1 (MVP) after T025 validation
