# Implementation Plan: Automated Overflow-Safety Test Suite

**Feature ID**: 006-overflow-safety-test-suite
**Status**: Ready for Implementation
**Target**: TypeScript/Vitest test harness with stdio MCP client
**Created**: 2025-10-16
**Last Updated**: 2025-10-16

---

## 1. Executive Summary

### Objective

Build a **TypeScript/Vitest test harness** that:

- Launches the MCP server over stdio
- Mocks Hostaway HTTP responses with nock
- Asserts output-size contracts (pagination, preview, chunking, error compactness)
- Validates behavior under configurable token caps

### Success Criteria

- ✅ Test suite runs in <5 minutes in CI
- ✅ 100% coverage of high-volume endpoints
- ✅ 0 violations of hard token cap
- ✅ ≥95% of endpoints return paginated/summarized results under forced-small caps
- ✅ All error responses <2KB
- ✅ Flake rate <1%

### Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│              Vitest Test Suite (TypeScript)             │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌────────────────┐        ┌──────────────────┐         │
│  │ MCP Client     │        │ Token Estimator  │         │
│  │ (stdio)        │        │ (bytes/4 * 1.2)  │         │
│  └────────┬───────┘        └────────┬─────────┘         │
│           │                         │                   │
│           ▼                         ▼                   │
│  ┌─────────────────────────────────────┐                │
│  │           Test Assertions           │                │
│  │  - Pagination contracts             │                │
│  │  - Preview mode activation          │                │
│  │  - Token cap enforcement            │                │
│  │  - Error hygiene                    │                │
│  └──────────────┬──────────────────────┘                │
└─────────────────┼─────────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────────────┐
│                MCP Server (Python, stdio)               │
├─────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │  Pagination  │  │   Preview    │  │    Token     │   │
│  │   Service    │  │   Service    │  │  Estimator   │   │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘   │
│         │                 │                 │           │
│         ▼                 ▼                 ▼           │
│  ┌─────────────────────────────────────────────┐        │
│  │              Endpoint Handlers              │        │
│  └───────────────────┬─────────────────────────┘        │
└──────────────────────┼──────────────────────────────────┘
                       │
                       ▼  (mocked by nock)
            ┌──────────────────────┐
            │     Hostaway API     │
            │    (HTTP mocked)     │
            └──────────────────────┘
```

---

## 2. Implementation Phases

### Phase 1: Foundation (Week 1) - 5 days

**Goal**: Set up TypeScript/Vitest infrastructure with MCP stdio client and nock mocking

#### Day 1-2: Project Setup & Dependencies

**Tasks**:
- [ ] T001: Initialize TypeScript test project
  - Create `test/` directory structure
  - Configure `tsconfig.json` for the test environment (see the sketch below)
  - Set up Vitest configuration
  - Install dependencies (vitest, @modelcontextprotocol/sdk, nock, etc.)
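The plan references a `tsconfig.json` for the test project but never spells one out. A minimal sketch is shown here, assuming an ESM Node setup with Vitest globals (matching `globals: true` in the Vitest config below); adjust `target` and `module` to the repository's conventions.

```json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "types": ["node", "vitest/globals"],
    "strict": true,
    "noEmit": true
  },
  "include": ["**/*.ts"]
}
```
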
**Deliverables**: ```typescript // test/vitest.config.ts import { defineConfig } from 'vitest/config' export default defineConfig({ test: { globals: true, environment: 'node', setupFiles: ['./setup.ts'], timeout: 30000, // 30s per test hookTimeout: 10000, }, }) ``` **Dependencies** (package.json): ```json { "devDependencies": { "@modelcontextprotocol/sdk": "^1.0.0", "@types/node": "^20.0.0", "nock": "^14.0.0", "tsx": "^4.0.0", "typescript": "^5.0.0", "vitest": "^1.0.0" } } ``` **Acceptance Criteria**: - ✅ `npm test` runs Vitest successfully - ✅ TypeScript compiles without errors - ✅ Basic test file executes --- #### Day 2-3: MCP Client Wrapper **Tasks**: - [ ] T002: Implement MCP stdio client wrapper - Spawn MCP server process - Handle stdio communication - Tool invocation helpers - Graceful shutdown **Deliverables**: ```typescript // test/utils/mcpClient.ts import { spawn, ChildProcess } from 'child_process' import { Client } from '@modelcontextprotocol/sdk/client/index.js' import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js' export class MCPTestClient { private process: ChildProcess private client: Client private transport: StdioClientTransport constructor(private config: { serverCommand: string serverArgs?: string[] env?: Record<string, string> }) {} async start(): Promise<void> { // Spawn MCP server this.process = spawn(this.config.serverCommand, this.config.serverArgs || [], { env: { ...process.env, ...this.config.env }, stdio: ['pipe', 'pipe', 'inherit'], }) // Create stdio transport this.transport = new StdioClientTransport({ reader: this.process.stdout, writer: this.process.stdin, }) // Initialize client this.client = new Client( { name: 'overflow-safety-tests', version: '1.0.0', }, { capabilities: {}, } ) await this.client.connect(this.transport) } async callTool<T = unknown>( toolName: string, args: Record<string, unknown> ): Promise<{ content: T estimatedTokens: number isError: boolean }> { const result = await this.client.callTool({ name: toolName, arguments: args, }) // Serialize result and estimate tokens const content = JSON.stringify(result.content) const estimatedTokens = estimateTokens(content) return { content: result.content as T, estimatedTokens, isError: result.isError || false, } } async listTools() { return await this.client.listTools() } async listResources() { return await this.client.listResources() } async readResource(uri: string) { return await this.client.readResource({ uri }) } async stop(): Promise<void> { await this.client.close() this.process.kill() // Wait for process to exit return new Promise((resolve) => { this.process.on('exit', () => resolve()) }) } } // Helper function function estimateTokens(text: string): number { // bytes/4 heuristic with 20% safety margin const bytes = Buffer.byteLength(text, 'utf-8') const baseEstimate = bytes / 4 return Math.ceil(baseEstimate * 1.2) } ``` **Acceptance Criteria**: - ✅ Client can spawn MCP server - ✅ Client can call tools and get responses - ✅ Client handles graceful shutdown - ✅ Token estimation matches Python implementation (±5%) --- #### Day 3-4: Token Estimator & Assertions **Tasks**: - [ ] T003: Implement token estimation utilities - Port Python token estimator to TypeScript - Create assertion helpers for token caps - Budget checking utilities **Deliverables**: ```typescript // test/utils/tokenEstimator.ts export function estimateTokens(text: string): number { const charCount = text.length const baseEstimate = charCount / 4 const safetyMargin = baseEstimate * 0.20 return 
Math.ceil(baseEstimate + safetyMargin) } export function estimateTokensFromObject(obj: unknown): number { const json = JSON.stringify(obj) return estimateTokens(json) } export function checkTokenBudget( text: string, threshold: number ): { estimated: number exceeds: boolean ratio: number } { const estimated = estimateTokens(text) const exceeds = estimated > threshold const ratio = threshold > 0 ? estimated / threshold : 0 return { estimated, exceeds, ratio } } // Vitest assertion helpers export function assertWithinTokenBudget( content: unknown, threshold: number, message?: string ) { const tokens = estimateTokensFromObject(content) if (tokens > threshold) { throw new Error( message || `Token budget exceeded: ${tokens} tokens > ${threshold} threshold` ) } } export function assertBelowHardCap(content: unknown, hardCap: number) { const tokens = estimateTokensFromObject(content) if (tokens > hardCap) { throw new Error( `HARD CAP VIOLATION: ${tokens} tokens > ${hardCap} hard cap` ) } } ``` **Acceptance Criteria**: - ✅ Token estimation accurate to ±10% - ✅ Assertion helpers integrate with Vitest - ✅ Budget checking supports soft/hard thresholds --- #### Day 4-5: Nock HTTP Mocking Infrastructure **Tasks**: - [ ] T004: Set up nock mocking for Hostaway API - Create fixture generators - Mock authentication endpoints - Mock pagination scenarios - Mock error responses **Deliverables**: ```typescript // test/fixtures/hostaway/setup.ts import nock from 'nock' const HOSTAWAY_BASE_URL = 'https://api.hostaway.com' export function setupHostawayMocks() { // Clear any existing mocks nock.cleanAll() // Mock authentication nock(HOSTAWAY_BASE_URL) .post('/v1/accessTokens') .reply(200, { access_token: 'test_token_abc123', token_type: 'Bearer', expires_in: 3600, }) .persist() // Mock listings endpoint with pagination nock(HOSTAWAY_BASE_URL) .get('/v1/listings') .query(true) // Accept any query params .reply((uri) => { const url = new URL(uri, HOSTAWAY_BASE_URL) const limit = parseInt(url.searchParams.get('limit') || '50') const offset = parseInt(url.searchParams.get('offset') || '0') return [ 200, { status: 'success', result: generateProperties(limit), count: 500, // Total available limit, offset, }, ] }) .persist() } export function setupLargeFinancialReportMock() { // Mock that returns oversized financial data nock(HOSTAWAY_BASE_URL) .get('/v1/financials/reports') .query(true) .reply(200, { status: 'success', result: generateLargeFinancialReport(), }) } export function setupErrorMocks() { // Mock 500 error with large HTML body nock(HOSTAWAY_BASE_URL) .get('/v1/error/500') .reply(500, generateLargeHtmlError()) // Mock 429 rate limit with headers nock(HOSTAWAY_BASE_URL) .get('/v1/error/429') .reply(429, { error: 'rate_limit_exceeded', message: 'Too many requests', }, { 'X-RateLimit-Remaining': '0', 'X-RateLimit-Reset': String(Date.now() + 60000), 'Retry-After': '60', }) } ``` ```typescript // test/fixtures/generators/properties.ts export function generateProperties(count: number) { return Array.from({ length: count }, (_, i) => ({ id: i + 1, name: `Property ${i + 1}`, address: `${i + 1} Main Street`, city: ['San Francisco', 'New York', 'Los Angeles'][i % 3], state: ['CA', 'NY', 'CA'][i % 3], country: 'USA', capacity: Math.floor(Math.random() * 10) + 2, bedrooms: Math.floor(Math.random() * 5) + 1, bathrooms: Math.floor(Math.random() * 3) + 1, basePrice: Math.floor(Math.random() * 500) + 100, isActive: true, })) } export function generateLargeFinancialReport() { // Generate report with 1000+ rows const rows = 
Array.from({ length: 1000 }, (_, i) => ({ date: `2025-${String(Math.floor(i / 30) + 1).padStart(2, '0')}-${String((i % 30) + 1).padStart(2, '0')}`, bookingId: `BK${i}`, revenue: Math.random() * 1000, expenses: Math.random() * 200, profit: Math.random() * 800, notes: `Detailed financial notes for booking ${i} `.repeat(10), // Verbose })) return { startDate: '2025-01-01', endDate: '2025-12-31', totalRevenue: 500000, totalExpenses: 100000, totalProfit: 400000, rows, } } export function generateLargeHtmlError() { // Simulate large 500 error body return ` <!DOCTYPE html> <html> <head><title>500 Internal Server Error</title></head> <body> <h1>500 Internal Server Error</h1> <pre> Traceback (most recent call last): ${' File "server.py", line 123, in handler\n'.repeat(100)} ... [10KB+ of stack trace] ... </pre> </body> </html> ` } ``` **Acceptance Criteria**: - ✅ All Hostaway endpoints can be mocked - ✅ Pagination responses realistic and deterministic - ✅ Error scenarios (500, 429) mockable - ✅ Large payloads generate correct token estimates --- ### Phase 2: Core Contract Tests (Week 2) - 5 days **Goal**: Implement pagination, preview, and error hygiene tests #### Day 6-7: Pagination Contract Tests **Tasks**: - [ ] T005: Test `list_properties` pagination - Limit parameter respected - nextCursor presence when hasMore - Disjoint pages on cursor navigation - Final page has no nextCursor **Deliverables**: ```typescript // test/hostaway.pagination.test.ts import { describe, it, expect, beforeAll, afterAll } from 'vitest' import { MCPTestClient } from './utils/mcpClient' import { setupHostawayMocks } from './fixtures/hostaway/setup' import { assertBelowHardCap } from './utils/tokenEstimator' describe('Pagination Contracts', () => { let client: MCPTestClient beforeAll(async () => { setupHostawayMocks() client = new MCPTestClient({ serverCommand: 'python', serverArgs: ['-m', 'src.mcp.server'], env: { MCP_OUTPUT_TOKEN_THRESHOLD: '2000', MCP_HARD_OUTPUT_TOKEN_CAP: '3000', MCP_DEFAULT_PAGE_SIZE: '5', HOSTAWAY_ACCOUNT_ID: 'test_account', HOSTAWAY_SECRET_KEY: 'test_secret', }, }) await client.start() }) afterAll(async () => { await client.stop() }) it('list_properties respects limit parameter', async () => { const result = await client.callTool('list_properties', { limit: 10, }) expect(result.content).toHaveProperty('items') expect(result.content.items).toBeInstanceOf(Array) expect(result.content.items.length).toBeLessThanOrEqual(10) // Token budget validation assertBelowHardCap(result.content, 3000) }) it('list_properties returns nextCursor when more results available', async () => { const result = await client.callTool('list_properties', { limit: 5, }) expect(result.content).toHaveProperty('nextCursor') expect(typeof result.content.nextCursor).toBe('string') expect(result.content.meta.hasMore).toBe(true) }) it('list_properties pagination returns disjoint pages', async () => { // First page const page1 = await client.callTool('list_properties', { limit: 5, }) // Second page using cursor const page2 = await client.callTool('list_properties', { limit: 5, cursor: page1.content.nextCursor, }) // Collect IDs from both pages const ids1 = page1.content.items.map((item) => item.id) const ids2 = page2.content.items.map((item) => item.id) // Assert no overlap (disjoint sets) const overlap = ids1.filter((id) => ids2.includes(id)) expect(overlap).toHaveLength(0) // Both pages should be under token cap assertBelowHardCap(page1.content, 3000) assertBelowHardCap(page2.content, 3000) }) it('list_properties final page has no 
nextCursor', async () => { let cursor = null let pageCount = 0 const maxPages = 200 // Safety limit // Paginate to end while (pageCount < maxPages) { const result = await client.callTool('list_properties', { limit: 5, ...(cursor && { cursor }), }) pageCount++ if (!result.content.meta.hasMore) { // Final page should have no nextCursor expect(result.content.nextCursor).toBeNull() break } cursor = result.content.nextCursor } expect(pageCount).toBeLessThan(maxPages) // Should reach end }) it('search_bookings respects pagination', async () => { const result = await client.callTool('search_bookings', { status: 'confirmed', limit: 10, }) expect(result.content.items.length).toBeLessThanOrEqual(10) assertBelowHardCap(result.content, 3000) }) }) ``` **Acceptance Criteria**: - ✅ All pagination tests pass - ✅ No token cap violations - ✅ Deterministic results (rerunnable) --- #### Day 7-8: Preview Mode & Token Cap Tests **Tasks**: - [ ] T006: Test preview mode activation - Financial reports trigger preview for large date ranges - Preview includes summary + cursor for drilldown - Token threshold enforcement **Deliverables**: ```typescript // test/hostaway.preview-and-caps.test.ts import { describe, it, expect, beforeAll, afterAll } from 'vitest' import { MCPTestClient } from './utils/mcpClient' import { setupHostawayMocks, setupLargeFinancialReportMock, } from './fixtures/hostaway/setup' import { estimateTokensFromObject, assertBelowHardCap, checkTokenBudget, } from './utils/tokenEstimator' describe('Preview Mode & Token Caps', () => { let client: MCPTestClient beforeAll(async () => { setupHostawayMocks() setupLargeFinancialReportMock() client = new MCPTestClient({ serverCommand: 'python', serverArgs: ['-m', 'src.mcp.server'], env: { MCP_OUTPUT_TOKEN_THRESHOLD: '2000', // Very low to force preview MCP_HARD_OUTPUT_TOKEN_CAP: '3000', MCP_DEFAULT_PAGE_SIZE: '5', HOSTAWAY_ACCOUNT_ID: 'test_account', HOSTAWAY_SECRET_KEY: 'test_secret', }, }) await client.start() }) afterAll(async () => { await client.stop() }) it('get_financial_reports triggers preview for large date range', async () => { const result = await client.callTool('get_financial_reports', { startDate: '2025-01-01', endDate: '2025-12-31', // Full year - large range }) // Should activate preview mode expect(result.content).toHaveProperty('meta') expect(result.content.meta).toHaveProperty('summary') expect(result.content.meta.summary.kind).toBe('preview') // Should include summary stats expect(result.content.meta.summary).toHaveProperty('totalApprox') expect(result.content.meta.summary).toHaveProperty('notes') // Should provide drilldown cursor expect(result.content).toHaveProperty('nextCursor') // Must not exceed hard cap assertBelowHardCap(result.content, 3000) // Should be under soft threshold (with preview) const tokens = estimateTokensFromObject(result.content) expect(tokens).toBeLessThan(2000) }) it('get_property_details respects field projection', async () => { const result = await client.callTool('get_property_details', { propertyId: 12345, // No fields specified - should return default projection }) // Should have essential fields expect(result.content).toHaveProperty('id') expect(result.content).toHaveProperty('name') expect(result.content).toHaveProperty('address') // Large nested fields should be chunked or omitted if (result.content.reviews) { // Reviews should be summarized or paginated expect(result.content.reviews).toHaveProperty('meta') expect(result.content.reviews.meta.kind).toBe('preview') } assertBelowHardCap(result.content, 
3000) }) it('never exceeds hard token cap under any scenario', async () => { const scenarios = [ { tool: 'list_properties', args: { limit: 100 } }, { tool: 'search_bookings', args: { status: 'all', limit: 100 } }, { tool: 'get_financial_reports', args: { startDate: '2020-01-01', endDate: '2025-12-31' } }, ] for (const { tool, args } of scenarios) { const result = await client.callTool(tool, args) // CRITICAL: Never exceed hard cap const tokens = estimateTokensFromObject(result.content) expect(tokens).toBeLessThanOrEqual(3000) } }) it('soft threshold triggers preview without data loss', async () => { const result = await client.callTool('get_financial_reports', { startDate: '2025-01-01', endDate: '2025-03-31', // Q1 - moderate size }) const { estimated, exceeds } = checkTokenBudget( JSON.stringify(result.content), 2000 ) if (exceeds) { // Should have preview metadata expect(result.content.meta.summary.kind).toBe('preview') // Should provide way to get full data expect( result.content.nextCursor || result.content.meta.detailsAvailable ).toBeTruthy() } else { // Under threshold - full data returned expect(result.content.meta?.summary?.kind).not.toBe('preview') } }) }) ``` **Acceptance Criteria**: - ✅ Preview mode activates when exceeding soft threshold - ✅ Hard cap never violated - ✅ Preview includes actionable next steps - ✅ Field projection reduces payload size --- #### Day 8-9: Error Hygiene Tests **Tasks**: - [ ] T007: Test error response compactness - 500 errors return compact JSON - 429 errors include retry metadata - No large HTML bodies surfaced **Deliverables**: ```typescript // test/hostaway.errors-and-rate-limit.test.ts import { describe, it, expect, beforeAll, afterAll } from 'vitest' import { MCPTestClient } from './utils/mcpClient' import { setupErrorMocks } from './fixtures/hostaway/setup' import { estimateTokensFromObject } from './utils/tokenEstimator' describe('Error Hygiene & Rate Limiting', () => { let client: MCPTestClient beforeAll(async () => { setupErrorMocks() client = new MCPTestClient({ serverCommand: 'python', serverArgs: ['-m', 'src.mcp.server'], env: { HOSTAWAY_ACCOUNT_ID: 'test_account', HOSTAWAY_SECRET_KEY: 'test_secret', }, }) await client.start() }) afterAll(async () => { await client.stop() }) it('500 error returns compact JSON (no HTML)', async () => { try { await client.callTool('trigger_500_error', {}) } catch (error) { // Should receive structured error expect(error).toHaveProperty('code') expect(error).toHaveProperty('message') expect(error).toHaveProperty('correlationId') // Should not include HTML const errorJson = JSON.stringify(error) expect(errorJson).not.toContain('<!DOCTYPE') expect(errorJson).not.toContain('<html>') // Should be compact (<2KB) const tokens = estimateTokensFromObject(error) expect(tokens).toBeLessThan(500) // ~2KB / 4 bytes/token } }) it('429 error includes retry guidance', async () => { try { await client.callTool('trigger_429_error', {}) } catch (error) { expect(error).toHaveProperty('code') expect(error.code).toBe('rate_limit_exceeded') // Should include retry metadata expect(error).toHaveProperty('retryAfterMs') expect(typeof error.retryAfterMs).toBe('number') expect(error.retryAfterMs).toBeGreaterThan(0) // Optional: rate limit remaining if (error.rateLimitRemaining !== undefined) { expect(error.rateLimitRemaining).toBe(0) } // Should be compact const tokens = estimateTokensFromObject(error) expect(tokens).toBeLessThan(500) } }) it('all error responses are compact', async () => { const errorScenarios = [ 'trigger_500_error', 
'trigger_429_error', 'trigger_400_error', 'trigger_404_error', ] for (const tool of errorScenarios) { try { await client.callTool(tool, {}) } catch (error) { // All errors should be <2KB const tokens = estimateTokensFromObject(error) expect(tokens).toBeLessThan(500) // Should have standard fields expect(error).toHaveProperty('code') expect(error).toHaveProperty('message') } } }) it('error messages are actionable', async () => { try { await client.callTool('trigger_429_error', {}) } catch (error) { // Message should guide user expect(error.message).toMatch(/retry|wait|limit/i) // Should include timing information expect(error.retryAfterMs || error.message).toBeTruthy() } }) }) ``` **Acceptance Criteria**: - ✅ All error responses <2KB - ✅ No HTML in error bodies - ✅ 429 errors include retry timing - ✅ Errors have correlation IDs --- #### Day 9-10: Resources & Field Projection Tests **Tasks**: - [ ] T008: Test MCP resources (if exposed) - resources/list implements cursor pagination - resources/read respects size bounds - [ ] T009: Test field projection on detail endpoints - get_booking_details chunking - get_guest_info projection **Deliverables**: ```typescript // test/hostaway.resources.test.ts import { describe, it, expect, beforeAll, afterAll } from 'vitest' import { MCPTestClient } from './utils/mcpClient' import { setupHostawayMocks } from './fixtures/hostaway/setup' import { assertBelowHardCap } from './utils/tokenEstimator' describe('MCP Resources Pagination', () => { let client: MCPTestClient beforeAll(async () => { setupHostawayMocks() client = new MCPTestClient({ serverCommand: 'python', serverArgs: ['-m', 'src.mcp.server'], env: { MCP_OUTPUT_TOKEN_THRESHOLD: '2000', MCP_HARD_OUTPUT_TOKEN_CAP: '3000', HOSTAWAY_ACCOUNT_ID: 'test_account', HOSTAWAY_SECRET_KEY: 'test_secret', }, }) await client.start() }) afterAll(async () => { await client.stop() }) it('resources/list implements cursor pagination', async () => { const resources = await client.listResources() // If resources exposed, should be paginated if (resources.nextCursor) { expect(resources.resources).toBeInstanceOf(Array) expect(resources.resources.length).toBeGreaterThan(0) // Follow cursor const nextPage = await client.listResources({ cursor: resources.nextCursor, }) // Should get different resources const firstIds = resources.resources.map((r) => r.uri) const nextIds = nextPage.resources.map((r) => r.uri) const overlap = firstIds.filter((id) => nextIds.includes(id)) expect(overlap).toHaveLength(0) } }) it('resources/read respects size bounds', async () => { const resources = await client.listResources() if (resources.resources.length > 0) { const firstResource = resources.resources[0] const content = await client.readResource(firstResource.uri) // Should not exceed hard cap assertBelowHardCap(content, 3000) } }) }) ``` ```typescript // test/hostaway.field-projection.test.ts import { describe, it, expect, beforeAll, afterAll } from 'vitest' import { MCPTestClient } from './utils/mcpClient' import { setupHostawayMocks } from './fixtures/hostaway/setup' import { estimateTokensFromObject } from './utils/tokenEstimator' describe('Field Projection & Chunking', () => { let client: MCPTestClient beforeAll(async () => { setupHostawayMocks() client = new MCPTestClient({ serverCommand: 'python', serverArgs: ['-m', 'src.mcp.server'], env: { MCP_OUTPUT_TOKEN_THRESHOLD: '2000', MCP_HARD_OUTPUT_TOKEN_CAP: '3000', HOSTAWAY_ACCOUNT_ID: 'test_account', HOSTAWAY_SECRET_KEY: 'test_secret', }, }) await client.start() }) afterAll(async () => { 
await client.stop() }) it('get_booking_details chunks large nested sections', async () => { const result = await client.callTool('get_booking_details', { bookingId: 'BK12345', }) // Core booking data should be present expect(result.content).toHaveProperty('id') expect(result.content).toHaveProperty('status') // Large collections should be chunked if (result.content.lineItems) { // Should have preview or pagination expect( result.content.lineItems.meta || result.content.lineItems.nextCursor ).toBeTruthy() } // Should not exceed hard cap const tokens = estimateTokensFromObject(result.content) expect(tokens).toBeLessThanOrEqual(3000) }) it('get_guest_info returns minimal data by default', async () => { const result = await client.callTool('get_guest_info', { guestId: 'G12345', }) // Should have essential fields expect(result.content).toHaveProperty('id') expect(result.content).toHaveProperty('firstName') expect(result.content).toHaveProperty('lastName') expect(result.content).toHaveProperty('email') // Large collections (history, reviews) should be excluded by default expect(result.content.bookingHistory).toBeUndefined() expect(result.content.reviewHistory).toBeUndefined() // Should be compact const tokens = estimateTokensFromObject(result.content) expect(tokens).toBeLessThan(500) }) it('get_guest_info with includeHistory returns paginated history', async () => { const result = await client.callTool('get_guest_info', { guestId: 'G12345', includeHistory: true, }) // Should have history but paginated if (result.content.bookingHistory) { expect(result.content.bookingHistory).toHaveProperty('items') expect(result.content.bookingHistory.items.length).toBeLessThanOrEqual( 10 ) // Should have pagination metadata expect(result.content.bookingHistory).toHaveProperty('meta') } // Still should not exceed hard cap const tokens = estimateTokensFromObject(result.content) expect(tokens).toBeLessThanOrEqual(3000) }) }) ``` **Acceptance Criteria**: - ✅ Resources pagination works (if implemented) - ✅ Field projection reduces payload size - ✅ Chunking prevents oversized nested data - ✅ Optional fields behind explicit parameters --- ### Phase 3: Integration & Performance (Week 3) - 5 days **Goal**: End-to-end tests, performance validation, live integration (optional) #### Day 11-12: End-to-End Scenarios **Tasks**: - [ ] T010: Implement E2E workflow tests - Complete user journeys - Multi-tool orchestration - Token budget across multiple calls **Deliverables**: ```typescript // test/e2e/property-search-flow.test.ts import { describe, it, expect, beforeAll, afterAll } from 'vitest' import { MCPTestClient } from '../utils/mcpClient' import { setupHostawayMocks } from '../fixtures/hostaway/setup' import { estimateTokensFromObject } from '../utils/tokenEstimator' describe('E2E: Property Search & Booking Flow', () => { let client: MCPTestClient let totalTokens = 0 beforeAll(async () => { setupHostawayMocks() client = new MCPTestClient({ serverCommand: 'python', serverArgs: ['-m', 'src.mcp.server'], env: { MCP_OUTPUT_TOKEN_THRESHOLD: '5000', MCP_HARD_OUTPUT_TOKEN_CAP: '10000', MCP_DEFAULT_PAGE_SIZE: '20', HOSTAWAY_ACCOUNT_ID: 'test_account', HOSTAWAY_SECRET_KEY: 'test_secret', }, }) await client.start() }) afterAll(async () => { await client.stop() }) it('user searches properties, checks availability, creates booking', async () => { // Step 1: Search properties in San Francisco const properties = await client.callTool('list_properties', { city: 'San Francisco', limit: 10, }) totalTokens += 
estimateTokensFromObject(properties.content) expect(properties.content.items.length).toBeGreaterThan(0) // Step 2: Get details for first property const propertyId = properties.content.items[0].id const details = await client.callTool('get_property_details', { propertyId, }) totalTokens += estimateTokensFromObject(details.content) // Step 3: Check availability const availability = await client.callTool('check_availability', { propertyId, startDate: '2025-11-01', endDate: '2025-11-05', }) totalTokens += estimateTokensFromObject(availability.content) // Step 4: Create booking (mock) const booking = await client.callTool('create_booking', { propertyId, guestName: 'John Doe', checkIn: '2025-11-01', checkOut: '2025-11-05', }) totalTokens += estimateTokensFromObject(booking.content) // Assert: Total conversation tokens within budget expect(totalTokens).toBeLessThan(10000) console.log(`Total tokens used in flow: ${totalTokens}`) }) it('pagination workflow stays within token budget', async () => { let pageCount = 0 let cursor = null let totalTokens = 0 // Paginate through 10 pages while (pageCount < 10) { const result = await client.callTool('list_properties', { limit: 5, ...(cursor && { cursor }), }) totalTokens += estimateTokensFromObject(result.content) pageCount++ if (!result.content.meta.hasMore) break cursor = result.content.nextCursor } // 10 pages * ~2KB per page = ~20KB total expect(totalTokens).toBeLessThan(10000) console.log(`Paginated ${pageCount} pages using ${totalTokens} tokens`) }) }) ``` **Acceptance Criteria**: - ✅ Multi-step workflows complete successfully - ✅ Cumulative token usage tracked - ✅ Conversation stays within budget --- #### Day 12-13: Performance Benchmarks **Tasks**: - [ ] T011: Implement performance tests - Token estimation performance - Pagination overhead - Preview generation latency **Deliverables**: ```typescript // test/performance/token-estimation.bench.ts import { describe, bench } from 'vitest' import { estimateTokens, estimateTokensFromObject } from '../utils/tokenEstimator' import { generateProperties } from '../fixtures/generators/properties' describe('Token Estimation Performance', () => { const smallText = 'Hello world'.repeat(10) // ~100 chars const mediumText = 'x'.repeat(1000) // 1KB const largeText = 'x'.repeat(100000) // 100KB bench('estimate tokens from 100 chars', () => { estimateTokens(smallText) }) bench('estimate tokens from 1KB', () => { estimateTokens(mediumText) }) bench('estimate tokens from 100KB', () => { estimateTokens(largeText) }) const smallObject = { id: 1, name: 'Test' } const mediumObject = generateProperties(10)[0] const largeObject = { properties: generateProperties(100) } bench('estimate tokens from small object', () => { estimateTokensFromObject(smallObject) }) bench('estimate tokens from medium object', () => { estimateTokensFromObject(mediumObject) }) bench('estimate tokens from large object', () => { estimateTokensFromObject(largeObject) }) }) ``` ```typescript // test/performance/pagination.bench.ts import { describe, bench, beforeAll, afterAll } from 'vitest' import { MCPTestClient } from '../utils/mcpClient' import { setupHostawayMocks } from '../fixtures/hostaway/setup' describe('Pagination Performance', () => { let client: MCPTestClient beforeAll(async () => { setupHostawayMocks() client = new MCPTestClient({ serverCommand: 'python', serverArgs: ['-m', 'src.mcp.server'], env: { HOSTAWAY_ACCOUNT_ID: 'test_account', HOSTAWAY_SECRET_KEY: 'test_secret', }, }) await client.start() }) afterAll(async () => { await 
client.stop() }) bench('first page retrieval', async () => { await client.callTool('list_properties', { limit: 10 }) }) bench('cursor navigation', async () => { const page1 = await client.callTool('list_properties', { limit: 10 }) await client.callTool('list_properties', { limit: 10, cursor: page1.content.nextCursor, }) }) bench('paginate through 10 pages', async () => { let cursor = null for (let i = 0; i < 10; i++) { const result = await client.callTool('list_properties', { limit: 5, ...(cursor && { cursor }), }) cursor = result.content.nextCursor if (!cursor) break } }) }) ``` **Performance Targets**: - Token estimation: <1ms for typical responses - First page: <100ms - Cursor navigation: <150ms - 10-page pagination: <2 seconds **Acceptance Criteria**: - ✅ All benchmarks meet performance targets - ✅ No performance regressions from baseline --- #### Day 13-14: Live Integration Tests (Optional) **Tasks**: - [ ] T012: Implement live integration tests - Run against Hostaway sandbox - Validate token estimates against real data - Nightly CI job only **Deliverables**: ```typescript // test/live/integration.test.ts import { describe, it, expect, beforeAll, afterAll } from 'vitest' import { MCPTestClient } from '../utils/mcpClient' describe.skipIf(!process.env.RUN_LIVE_TESTS)('Live Integration Tests', () => { let client: MCPTestClient beforeAll(async () => { // No mocks - use real Hostaway API client = new MCPTestClient({ serverCommand: 'python', serverArgs: ['-m', 'src.mcp.server'], env: { MCP_OUTPUT_TOKEN_THRESHOLD: '50000', // Production caps MCP_HARD_OUTPUT_TOKEN_CAP: '100000', MCP_DEFAULT_PAGE_SIZE: '50', HOSTAWAY_ACCOUNT_ID: process.env.HOSTAWAY_ACCOUNT_ID, HOSTAWAY_SECRET_KEY: process.env.HOSTAWAY_SECRET_KEY, }, }) await client.start() }) afterAll(async () => { await client.stop() }) it('pagination works with real API', async () => { const result = await client.callTool('list_properties', { limit: 10, }) expect(result.content.items).toBeInstanceOf(Array) expect(result.estimatedTokens).toBeLessThan(100000) }) it('token estimates are accurate within 20%', async () => { const result = await client.callTool('get_property_details', { propertyId: process.env.TEST_PROPERTY_ID, }) // Compare estimated vs actual (if we had actual tokenizer) // For now, just validate estimate is reasonable const contentSize = JSON.stringify(result.content).length const expectedTokens = Math.ceil((contentSize / 4) * 1.2) const variance = Math.abs(result.estimatedTokens - expectedTokens) / expectedTokens expect(variance).toBeLessThan(0.2) // Within 20% }) }) ``` **CI Configuration**: ```yaml # .github/workflows/overflow-safety-nightly.yml name: Overflow Safety - Nightly Live Tests on: schedule: - cron: '0 2 * * *' # 2 AM daily workflow_dispatch: jobs: live-integration: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: actions/setup-node@v4 with: node-version: '20' - uses: actions/setup-python@v5 with: python-version: '3.12' - name: Install dependencies run: | cd test npm install pip install -e .. 
- name: Run live integration tests run: | cd test npm test -- --run test/live/ env: RUN_LIVE_TESTS: 'true' HOSTAWAY_ACCOUNT_ID: ${{ secrets.HOSTAWAY_SANDBOX_ACCOUNT_ID }} HOSTAWAY_SECRET_KEY: ${{ secrets.HOSTAWAY_SANDBOX_SECRET_KEY }} TEST_PROPERTY_ID: ${{ secrets.TEST_PROPERTY_ID }} - name: Publish metrics if: always() uses: actions/upload-artifact@v4 with: name: live-test-metrics path: test/metrics/ ``` **Acceptance Criteria**: - ✅ Live tests run successfully against sandbox - ✅ Token estimates validated against real data - ✅ Results published for monitoring --- ### Phase 4: CI Integration & Documentation (Week 4) - 5 days **Goal**: Production-ready CI pipeline and comprehensive documentation #### Day 16-17: GitHub Actions Workflow **Tasks**: - [ ] T013: Create PR gate workflow - Run on all PRs - Enforce token caps - Report failures clearly **Deliverables**: ```yaml # .github/workflows/overflow-safety-pr.yml name: Overflow Safety Tests - PR Gate on: pull_request: branches: [main, develop] push: branches: [main, develop] jobs: overflow-safety: name: Token Cap & Contract Validation runs-on: ubuntu-latest timeout-minutes: 10 steps: - name: Checkout code uses: actions/checkout@v4 - name: Setup Node.js uses: actions/setup-node@v4 with: node-version: '20' cache: 'npm' cache-dependency-path: test/package-lock.json - name: Setup Python uses: actions/setup-python@v5 with: python-version: '3.12' cache: 'pip' - name: Install Python dependencies run: | pip install -e . - name: Install test dependencies working-directory: test run: | npm ci - name: Run overflow-safety tests (strict caps) working-directory: test run: | npm test -- \ --reporter=verbose \ --reporter=junit \ --outputFile=junit-results.xml env: MCP_OUTPUT_TOKEN_THRESHOLD: '1000' MCP_HARD_OUTPUT_TOKEN_CAP: '5000' MCP_DEFAULT_PAGE_SIZE: '5' HOSTAWAY_ACCOUNT_ID: 'test_account' HOSTAWAY_SECRET_KEY: 'test_secret' - name: Upload test results if: always() uses: actions/upload-artifact@v4 with: name: overflow-safety-test-results path: test/junit-results.xml - name: Comment PR with results if: failure() && github.event_name == 'pull_request' uses: actions/github-script@v7 with: script: | const fs = require('fs') const results = fs.readFileSync('test/junit-results.xml', 'utf8') // Parse results and create comment github.rest.issues.createComment({ issue_number: context.issue.number, owner: context.repo.owner, repo: context.repo.repo, body: `## 🔒 Overflow-Safety Tests Failed\n\n` + `One or more endpoints exceeded token caps or violated contracts.\n\n` + `See test results artifact for details.` }) - name: Check for flakes if: failure() working-directory: test run: | echo "Rerunning failed tests to check for flakes..." npm test -- --rerun-failures=3 typecheck: name: TypeScript Type Check runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: actions/setup-node@v4 with: node-version: '20' - working-directory: test run: | npm ci npm run typecheck ``` **Acceptance Criteria**: - ✅ CI runs on every PR - ✅ Clear failure messages - ✅ Flake detection implemented - ✅ Results uploaded as artifacts --- #### Day 17-18: Test Documentation **Tasks**: - [ ] T014: Write test documentation - How to run tests locally - How to add new tests - Troubleshooting guide **Deliverables**: ```markdown # test/README.md # Overflow-Safety Test Suite Automated tests validating token cap enforcement, pagination contracts, and error hygiene for the Hostaway MCP server. 
## Quick Start ### Prerequisites - Node.js 20+ - Python 3.12+ - MCP server installed (`pip install -e ..`) ### Installation ```bash cd test npm install ``` ### Running Tests ```bash # Run all tests with default caps npm test # Run with strict caps (force preview mode) export MCP_OUTPUT_TOKEN_THRESHOLD=1000 export MCP_HARD_OUTPUT_TOKEN_CAP=5000 npm test # Run specific test file npm test -- hostaway.pagination.test.ts # Run with coverage npm test -- --coverage # Run performance benchmarks npm run bench ``` ## Test Categories ### Pagination Tests (`hostaway.pagination.test.ts`) - Validates `limit` parameter enforcement - Tests `nextCursor` presence and navigation - Ensures disjoint pages - Verifies final page behavior ### Preview & Token Cap Tests (`hostaway.preview-and-caps.test.ts`) - Validates soft threshold triggers preview mode - Ensures hard cap never exceeded - Tests field projection - Validates preview metadata ### Error Hygiene Tests (`hostaway.errors-and-rate-limit.test.ts`) - Validates error responses <2KB - Ensures no HTML in errors - Tests 429 retry metadata - Validates correlation IDs ### E2E Tests (`e2e/`) - Multi-tool workflows - Token budget tracking across conversations - Real-world user journeys ## Configuration ### Environment Variables | Variable | Default | Description | |----------|---------|-------------| | `MCP_OUTPUT_TOKEN_THRESHOLD` | 50000 | Soft cap (triggers preview) | | `MCP_HARD_OUTPUT_TOKEN_CAP` | 100000 | Hard cap (never exceeded) | | `MCP_DEFAULT_PAGE_SIZE` | 50 | Default pagination size | | `MCP_MAX_PAGE_SIZE` | 200 | Maximum page size | | `HOSTAWAY_ACCOUNT_ID` | - | Hostaway account (mocked in tests) | | `HOSTAWAY_SECRET_KEY` | - | Hostaway secret (mocked in tests) | ### Force Small Caps (for testing overflow paths) ```bash export MCP_OUTPUT_TOKEN_THRESHOLD=1000 export MCP_HARD_OUTPUT_TOKEN_CAP=5000 export MCP_DEFAULT_PAGE_SIZE=5 npm test ``` ## Adding New Tests ### 1. Create Test File ```typescript // test/hostaway.new-feature.test.ts import { describe, it, expect, beforeAll, afterAll } from 'vitest' import { MCPTestClient } from './utils/mcpClient' import { setupHostawayMocks } from './fixtures/hostaway/setup' import { assertBelowHardCap } from './utils/tokenEstimator' describe('New Feature Tests', () => { let client: MCPTestClient beforeAll(async () => { setupHostawayMocks() client = new MCPTestClient({ serverCommand: 'python', serverArgs: ['-m', 'src.mcp.server'], env: { MCP_OUTPUT_TOKEN_THRESHOLD: '2000', MCP_HARD_OUTPUT_TOKEN_CAP: '3000', HOSTAWAY_ACCOUNT_ID: 'test_account', HOSTAWAY_SECRET_KEY: 'test_secret', }, }) await client.start() }) afterAll(async () => { await client.stop() }) it('validates new endpoint behavior', async () => { const result = await client.callTool('new_endpoint', { param: 'value' }) // Assert token cap assertBelowHardCap(result.content, 3000) // Assert other contracts expect(result.content).toHaveProperty('expectedField') }) }) ``` ### 2. Add Mocks ```typescript // test/fixtures/hostaway/setup.ts export function setupNewEndpointMock() { nock(HOSTAWAY_BASE_URL) .get('/v1/new-endpoint') .reply(200, { // Mock response }) } ``` ### 3. Run Tests ```bash npm test -- hostaway.new-feature.test.ts ``` ## Troubleshooting ### "MCP server failed to start" **Cause**: Server not installed or PATH issue **Fix**: ```bash pip install -e .. 
which python # Should be in virtualenv ``` ### "Tests timeout" **Cause**: Server not responding or infinite loop **Fix**: - Check server logs in test output - Reduce timeout: `npm test -- --timeout=10000` - Enable verbose logging: `DEBUG=* npm test` ### "Token estimate mismatch" **Cause**: Response larger than expected **Fix**: - Check if preview mode should activate - Verify pagination is working - Adjust threshold: `export MCP_OUTPUT_TOKEN_THRESHOLD=5000` ### "Flaky tests" **Cause**: Timing issues or mock state **Fix**: - Add `beforeEach(() => nock.cleanAll())` - Increase timeouts - Check for async race conditions ## CI Integration Tests run automatically on every PR via GitHub Actions. ### PR Gate Workflow - Runs with strict caps (threshold=1000, hardCap=5000) - Must pass before merge - Results uploaded as artifacts ### Nightly Workflow - Runs live integration tests (optional) - Uses real Hostaway sandbox - Publishes metrics ## Performance Targets | Metric | Target | |--------|--------| | Test suite duration | <5 minutes | | Individual test | <10 seconds | | Token estimation | <1ms | | First page load | <100ms | | Flake rate | <1% | ## Metrics & Monitoring Performance metrics saved to `test/metrics/`: - Token usage per endpoint - Response times - Flake detection results View metrics: ```bash cat test/metrics/latest.json | jq ``` ## References - [Vitest Documentation](https://vitest.dev/) - [MCP SDK Documentation](https://spec.modelcontextprotocol.io/) - [Nock HTTP Mocking](https://github.com/nock/nock) - [Overflow-Safety Spec](../specs/006-overflow-safety-test-suite/spec.md) ``` **Acceptance Criteria**: - ✅ Documentation clear and comprehensive - ✅ Examples work out of the box - ✅ Troubleshooting covers common issues --- #### Day 18-19: Runbook & Metrics **Tasks**: - [ ] T015: Create failure runbook - Common failure modes - Debugging steps - Fix recipes **Deliverables**: ```markdown # test/RUNBOOK.md # Overflow-Safety Test Failure Runbook ## Quick Diagnosis ### Identify Failure Type ```bash # Check test output cat test-results/junit-results.xml | grep "<failure" # Common patterns: # - "Token budget exceeded" → Hard cap violation # - "Preview mode not activated" → Soft threshold not working # - "nextCursor missing" → Pagination broken # - "Error response too large" → Error hygiene failure ``` ## Failure Modes & Fixes ### 1. Hard Cap Violation **Symptom**: `HARD CAP VIOLATION: ${tokens} tokens > ${cap} hard cap` **Cause**: Endpoint returning too much data **Diagnosis**: ```bash # Check which endpoint failed grep "HARD CAP VIOLATION" test-results/junit-results.xml # Get full response DEBUG=mcp:* npm test -- <failing-test> ``` **Fix Options**: A. **Add Pagination** ```python # In endpoint handler from src.services.pagination_service import get_pagination_service pagination = get_pagination_service(secret="...") response = pagination.build_response( items=filtered_items, total_count=total, params=params ) ``` B. **Enable Preview Mode** ```python # Check if should summarize from src.services.summarization_service import get_summarization_service summarization = get_summarization_service() should_summarize, tokens = summarization.should_summarize( obj=large_object, token_threshold=config.MCP_OUTPUT_TOKEN_THRESHOLD ) if should_summarize: return summarization.summarize_object( obj=large_object, obj_type="booking", endpoint="/api/v1/bookings/{id}" ) ``` C. 
**Increase Hard Cap** (last resort) ```bash # Update config export MCP_HARD_OUTPUT_TOKEN_CAP=150000 ``` **Verification**: ```bash # Re-run test MCP_HARD_OUTPUT_TOKEN_CAP=150000 npm test -- <failing-test> ``` --- ### 2. Preview Mode Not Activating **Symptom**: Test expects preview but gets full data **Cause**: Token estimation incorrect or threshold too high **Diagnosis**: ```bash # Check estimated tokens grep "estimated" test-results/junit-results.xml # Compare to threshold echo $MCP_OUTPUT_TOKEN_THRESHOLD ``` **Fix Options**: A. **Adjust Threshold** ```bash # Lower threshold to force preview export MCP_OUTPUT_TOKEN_THRESHOLD=2000 npm test ``` B. **Fix Token Estimation** ```python # In handler, add debug logging from src.utils.token_estimator import estimate_tokens_from_dict estimated = estimate_tokens_from_dict(response) logger.info(f"Response tokens: {estimated}, threshold: {threshold}") ``` C. **Verify Preview Logic** ```python # Ensure handler checks threshold if estimated_tokens > config.MCP_OUTPUT_TOKEN_THRESHOLD: # Activate preview mode return preview_response ``` **Verification**: ```bash npm test -- --grep "preview" ``` --- ### 3. Pagination Broken **Symptom**: `nextCursor missing when hasMore=true` or overlapping pages **Diagnosis**: ```bash # Check pagination metadata npm test -- --grep "pagination" --reporter=verbose ``` **Fix Options**: A. **Verify Cursor Creation** ```python # Ensure nextCursor created when hasMore if next_offset < total_count: next_cursor = pagination.create_cursor( offset=next_offset, order_by=order_by, filters=filters ) else: next_cursor = None ``` B. **Check Offset Calculation** ```python # Ensure disjoint pages current_offset = cursor_data["offset"] if cursor else 0 next_offset = current_offset + len(items) # Not current + limit! ``` C. **Validate hasMore Logic** ```python has_more = next_offset < total_count # Should match nextCursor presence ``` **Verification**: ```bash # Test multiple pages npm test -- hostaway.pagination.test.ts ``` --- ### 4. Error Response Too Large **Symptom**: Error exceeds 2KB token limit **Diagnosis**: ```bash # Check error size npm test -- --grep "error" --reporter=verbose ``` **Fix Options**: A. **Strip HTML from Errors** ```python # In error handler try: response = await hostaway_client.get(endpoint) except httpx.HTTPStatusError as e: # Don't return raw HTML if 'text/html' in e.response.headers.get('content-type', ''): # Return compact error instead return { "error": { "code": "internal_server_error", "message": "Hostaway API error", "correlationId": correlation_id, "statusCode": e.response.status_code } } ``` B. **Add Correlation IDs** ```python from src.lib.utils.errors import logError, createErrorResponse correlation_id = logError("Hostaway API failed", error, context) return createErrorResponse( ErrorMessages.HOSTAWAY_CONNECTION_FAILED, correlation_id ) ``` **Verification**: ```bash npm test -- hostaway.errors-and-rate-limit.test.ts ``` --- ## Performance Issues ### Tests Taking >5 Minutes **Diagnosis**: ```bash # Run with durations report npm test -- --reporter=verbose --outputFile=durations.txt grep "Duration" durations.txt | sort -nr | head -10 ``` **Fix Options**: A. **Run Tests in Parallel** ```bash # Increase concurrency npm test -- --pool=threads --poolOptions.threads.maxThreads=8 ``` B. **Optimize Slow Tests** ```typescript // Reduce iterations it('pagination test', async () => { for (let i = 0; i < 10; i++) { // Was 100 // ... } }) ``` C. 
**Skip Slow Tests in PR Gate** ```typescript it.skip('expensive test', async () => { // Only run nightly }) ``` --- ## Flaky Tests ### Intermittent Failures **Diagnosis**: ```bash # Re-run failed tests 10 times npm test -- --rerun-failures=10 ``` **Fix Options**: A. **Add Retries** ```typescript it('flaky test', async () => { await retry(async () => { const result = await client.callTool(...) expect(result).toBeTruthy() }, { retries: 3 }) }) ``` B. **Fix Timing Issues** ```typescript // Add explicit wait await new Promise(resolve => setTimeout(resolve, 100)) ``` C. **Clean Mocks Between Tests** ```typescript beforeEach(() => { nock.cleanAll() setupHostawayMocks() }) ``` --- ## Escalation If runbook doesn't resolve issue: 1. **Collect Diagnostics**: ```bash # Full verbose output DEBUG=* npm test -- <failing-test> > debug.log 2>&1 # Server logs cat ~/.mcp/logs/server.log # Test environment npm test -- --version node --version python --version ``` 2. **Create GitHub Issue**: - Title: `[Overflow-Safety] Test failure: <test-name>` - Include: debug.log, environment info, steps to reproduce - Label: `overflow-safety`, `test-failure` 3. **Contact Team**: - Slack: #mcp-testing - Email: mcp-team@example.com --- ## Preventive Measures ### Before Deploying New Endpoint ```bash # Run overflow tests with strict caps export MCP_OUTPUT_TOKEN_THRESHOLD=1000 export MCP_HARD_OUTPUT_TOKEN_CAP=5000 npm test # If fails, add pagination/preview before deploying ``` ### Code Review Checklist - [ ] Endpoint implements pagination if returns >10 items - [ ] Large responses trigger preview mode - [ ] Errors return compact JSON (<2KB) - [ ] Token caps never exceeded - [ ] Tests added to overflow-safety suite ``` **Acceptance Criteria**: - ✅ Runbook covers all common failures - ✅ Clear escalation path - ✅ Prevention checklist provided --- #### Day 19-20: Final Testing & Handoff **Tasks**: - [ ] T016: Full regression testing - [ ] T017: Documentation review - [ ] T018: Handoff to team **Deliverables**: - ✅ All tests passing - ✅ CI pipeline green - ✅ Documentation complete - ✅ Team trained --- ## 3. Project Structure ``` hostaway-mcp/ ├── specs/ │ └── 006-overflow-safety-test-suite/ │ ├── spec.md │ ├── research.md │ ├── plan.md (this file) │ ├── data-model.md │ └── tasks.md ├── test/ │ ├── package.json │ ├── tsconfig.json │ ├── vitest.config.ts │ ├── setup.ts │ ├── README.md │ ├── RUNBOOK.md │ ├── utils/ │ │ ├── mcpClient.ts │ │ ├── tokenEstimator.ts │ │ └── assertions.ts │ ├── fixtures/ │ │ ├── hostaway/ │ │ │ └── setup.ts │ │ └── generators/ │ │ ├── properties.ts │ │ ├── bookings.ts │ │ ├── financial.ts │ │ └── errors.ts │ ├── hostaway.pagination.test.ts │ ├── hostaway.preview-and-caps.test.ts │ ├── hostaway.errors-and-rate-limit.test.ts │ ├── hostaway.field-projection.test.ts │ ├── hostaway.resources.test.ts │ ├── e2e/ │ │ └── property-search-flow.test.ts │ ├── performance/ │ │ ├── token-estimation.bench.ts │ │ └── pagination.bench.ts │ ├── live/ │ │ └── integration.test.ts │ └── metrics/ │ └── latest.json └── .github/ └── workflows/ ├── overflow-safety-pr.yml └── overflow-safety-nightly.yml ``` --- ## 4. 
Dependencies ### Runtime Dependencies ```json { "name": "overflow-safety-tests", "version": "1.0.0", "type": "module", "devDependencies": { "@modelcontextprotocol/sdk": "^1.0.0", "@types/node": "^20.0.0", "nock": "^14.0.0", "tsx": "^4.0.0", "typescript": "^5.0.0", "vitest": "^1.0.0" }, "scripts": { "test": "vitest run", "test:watch": "vitest watch", "test:coverage": "vitest run --coverage", "bench": "vitest bench", "typecheck": "tsc --noEmit" } } ``` ### Python Dependencies Already installed in main project: - `fastapi-mcp` - MCP server - `pytest` - Unit tests - `httpx` - HTTP client --- ## 5. Success Metrics ### Coverage Targets - ✅ 100% of high-volume endpoints tested - ✅ All pagination paths exercised - ✅ All error scenarios covered - ✅ Token cap boundaries tested ### Performance Targets - ✅ Test suite <5 minutes - ✅ Individual tests <10 seconds - ✅ Flake rate <1% ### Quality Targets - ✅ 0 hard cap violations - ✅ ≥95% pagination/preview under forced caps - ✅ All errors <2KB --- ## 6. Risk Mitigation ### High-Risk Areas 1. **MCP stdio Communication** - Mitigation: Thorough client wrapper testing - Fallback: Mock server if stdio issues 2. **Flaky Tests** - Mitigation: Retry logic + flake detection - Fallback: Quarantine flaky tests 3. **Performance** - Mitigation: Parallel execution + mocking - Fallback: Increase timeout limits ### Contingency Plans - Week 3 buffer for unexpected issues - Optional live tests can be deferred - Performance tests can run separately --- ## 7. Acceptance Criteria Summary - ✅ TypeScript/Vitest test suite operational - ✅ MCP stdio client wrapper functional - ✅ Nock mocking for all Hostaway endpoints - ✅ Pagination contracts validated - ✅ Preview mode enforcement tested - ✅ Token cap never exceeded - ✅ Error hygiene validated (<2KB) - ✅ CI pipeline integrated (PR gate + nightly) - ✅ Documentation complete (README + runbook) - ✅ Performance targets met (<5min suite) - ✅ Flake rate <1% --- ## 8. Next Steps After Implementation ### Follow-Up Work 1. **Load Testing** - k6/Artillery scripts - p95 latency validation - Concurrent request handling 2. **Golden Sample Snapshots** - Capture expected summaries - Detect verbosity regressions - Version control snapshots 3. **Metrics Dashboard** - Grafana/Datadog integration - Token usage trends - Preview mode activation rates 4. **Production Rollout** - Enable stricter caps gradually - Monitor real token usage - A/B test preview mode --- **Plan Status**: Ready for Implementation **Estimated Duration**: 4 weeks (20 days) **Team Size**: 1-2 developers **Next Document**: `tasks.md` (sprint-ready task breakdown)
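As a concrete starting point for follow-up item 2 (golden sample snapshots), a Vitest snapshot test could pin the preview summary shape produced by `get_financial_reports`. The sketch below is an assumption, not part of the committed plan: it reuses the test client and mocks defined above, and the file path and exact `meta.summary` shape would need to be confirmed against the real handler output.

```typescript
// test/snapshots/financial-preview.snapshot.test.ts (hypothetical follow-up)
import { describe, it, expect, beforeAll, afterAll } from 'vitest'
import { MCPTestClient } from '../utils/mcpClient'
import {
  setupHostawayMocks,
  setupLargeFinancialReportMock,
} from '../fixtures/hostaway/setup'

describe('Golden sample: financial report preview', () => {
  let client: MCPTestClient

  beforeAll(async () => {
    setupHostawayMocks()
    setupLargeFinancialReportMock()
    client = new MCPTestClient({
      serverCommand: 'python',
      serverArgs: ['-m', 'src.mcp.server'],
      env: {
        MCP_OUTPUT_TOKEN_THRESHOLD: '2000',
        MCP_HARD_OUTPUT_TOKEN_CAP: '3000',
        HOSTAWAY_ACCOUNT_ID: 'test_account',
        HOSTAWAY_SECRET_KEY: 'test_secret',
      },
    })
    await client.start()
  })

  afterAll(async () => {
    await client.stop()
  })

  it('preview summary shape stays stable', async () => {
    const result = await client.callTool<any>('get_financial_reports', {
      startDate: '2025-01-01',
      endDate: '2025-12-31',
    })

    // Snapshot only the summary metadata, not the row data, so the golden
    // sample catches verbosity/shape regressions without being brittle.
    expect(result.content.meta.summary).toMatchSnapshot()
  })
})
```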
