---
phase: 07-integration-validation
plan: 03
type: execute
wave: 2
depends_on: ["07-01"]
files_modified:
- pyproject.toml
- tests/validation/test_performance.py
- tests/validation/test_mcp_integration.py
autonomous: true
must_haves:
truths:
- "MCP server starts in under 3 seconds"
- "10 sequential queries complete without performance degradation"
- "All 5 MCP tools respond correctly in integration context"
artifacts:
- path: "tests/validation/test_performance.py"
provides: "Startup and latency benchmarks"
contains: "benchmark"
- path: "tests/validation/test_mcp_integration.py"
provides: "End-to-end MCP tool tests"
contains: "mcp_client"
key_links:
- from: "tests/validation/test_performance.py"
to: "pytest-benchmark"
via: "benchmark fixture"
pattern: "benchmark"
- from: "tests/validation/test_mcp_integration.py"
to: "src/skill_retriever/mcp/server.py"
via: "FastMCP client"
pattern: "from fastmcp"
---
<objective>
Validate MCP server performance SLAs and end-to-end integration with all 5 tools.
Purpose: Prove the system meets its production-readiness criteria of fast startup, no degradation under load, and correct tool behavior in an integrated context.
Output: Performance benchmarks and integration tests proving system stability.
</objective>
<execution_context>
@C:\Users\33641\.claude/get-shit-done/workflows/execute-plan.md
@C:\Users\33641\.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/07-integration-validation/07-RESEARCH.md
@.planning/phases/07-integration-validation/07-01-PLAN.md
@src/skill_retriever/mcp/server.py
@tests/test_mcp_server.py
</context>
<tasks>
<task type="auto">
<name>Task 1: Add pytest-benchmark and create performance tests</name>
<files>
pyproject.toml
tests/validation/test_performance.py
</files>
<action>
1. Add pytest-benchmark to dev dependencies in pyproject.toml:
```toml
"pytest-benchmark>=5.2",
```
Run `uv sync` to install.
2. Create test_performance.py with performance benchmarks:
```python
"""Performance benchmarks for MCP server and pipeline."""
from __future__ import annotations
import asyncio
from typing import TYPE_CHECKING
import pytest
if TYPE_CHECKING:
from pytest_benchmark.fixture import BenchmarkFixture
class TestStartupPerformance:
"""MCP server startup time tests."""
def test_pipeline_startup_under_3_seconds(
self, benchmark: BenchmarkFixture
) -> None:
"""Pipeline initialization should complete in under 3 seconds."""
async def init_pipeline() -> None:
# Reset global state to force fresh initialization
from skill_retriever.mcp import server
server._pipeline = None
server._graph_store = None
server._vector_store = None
server._metadata_store = None
await server.get_pipeline()
def run_init() -> None:
asyncio.run(init_pipeline())
# Run benchmark with warmup
result = benchmark.pedantic(
run_init,
rounds=3,
warmup_rounds=1,
)
# Assert startup SLA
max_time = benchmark.stats["max"]
assert max_time < 3.0, f"Startup time {max_time:.2f}s exceeds 3s SLA"
class TestQueryLatency:
"""Query latency and throughput tests."""
def test_simple_query_under_500ms(
self, seeded_pipeline, benchmark: BenchmarkFixture
) -> None:
"""Simple query should complete in under 500ms."""
result = benchmark(seeded_pipeline.retrieve, "authentication agent", top_k=5)
max_latency = benchmark.stats["max"] * 1000 # Convert to ms
assert max_latency < 500, f"Query latency {max_latency:.0f}ms exceeds 500ms SLA"
def test_complex_query_under_1000ms(
self, seeded_pipeline, benchmark: BenchmarkFixture
) -> None:
"""Complex multi-hop query should complete in under 1000ms."""
complex_query = (
"I need JWT authentication with refresh tokens "
"and OAuth integration for GitHub login"
)
result = benchmark(seeded_pipeline.retrieve, complex_query, top_k=10)
max_latency = benchmark.stats["max"] * 1000
assert max_latency < 1000, f"Complex query latency {max_latency:.0f}ms exceeds 1000ms"
class TestLoadStability:
"""Load testing and degradation detection."""
def test_sequential_queries_no_degradation(self, seeded_pipeline) -> None:
"""10 sequential queries should not show performance degradation."""
queries = [
"JWT authentication",
"GitHub repository analysis",
"LinkedIn post writer",
"OAuth login flow",
"MCP server setup",
"debugging agent",
"code review tool",
"email processing",
"data analysis",
"security audit",
]
# Clear cache to ensure fresh queries
seeded_pipeline.clear_cache()
latencies: list[float] = []
for query in queries:
result = seeded_pipeline.retrieve(query, top_k=5)
latencies.append(result.latency_ms)
# First 5 vs last 5 comparison
first_half_avg = sum(latencies[:5]) / 5
second_half_avg = sum(latencies[5:]) / 5
print(f"\nLatencies: {[f'{l:.1f}ms' for l in latencies]}")
print(f"First 5 avg: {first_half_avg:.1f}ms")
print(f"Last 5 avg: {second_half_avg:.1f}ms")
# Allow 50% degradation tolerance (generous for cold start effects)
degradation_ratio = second_half_avg / first_half_avg if first_half_avg > 0 else 1
assert degradation_ratio < 1.5, (
f"Performance degraded by {(degradation_ratio - 1) * 100:.0f}%"
)
def test_cached_queries_fast(self, seeded_pipeline) -> None:
"""Cached queries should be significantly faster than cold queries."""
query = "authentication agent"
# Cold query (first run)
seeded_pipeline.clear_cache()
cold_result = seeded_pipeline.retrieve(query, top_k=5)
cold_latency = cold_result.latency_ms
# Warm query (cached)
warm_result = seeded_pipeline.retrieve(query, top_k=5)
warm_latency = warm_result.latency_ms
print(f"\nCold latency: {cold_latency:.1f}ms")
print(f"Warm latency: {warm_latency:.1f}ms")
assert warm_result.cache_hit, "Second query should be cache hit"
assert warm_latency < cold_latency * 0.5, (
f"Cached query not significantly faster: {warm_latency:.1f}ms vs {cold_latency:.1f}ms"
)
```
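If the mapping-style `benchmark.stats["max"]` access behaves differently across pytest-benchmark versions, the 3-second SLA can also be asserted with a plain `time.perf_counter()` measurement that reuses the same reset-and-`get_pipeline()` sequence shown above. A minimal sketch, complementary to the benchmarked variant rather than a replacement for it:
```python
"""Optional fallback: assert the startup SLA without relying on benchmark stats internals."""
import asyncio
import time


def test_pipeline_cold_start_wall_clock() -> None:
    """Cold-start wall-clock check independent of pytest-benchmark."""
    from skill_retriever.mcp import server

    # Force a fresh initialization, mirroring the reset in the benchmarked test.
    server._pipeline = None
    server._graph_store = None
    server._vector_store = None
    server._metadata_store = None

    start = time.perf_counter()
    asyncio.run(server.get_pipeline())
    elapsed = time.perf_counter() - start

    assert elapsed < 3.0, f"Startup time {elapsed:.2f}s exceeds 3s SLA"
```
The benchmarked test still provides the timing distribution; this fallback only guards the SLA assertion itself.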
</action>
<verify>
Run: `uv sync && uv run pytest tests/validation/test_performance.py -v -s`
All performance tests pass.
Startup time reported < 3s.
Sequential queries show no significant degradation.
</verify>
<done>
- pytest-benchmark installed
- Startup time validated < 3 seconds
- Query latency validated < 500ms (simple) / 1000ms (complex)
- 10 sequential queries show < 50% degradation
- Cache hit performance validated
</done>
</task>
<task type="auto">
<name>Task 2: Create end-to-end MCP integration tests</name>
<files>
tests/validation/test_mcp_integration.py
</files>
<action>
Create test_mcp_integration.py with full end-to-end MCP tool tests.
Use FastMCP's in-memory client pattern from research:
```python
"""End-to-end MCP integration tests."""
from __future__ import annotations
import tempfile
import pytest
from fastmcp.client import Client
from skill_retriever.mcp.server import mcp
@pytest.fixture
async def mcp_client():
"""In-memory MCP client for testing."""
# Reset server state before each test
from skill_retriever.mcp import server
server._pipeline = None
server._graph_store = None
server._vector_store = None
server._metadata_store = None
async with Client(transport=mcp) as client:
yield client
class TestMCPToolDiscovery:
"""MCP tool registration and discovery tests."""
async def test_all_tools_registered(self, mcp_client: Client) -> None:
"""All 5 tools should be registered."""
tools = await mcp_client.list_tools()
tool_names = {t.name for t in tools}
expected = {
"search_components",
"get_component_detail",
"install_components",
"check_dependencies",
"ingest_repo",
}
        assert tool_names == expected, f"Expected {expected}, got {tool_names}"
async def test_tool_schema_under_300_tokens(self, mcp_client: Client) -> None:
"""Total tool schema should stay under 300 tokens."""
tools = await mcp_client.list_tools()
# Rough token estimation: 4 chars per token average
total_schema_chars = sum(
len(str(t.inputSchema)) + len(t.description or "")
for t in tools
)
estimated_tokens = total_schema_chars // 4
print(f"\nTool schema estimated tokens: {estimated_tokens}")
assert estimated_tokens < 300, f"Schema {estimated_tokens} tokens exceeds 300"
class TestSearchComponents:
"""search_components tool integration tests."""
async def test_search_returns_results(self, mcp_client: Client) -> None:
"""Search should return component recommendations."""
result = await mcp_client.call_tool(
name="search_components",
arguments={"query": "authentication", "top_k": 5},
)
# Result should have components (may be empty if no data indexed)
assert result is not None
# The actual structure depends on SearchResult schema
async def test_search_with_type_filter(self, mcp_client: Client) -> None:
"""Search with component_type filter should work."""
result = await mcp_client.call_tool(
name="search_components",
arguments={
"query": "debugging tool",
"component_type": "skill",
"top_k": 3,
},
)
assert result is not None
class TestCheckDependencies:
"""check_dependencies tool integration tests."""
async def test_check_empty_list(self, mcp_client: Client) -> None:
"""Check with empty list should return empty results."""
result = await mcp_client.call_tool(
name="check_dependencies",
arguments={"component_ids": []},
)
assert result is not None
class TestGetComponentDetail:
"""get_component_detail tool integration tests."""
async def test_get_nonexistent_component(self, mcp_client: Client) -> None:
"""Getting nonexistent component should return not found response."""
result = await mcp_client.call_tool(
name="get_component_detail",
arguments={"component_id": "nonexistent-id-12345"},
)
# Should return ComponentDetail with "not found" indication
assert result is not None
class TestInstallComponents:
"""install_components tool integration tests."""
async def test_install_to_temp_dir(self, mcp_client: Client) -> None:
"""Install should work with temp directory target."""
with tempfile.TemporaryDirectory() as tmpdir:
result = await mcp_client.call_tool(
name="install_components",
arguments={
"component_ids": [], # Empty list - nothing to install
"target_dir": tmpdir,
},
)
assert result is not None
class TestIngestRepo:
"""ingest_repo tool integration tests."""
async def test_ingest_invalid_url(self, mcp_client: Client) -> None:
"""Ingest with invalid URL should return error."""
result = await mcp_client.call_tool(
name="ingest_repo",
arguments={"repo_url": "not-a-valid-url"},
)
# Should return IngestResult with error message
assert result is not None
class TestEndToEndWorkflow:
"""Full workflow integration tests."""
async def test_search_then_check_deps(self, mcp_client: Client) -> None:
"""Search -> check dependencies workflow should work."""
# Search for components
search_result = await mcp_client.call_tool(
name="search_components",
arguments={"query": "testing framework", "top_k": 3},
)
# Even with no results, check_dependencies should handle empty
check_result = await mcp_client.call_tool(
name="check_dependencies",
arguments={"component_ids": []},
)
assert search_result is not None
assert check_result is not None
```
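The async fixture and coroutine tests above only run if pytest is already configured to collect async tests automatically (e.g. `asyncio_mode = "auto"` in the project's pytest configuration). If the project instead runs pytest-asyncio in strict mode, the module needs explicit markers and the async fixture needs the pytest-asyncio decorator. A minimal sketch under that assumption:
```python
"""Markers needed only when pytest-asyncio runs in strict mode."""
import pytest
import pytest_asyncio
from fastmcp.client import Client

from skill_retriever.mcp.server import mcp

# Run every coroutine test in this module on the asyncio event loop.
pytestmark = pytest.mark.asyncio


@pytest_asyncio.fixture
async def mcp_client():
    """Async generator fixtures require the pytest-asyncio decorator in strict mode."""
    # Keep the server-state reset from the fixture above; omitted here for brevity.
    async with Client(transport=mcp) as client:
        yield client
```
If the project standardizes on anyio's pytest plugin instead, `pytest.mark.anyio` plays the same role and async fixtures work with a plain `pytest.fixture`.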
</action>
<verify>
Run: `uv run pytest tests/validation/test_mcp_integration.py -v`
All MCP integration tests pass.
All 5 tools discoverable and callable.
</verify>
<done>
- All 5 MCP tools registered and discoverable
- Tool schemas under 300 tokens
- search_components returns results
- check_dependencies handles empty lists
- get_component_detail handles not-found
- install_components works with temp directory
- ingest_repo handles invalid URLs gracefully
- End-to-end workflow validated
</done>
</task>
</tasks>
<verification>
All performance and integration tests pass:
```bash
uv run pytest tests/validation/test_performance.py tests/validation/test_mcp_integration.py -v -s
```
Performance SLAs verified:
```bash
uv run pytest tests/validation/test_performance.py -v --benchmark-only
```
MCP tools working:
```bash
uv run pytest tests/validation/test_mcp_integration.py -v
```
</verification>
<success_criteria>
- [ ] pytest-benchmark installed and working
- [ ] MCP startup time < 3 seconds
- [ ] Simple query latency < 500ms
- [ ] Complex query latency < 1000ms
- [ ] 10 sequential queries < 50% degradation
- [ ] All 5 MCP tools registered
- [ ] Tool schemas < 300 tokens
- [ ] End-to-end workflow tests pass
</success_criteria>
<output>
After completion, create `.planning/phases/07-integration-validation/07-03-SUMMARY.md`
</output>