DevFlow MCP

E2E_TEST_PLAN.md•17.6 KiB

# DevFlow MCP E2E Test Plan **Last Updated:** 2025-10-16 **Status:** ✅ 65% Complete - Core functionality well tested, temporal and debug tools needed --- ## Overview Comprehensive end-to-end testing strategy for DevFlow MCP with **SQLite-only** architecture. This plan covers all **20 MCP tools** (17 standard + 3 debug tools when DEBUG=true) with happy path, edge case, validation, and error handling tests. **Current Coverage:** 194 tests implemented covering 13 of 20 tools (65%) --- ## Tools Coverage Status (20 total) ### Core CRUD Operations (5 tools) - ✅ COMPLETE 1. ✅ **create_entities** - Comprehensive (13 tests) 2. ✅ **read_graph** - Comprehensive (5 tests) 3. ✅ **delete_entities** - Comprehensive (7 tests) 4. ✅ **add_observations** - Comprehensive (7 tests) 5. ✅ **delete_observations** - Comprehensive (5 tests) ### Relation Management (5 tools) - ✅ COMPLETE 6. ✅ **create_relations** - Very comprehensive (30 tests) 7. ✅ **get_relation** - Comprehensive (8 tests) 8. ✅ **update_relation** - Comprehensive (5 tests) 9. ✅ **delete_relations** - Comprehensive (5 tests) 10. ⚠️ **get_relation_history** - NOT TESTED (temporal feature) ### Search & Discovery (4 tools) - ✅ COMPLETE 11. ✅ **search_nodes** - Comprehensive (9 tests) 12. ✅ **semantic_search** - Very comprehensive (14 tests) 13. ✅ **open_nodes** - Comprehensive (9 tests) 14. ✅ **get_entity_embedding** - Comprehensive (6 tests) ### Temporal Features (3 tools) - ⚠️ NEEDS IMPLEMENTATION 15. ⚠️ **get_entity_history** - NOT TESTED (implemented but untested) 16. ⚠️ **get_graph_at_time** - NOT TESTED (implemented but untested) 17. ⚠️ **get_decayed_graph** - NOT TESTED (implemented but untested) ### Debug Tools (3 tools) - ⚠️ NEEDS IMPLEMENTATION 18. ⚠️ **force_generate_embedding** - NOT TESTED (only available when DEBUG=true) 19. ⚠️ **debug_embedding_config** - NOT TESTED (only available when DEBUG=true) 20. ⚠️ **diagnose_vector_search** - NOT TESTED (only available when DEBUG=true) --- ## Test Categories - Current Status ### 1. Happy Path Tests - ✅ COMPREHENSIVE - ✅ Basic CRUD operations (all tools tested) - ✅ Relations creation and retrieval (all CRUD operations) - ✅ Search returns results (keyword and semantic) - ✅ All tools return properly formatted MCP responses - ✅ Vector embeddings work correctly ### 2. Validation Tests - ✅ VERY COMPREHENSIVE (67 tests in 04-validation.test.js) - ✅ Invalid entity types rejected (feature, task, decision, component validated) - ✅ Missing required fields rejected (comprehensive coverage) - ✅ Invalid relation types rejected (all 4 types validated) - ✅ Array fields validate correctly (empty, null, non-array) - ✅ Optional fields work (strength, confidence tested with boundary values) - ✅ Type coercion tested (string rejection for numeric fields) - ✅ Error messages are clear and specific ### 3. Edge Case Tests - ✅ COMPREHENSIVE - ✅ Empty arrays handled (multiple scenarios) - ✅ Special characters in entity names - ✅ Unicode characters (日本語 🚀) in entity names - ✅ Very long observation strings (tested up to 10,000 chars) - ✅ Very long entity names (tested up to 400+ chars) - ✅ Duplicate entity creation - ✅ Non-existent entity references - ✅ Circular relationships - ✅ Self-referencing relations - ✅ Null/undefined handling - ✅ Large metadata objects (5000 char descriptions, 100 tags, deeply nested) - ✅ Many observations (100+ per entity) ### 4. Error Handling Tests - ✅ COMPREHENSIVE - ✅ Tools return proper error messages - ✅ Validation errors have clear messages with field names - ✅ Database errors handled gracefully - ✅ Missing tool arguments caught - ✅ Invalid parameter types rejected - ✅ Boundary violations detected (strength/confidence >1.0 or <0.0) - ✅ Non-existent entities/relations handled gracefully ### 5. Real-World Scenario Tests - ⚠️ LIMITED - ⚠️ Software development workflow - Partial (basic entity/relation chains tested) - ⚠️ Knowledge graph evolution over time - NOT TESTED - ⚠️ Collaborative work - NOT TESTED - ⚠️ Complex dependency graphs - Limited (tested in relations) ### 6. Performance Tests - ⚠️ NOT IMPLEMENTED - ⚠️ Large batch operations (100+ entities) - NOT TESTED - ⚠️ Many concurrent relations (1000+) - NOT TESTED - ⚠️ Search with many results - NOT TESTED - ⚠️ Graph with 1000+ entities - Partial (tested up to 50) ### 7. SQLite Integration Tests - ⚠️ LIMITED - ✅ Vector search returns relevant results (sqlite-vec tested) - ⚠️ Temporal queries accurate - NOT TESTED - ⚠️ SQLite transactions work correctly - NOT EXPLICITLY TESTED - ⚠️ Internal optimizations active (WAL mode, cache) - NOT TESTED - ⚠️ Concurrent operations - NOT TESTED --- ## Test File Structure - CURRENT STATE ``` src/tests/integration/e2e/ ├── 00-mcp-client.test.js ✅ DONE (4 tests) │ └── Basic MCP protocol smoke tests ├── 01-crud.test.js ✅ DONE (37 tests) │ └── create_entities, read_graph, delete_entities, │ add_observations, delete_observations ├── 02-relations.test.js ✅ DONE (48 tests) │ └── create_relations, get_relation, update_relation, │ delete_relations (comprehensive edge cases) ├── 03-search.test.js ✅ DONE (38 tests) │ └── search_nodes, semantic_search, open_nodes, │ get_entity_embedding (with parameters) ├── 04-validation.test.js ✅ DONE (67 tests) │ └── Comprehensive validation across all tools ├── 05-temporal.test.js ⚠️ NEEDED (~25-30 tests) │ └── get_entity_history, get_relation_history, │ get_graph_at_time, get_decayed_graph ├── 06-debug-tools.test.js ⚠️ NEEDED (~15-20 tests) │ └── force_generate_embedding, debug_embedding_config, │ diagnose_vector_search ├── 07-performance.test.js ⚠️ NEEDED (~20 tests) │ └── Large batches, stress tests, concurrent operations ├── 08-scenarios.test.js ⚠️ NEEDED (~20 tests) │ └── Real-world workflows, multi-step operations └── fixtures/ ├── entities.js ✅ Comprehensive fixtures ├── relations.js ✅ Comprehensive fixtures ├── shared-client.js ✅ Shared MCP client setup └── helpers.js ✅ Test utility functions ``` --- ## Test Count Summary | Category | Tests | Status | Notes | |----------|-------|--------|-------| | MCP Client | 4 | ✅ Done | Basic smoke tests | | CRUD Operations | 37 | ✅ Done | All 5 tools comprehensive | | Relations | 48 | ✅ Done | Very comprehensive, edge cases | | Search & Discovery | 38 | ✅ Done | All 4 tools with parameters | | Validation & Errors | 67 | ✅ Done | Comprehensive validation | | **CURRENT TOTAL** | **194** | **65% Complete** | | | Temporal Features | 0 | ⚠️ Needed | 3 tools, ~25-30 tests | | Debug Tools | 0 | ⚠️ Needed | 3 tools, ~15-20 tests | | Performance | 0 | ⚠️ Needed | ~20 tests | | Real-World Scenarios | 0 | ⚠️ Needed | ~20 tests | | **REMAINING NEEDED** | **~80** | | | | **FINAL TARGET** | **~274** | | | --- ## Implementation Priority - UPDATED ### ✅ Phase 1: Core Functionality - COMPLETE (194 tests) - ✅ CRUD operations (37 tests) - ✅ Relation management (48 tests) - ✅ Search & discovery (38 tests) - ✅ Validation & errors (67 tests) - ✅ MCP client integration (4 tests) **Status:** COMPLETE --- ### ⚠️ Phase 2: Temporal Features - HIGH PRIORITY (~25-30 tests) **File:** `05-temporal.test.js` **Tools to Test:** 1. **get_entity_history** (8-10 tests) - Get history for entity with multiple versions - Verify chronological ordering - Handle entities with no history - Handle non-existent entities - Verify all temporal fields (version, createdAt, updatedAt, validFrom, validTo, changedBy) 2. **get_relation_history** (8-10 tests) - Get history for relation with multiple versions - Verify chronological ordering - Handle relations with no history - Handle non-existent relations - Verify temporal metadata preserved 3. **get_graph_at_time** (6-8 tests) - Get graph at specific timestamp - Verify only entities valid at that time - Verify only relations valid at that time - Handle timestamp before any data - Handle timestamp after all data - Handle current timestamp 4. **get_decayed_graph** (4-6 tests) - Get graph with confidence decay - Verify old relations have lower confidence - Verify recent relations have high confidence - Handle graph with no decay configuration - Handle empty graph **Estimated Effort:** 4-6 hours --- ### ⚠️ Phase 3: Debug Tools - MEDIUM PRIORITY (~15-20 tests) **File:** `06-debug-tools.test.js` **Prerequisites:** Set DEBUG=true in test environment **Tools to Test:** 1. **force_generate_embedding** (6-8 tests) - Force generate embedding for entity - Verify embedding stored in vector store - Verify embedding dimensions match model - Handle non-existent entity - Handle entity name vs ID lookup - Verify embedding service initialized 2. **debug_embedding_config** (4-6 tests) - Get embedding configuration - Verify OpenAI API key detected - Verify model configuration - Verify vector store status - Verify embedding job manager status - Check entities with embeddings count 3. **diagnose_vector_search** (4-6 tests) - Run vector search diagnostics - Verify SQLite vector index status - Check vector dimensions - Verify similarity function - Count indexed entities **Estimated Effort:** 3-4 hours --- ### ⚠️ Phase 4: Performance & Scalability - MEDIUM PRIORITY (~20 tests) **File:** `07-performance.test.js` **Test Scenarios:** 1. **Batch Operations** (6-8 tests) - Create 100+ entities in single batch - Create 1000+ relations in single batch - Delete 100+ entities in single batch - Measure operation time (<5s for 100 entities) 2. **Large Graph Operations** (6-8 tests) - Read graph with 1000+ entities - Search in graph with 1000+ entities - Semantic search with 1000+ entities - Verify performance <100ms for typical operations 3. **Concurrent Operations** (4-6 tests) - Multiple create_entities simultaneously - Multiple semantic_search simultaneously - Read and write operations simultaneously - Verify no race conditions or deadlocks **Estimated Effort:** 6-8 hours --- ### ⚠️ Phase 5: Real-World Scenarios - LOW PRIORITY (~20 tests) **File:** `08-scenarios.test.js` **Test Scenarios:** 1. **Software Development Workflow** (8-10 tests) - Feature → Tasks workflow - Tasks → Components dependencies - Decision → Implementation chain - Test → Component relationships 2. **Knowledge Graph Evolution** (6-8 tests) - Build graph over time - Update entities and relations - Track changes through history - Verify temporal consistency 3. **Complex Dependency Management** (4-6 tests) - Multi-level dependency trees - Circular dependency detection - Dependency impact analysis - Cascade delete scenarios **Estimated Effort:** 4-6 hours --- ## Success Criteria - UPDATED ### ✅ Already Achieved - ✅ 13 of 20 tools tested with valid inputs - ✅ 13 of 20 tools tested with invalid inputs - ✅ Error cases return proper error messages - ✅ All validation rules enforced - ✅ Zero flaky tests (current tests stable) - ✅ SQLite-only architecture validated - ✅ Comprehensive edge case coverage ### ⚠️ Remaining Goals - ⚠️ All 20 tools tested with valid inputs (7 tools remaining) - ⚠️ All 20 tools tested with invalid inputs (7 tools remaining) - ⚠️ Temporal features validated - ⚠️ Debug tools validated - ⚠️ Real-world scenarios pass - ⚠️ Performance acceptable (<100ms for typical queries) - ⚠️ Concurrent operations verified - ⚠️ Internal optimizations verified (WAL mode, cache) --- ## Running Tests ```bash # All E2E tests npm run test:e2e # or pnpm run test:e2e # Specific test file node --test src/tests/integration/e2e/01-crud.test.js # With DEBUG tools enabled DEBUG=true node --test src/tests/integration/e2e/06-debug-tools.test.js # All tests (unit + integration + e2e) pnpm test # Run specific test suite node --test src/tests/integration/e2e/02-relations.test.js # Watch mode (if configured) pnpm run test:watch ``` --- ## Test Data & Validation Rules ### Entity Types (Required) ```typescript "feature" | "task" | "decision" | "component" ``` **Note:** "test" entity type removed in latest schema ### Relation Types (Required) ```typescript "depends_on" | "implements" | "part_of" | "relates_to" ``` ### Optional Fields with Defaults ```typescript strength: 0.0 - 1.0 (relation strength) confidence: 0.0 - 1.0 (relation confidence) limit: number (default: 10, for search results) min_similarity: 0.0 - 1.0 (default: 0.6, for semantic search) hybrid_search: boolean (default: true, for semantic search) semantic_weight: 0.0 - 1.0 (default: 0.6, for hybrid search) ``` ### Temporal Fields (Optional, auto-generated) ```typescript id: string (UUID, auto-generated) version: number (auto-incremented) createdAt: number (timestamp in ms) updatedAt: number (timestamp in ms) validFrom: number (timestamp in ms) validTo: number (timestamp in ms, optional) changedBy: string (user/system identifier, optional) ``` ### MCP Response Format ```json { "content": [ { "type": "text", "text": "JSON.stringify(data)" } ] } ``` ### MCP Error Format ```json { "message": "Error message with context", "code": -32603 } ``` --- ## Architecture Notes ### SQLite-Only Implementation - **Database:** SQLite with sqlite-vec extension for vector operations - **Vector Store:** Integrated into SQLite (no separate vector database) - **Schema Manager:** SqliteSchemaManager handles table creation and indices - **No External Dependencies:** No Docker, no Neo4j, no separate services - **File Location:** Configurable via `DFM_SQLITE_LOCATION` env var - **Test Database:** Uses `:memory:` for E2E tests (fast, isolated) ### Test Infrastructure - **Client:** Shared MCP client instance across tests (performance optimization) - **Fixtures:** Reusable entity and relation fixtures - **Helpers:** MCPTestHelper class for common operations - **Cleanup:** Automatic cleanup between test suites - **Isolation:** Each test suite creates unique entity names with timestamps --- ## Next Steps for Contributors ### Immediate Priorities (Ordered by Value) 1. **Implement Temporal Tests** (HIGH VALUE - 25-30 tests) - File: `05-temporal.test.js` - Test 4 temporal tools - Estimated: 4-6 hours - See Phase 2 above for details 2. **Implement Debug Tool Tests** (MEDIUM VALUE - 15-20 tests) - File: `06-debug-tools.test.js` - Test 3 debug tools - Requires DEBUG=true - Estimated: 3-4 hours - See Phase 3 above for details 3. **Implement Performance Tests** (MEDIUM VALUE - 20 tests) - File: `07-performance.test.js` - Large batches, stress tests, concurrent operations - Estimated: 6-8 hours - See Phase 4 above for details 4. **Implement Scenario Tests** (LOW VALUE - 20 tests) - File: `08-scenarios.test.js` - Real-world workflows - Estimated: 4-6 hours - See Phase 5 above for details --- ## Test Implementation Guide ### Setting Up a New Test File ```javascript /** * E2E Tests: [Category Name] * [Description of what this file tests] */ import { strictEqual, ok } from 'node:assert/strict' import { after, before, describe, it } from 'node:test' import { getSharedClient, cleanupAllTestData } from './fixtures/shared-client.js' describe('E2E: [Category Name]', () => { let client let helper before(async () => { const shared = await getSharedClient() client = shared.client helper = shared.helper // Optional: Clean up before tests await cleanupAllTestData() }) after(async () => { // Clean up after tests await cleanupAllTestData() }) describe('[tool_name]', () => { it('should [test description]', async () => { // Arrange const testData = { /* ... */ } // Act const result = await helper.callToolJSON('[tool_name]', testData) // Assert strictEqual(result.property, expectedValue) ok(result.success) }) it('should reject [invalid input]', async () => { await helper.expectToolError( '[tool_name]', { invalidData: true }, 'expected error keyword' ) }) }) }) ``` ### Using Test Helpers ```javascript // Create entities const entities = await helper.createEntities([ { name: 'test1', entityType: 'feature', observations: ['obs1'] } ]) // Call any tool and parse JSON const result = await helper.callToolJSON('tool_name', { args }) // Expect an error await helper.expectToolError('tool_name', { bad_args }, 'error keyword') // Read entire graph const graph = await helper.readGraph() // Search const results = await helper.searchNodes('query') const semantic = await helper.semanticSearch('query', { limit: 5 }) // Delete entities await helper.deleteEntities(['entity1', 'entity2']) ``` --- ## Maintenance ### Updating This Document **When to Update:** - After adding new test files - After completing a test phase - When test counts change significantly - When tools are added/removed/modified - After architecture changes **Update Checklist:** - [ ] Update test counts in tables - [ ] Update completion percentages - [ ] Update status indicators (✅ ⚠️) - [ ] Update "Last Updated" date - [ ] Update test file structure - [ ] Update implementation priority if needed --- **Maintained By:** DevFlow Team **Last Updated:** 2025-10-16 **Next Review:** After temporal tests implementation

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Takin-Profit/devflow-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

E2E_TEST_PLAN.md•17.6 KiB