Skip to main content
Glama

DevFlow MCP

by Takin-Profit
E2E_TEST_PLAN.md18 kB
# DevFlow MCP E2E Test Plan **Last Updated:** 2025-10-16 **Status:** ✅ 65% Complete - Core functionality well tested, temporal and debug tools needed --- ## Overview Comprehensive end-to-end testing strategy for DevFlow MCP with **SQLite-only** architecture. This plan covers all **20 MCP tools** (17 standard + 3 debug tools when DEBUG=true) with happy path, edge case, validation, and error handling tests. **Current Coverage:** 194 tests implemented covering 13 of 20 tools (65%) --- ## Tools Coverage Status (20 total) ### Core CRUD Operations (5 tools) - ✅ COMPLETE 1. ✅ **create_entities** - Comprehensive (13 tests) 2. ✅ **read_graph** - Comprehensive (5 tests) 3. ✅ **delete_entities** - Comprehensive (7 tests) 4. ✅ **add_observations** - Comprehensive (7 tests) 5. ✅ **delete_observations** - Comprehensive (5 tests) ### Relation Management (5 tools) - ✅ COMPLETE 6. ✅ **create_relations** - Very comprehensive (30 tests) 7. ✅ **get_relation** - Comprehensive (8 tests) 8. ✅ **update_relation** - Comprehensive (5 tests) 9. ✅ **delete_relations** - Comprehensive (5 tests) 10. ⚠️ **get_relation_history** - NOT TESTED (temporal feature) ### Search & Discovery (4 tools) - ✅ COMPLETE 11. ✅ **search_nodes** - Comprehensive (9 tests) 12. ✅ **semantic_search** - Very comprehensive (14 tests) 13. ✅ **open_nodes** - Comprehensive (9 tests) 14. ✅ **get_entity_embedding** - Comprehensive (6 tests) ### Temporal Features (3 tools) - ⚠️ NEEDS IMPLEMENTATION 15. ⚠️ **get_entity_history** - NOT TESTED (implemented but untested) 16. ⚠️ **get_graph_at_time** - NOT TESTED (implemented but untested) 17. ⚠️ **get_decayed_graph** - NOT TESTED (implemented but untested) ### Debug Tools (3 tools) - ⚠️ NEEDS IMPLEMENTATION 18. ⚠️ **force_generate_embedding** - NOT TESTED (only available when DEBUG=true) 19. ⚠️ **debug_embedding_config** - NOT TESTED (only available when DEBUG=true) 20. ⚠️ **diagnose_vector_search** - NOT TESTED (only available when DEBUG=true) --- ## Test Categories - Current Status ### 1. Happy Path Tests - ✅ COMPREHENSIVE - ✅ Basic CRUD operations (all tools tested) - ✅ Relations creation and retrieval (all CRUD operations) - ✅ Search returns results (keyword and semantic) - ✅ All tools return properly formatted MCP responses - ✅ Vector embeddings work correctly ### 2. Validation Tests - ✅ VERY COMPREHENSIVE (67 tests in 04-validation.test.js) - ✅ Invalid entity types rejected (feature, task, decision, component validated) - ✅ Missing required fields rejected (comprehensive coverage) - ✅ Invalid relation types rejected (all 4 types validated) - ✅ Array fields validate correctly (empty, null, non-array) - ✅ Optional fields work (strength, confidence tested with boundary values) - ✅ Type coercion tested (string rejection for numeric fields) - ✅ Error messages are clear and specific ### 3. Edge Case Tests - ✅ COMPREHENSIVE - ✅ Empty arrays handled (multiple scenarios) - ✅ Special characters in entity names - ✅ Unicode characters (日本語 🚀) in entity names - ✅ Very long observation strings (tested up to 10,000 chars) - ✅ Very long entity names (tested up to 400+ chars) - ✅ Duplicate entity creation - ✅ Non-existent entity references - ✅ Circular relationships - ✅ Self-referencing relations - ✅ Null/undefined handling - ✅ Large metadata objects (5000 char descriptions, 100 tags, deeply nested) - ✅ Many observations (100+ per entity) ### 4. Error Handling Tests - ✅ COMPREHENSIVE - ✅ Tools return proper error messages - ✅ Validation errors have clear messages with field names - ✅ Database errors handled gracefully - ✅ Missing tool arguments caught - ✅ Invalid parameter types rejected - ✅ Boundary violations detected (strength/confidence >1.0 or <0.0) - ✅ Non-existent entities/relations handled gracefully ### 5. Real-World Scenario Tests - ⚠️ LIMITED - ⚠️ Software development workflow - Partial (basic entity/relation chains tested) - ⚠️ Knowledge graph evolution over time - NOT TESTED - ⚠️ Collaborative work - NOT TESTED - ⚠️ Complex dependency graphs - Limited (tested in relations) ### 6. Performance Tests - ⚠️ NOT IMPLEMENTED - ⚠️ Large batch operations (100+ entities) - NOT TESTED - ⚠️ Many concurrent relations (1000+) - NOT TESTED - ⚠️ Search with many results - NOT TESTED - ⚠️ Graph with 1000+ entities - Partial (tested up to 50) ### 7. SQLite Integration Tests - ⚠️ LIMITED - ✅ Vector search returns relevant results (sqlite-vec tested) - ⚠️ Temporal queries accurate - NOT TESTED - ⚠️ SQLite transactions work correctly - NOT EXPLICITLY TESTED - ⚠️ Internal optimizations active (WAL mode, cache) - NOT TESTED - ⚠️ Concurrent operations - NOT TESTED --- ## Test File Structure - CURRENT STATE ``` src/tests/integration/e2e/ ├── 00-mcp-client.test.js ✅ DONE (4 tests) │ └── Basic MCP protocol smoke tests ├── 01-crud.test.js ✅ DONE (37 tests) │ └── create_entities, read_graph, delete_entities, │ add_observations, delete_observations ├── 02-relations.test.js ✅ DONE (48 tests) │ └── create_relations, get_relation, update_relation, │ delete_relations (comprehensive edge cases) ├── 03-search.test.js ✅ DONE (38 tests) │ └── search_nodes, semantic_search, open_nodes, │ get_entity_embedding (with parameters) ├── 04-validation.test.js ✅ DONE (67 tests) │ └── Comprehensive validation across all tools ├── 05-temporal.test.js ⚠️ NEEDED (~25-30 tests) │ └── get_entity_history, get_relation_history, │ get_graph_at_time, get_decayed_graph ├── 06-debug-tools.test.js ⚠️ NEEDED (~15-20 tests) │ └── force_generate_embedding, debug_embedding_config, │ diagnose_vector_search ├── 07-performance.test.js ⚠️ NEEDED (~20 tests) │ └── Large batches, stress tests, concurrent operations ├── 08-scenarios.test.js ⚠️ NEEDED (~20 tests) │ └── Real-world workflows, multi-step operations └── fixtures/ ├── entities.js ✅ Comprehensive fixtures ├── relations.js ✅ Comprehensive fixtures ├── shared-client.js ✅ Shared MCP client setup └── helpers.js ✅ Test utility functions ``` --- ## Test Count Summary | Category | Tests | Status | Notes | |----------|-------|--------|-------| | MCP Client | 4 | ✅ Done | Basic smoke tests | | CRUD Operations | 37 | ✅ Done | All 5 tools comprehensive | | Relations | 48 | ✅ Done | Very comprehensive, edge cases | | Search & Discovery | 38 | ✅ Done | All 4 tools with parameters | | Validation & Errors | 67 | ✅ Done | Comprehensive validation | | **CURRENT TOTAL** | **194** | **65% Complete** | | | Temporal Features | 0 | ⚠️ Needed | 3 tools, ~25-30 tests | | Debug Tools | 0 | ⚠️ Needed | 3 tools, ~15-20 tests | | Performance | 0 | ⚠️ Needed | ~20 tests | | Real-World Scenarios | 0 | ⚠️ Needed | ~20 tests | | **REMAINING NEEDED** | **~80** | | | | **FINAL TARGET** | **~274** | | | --- ## Implementation Priority - UPDATED ### ✅ Phase 1: Core Functionality - COMPLETE (194 tests) - ✅ CRUD operations (37 tests) - ✅ Relation management (48 tests) - ✅ Search & discovery (38 tests) - ✅ Validation & errors (67 tests) - ✅ MCP client integration (4 tests) **Status:** COMPLETE --- ### ⚠️ Phase 2: Temporal Features - HIGH PRIORITY (~25-30 tests) **File:** `05-temporal.test.js` **Tools to Test:** 1. **get_entity_history** (8-10 tests) - Get history for entity with multiple versions - Verify chronological ordering - Handle entities with no history - Handle non-existent entities - Verify all temporal fields (version, createdAt, updatedAt, validFrom, validTo, changedBy) 2. **get_relation_history** (8-10 tests) - Get history for relation with multiple versions - Verify chronological ordering - Handle relations with no history - Handle non-existent relations - Verify temporal metadata preserved 3. **get_graph_at_time** (6-8 tests) - Get graph at specific timestamp - Verify only entities valid at that time - Verify only relations valid at that time - Handle timestamp before any data - Handle timestamp after all data - Handle current timestamp 4. **get_decayed_graph** (4-6 tests) - Get graph with confidence decay - Verify old relations have lower confidence - Verify recent relations have high confidence - Handle graph with no decay configuration - Handle empty graph **Estimated Effort:** 4-6 hours --- ### ⚠️ Phase 3: Debug Tools - MEDIUM PRIORITY (~15-20 tests) **File:** `06-debug-tools.test.js` **Prerequisites:** Set DEBUG=true in test environment **Tools to Test:** 1. **force_generate_embedding** (6-8 tests) - Force generate embedding for entity - Verify embedding stored in vector store - Verify embedding dimensions match model - Handle non-existent entity - Handle entity name vs ID lookup - Verify embedding service initialized 2. **debug_embedding_config** (4-6 tests) - Get embedding configuration - Verify OpenAI API key detected - Verify model configuration - Verify vector store status - Verify embedding job manager status - Check entities with embeddings count 3. **diagnose_vector_search** (4-6 tests) - Run vector search diagnostics - Verify SQLite vector index status - Check vector dimensions - Verify similarity function - Count indexed entities **Estimated Effort:** 3-4 hours --- ### ⚠️ Phase 4: Performance & Scalability - MEDIUM PRIORITY (~20 tests) **File:** `07-performance.test.js` **Test Scenarios:** 1. **Batch Operations** (6-8 tests) - Create 100+ entities in single batch - Create 1000+ relations in single batch - Delete 100+ entities in single batch - Measure operation time (<5s for 100 entities) 2. **Large Graph Operations** (6-8 tests) - Read graph with 1000+ entities - Search in graph with 1000+ entities - Semantic search with 1000+ entities - Verify performance <100ms for typical operations 3. **Concurrent Operations** (4-6 tests) - Multiple create_entities simultaneously - Multiple semantic_search simultaneously - Read and write operations simultaneously - Verify no race conditions or deadlocks **Estimated Effort:** 6-8 hours --- ### ⚠️ Phase 5: Real-World Scenarios - LOW PRIORITY (~20 tests) **File:** `08-scenarios.test.js` **Test Scenarios:** 1. **Software Development Workflow** (8-10 tests) - Feature → Tasks workflow - Tasks → Components dependencies - Decision → Implementation chain - Test → Component relationships 2. **Knowledge Graph Evolution** (6-8 tests) - Build graph over time - Update entities and relations - Track changes through history - Verify temporal consistency 3. **Complex Dependency Management** (4-6 tests) - Multi-level dependency trees - Circular dependency detection - Dependency impact analysis - Cascade delete scenarios **Estimated Effort:** 4-6 hours --- ## Success Criteria - UPDATED ### ✅ Already Achieved - ✅ 13 of 20 tools tested with valid inputs - ✅ 13 of 20 tools tested with invalid inputs - ✅ Error cases return proper error messages - ✅ All validation rules enforced - ✅ Zero flaky tests (current tests stable) - ✅ SQLite-only architecture validated - ✅ Comprehensive edge case coverage ### ⚠️ Remaining Goals - ⚠️ All 20 tools tested with valid inputs (7 tools remaining) - ⚠️ All 20 tools tested with invalid inputs (7 tools remaining) - ⚠️ Temporal features validated - ⚠️ Debug tools validated - ⚠️ Real-world scenarios pass - ⚠️ Performance acceptable (<100ms for typical queries) - ⚠️ Concurrent operations verified - ⚠️ Internal optimizations verified (WAL mode, cache) --- ## Running Tests ```bash # All E2E tests npm run test:e2e # or pnpm run test:e2e # Specific test file node --test src/tests/integration/e2e/01-crud.test.js # With DEBUG tools enabled DEBUG=true node --test src/tests/integration/e2e/06-debug-tools.test.js # All tests (unit + integration + e2e) pnpm test # Run specific test suite node --test src/tests/integration/e2e/02-relations.test.js # Watch mode (if configured) pnpm run test:watch ``` --- ## Test Data & Validation Rules ### Entity Types (Required) ```typescript "feature" | "task" | "decision" | "component" ``` **Note:** "test" entity type removed in latest schema ### Relation Types (Required) ```typescript "depends_on" | "implements" | "part_of" | "relates_to" ``` ### Optional Fields with Defaults ```typescript strength: 0.0 - 1.0 (relation strength) confidence: 0.0 - 1.0 (relation confidence) limit: number (default: 10, for search results) min_similarity: 0.0 - 1.0 (default: 0.6, for semantic search) hybrid_search: boolean (default: true, for semantic search) semantic_weight: 0.0 - 1.0 (default: 0.6, for hybrid search) ``` ### Temporal Fields (Optional, auto-generated) ```typescript id: string (UUID, auto-generated) version: number (auto-incremented) createdAt: number (timestamp in ms) updatedAt: number (timestamp in ms) validFrom: number (timestamp in ms) validTo: number (timestamp in ms, optional) changedBy: string (user/system identifier, optional) ``` ### MCP Response Format ```json { "content": [ { "type": "text", "text": "JSON.stringify(data)" } ] } ``` ### MCP Error Format ```json { "message": "Error message with context", "code": -32603 } ``` --- ## Architecture Notes ### SQLite-Only Implementation - **Database:** SQLite with sqlite-vec extension for vector operations - **Vector Store:** Integrated into SQLite (no separate vector database) - **Schema Manager:** SqliteSchemaManager handles table creation and indices - **No External Dependencies:** No Docker, no Neo4j, no separate services - **File Location:** Configurable via `DFM_SQLITE_LOCATION` env var - **Test Database:** Uses `:memory:` for E2E tests (fast, isolated) ### Test Infrastructure - **Client:** Shared MCP client instance across tests (performance optimization) - **Fixtures:** Reusable entity and relation fixtures - **Helpers:** MCPTestHelper class for common operations - **Cleanup:** Automatic cleanup between test suites - **Isolation:** Each test suite creates unique entity names with timestamps --- ## Next Steps for Contributors ### Immediate Priorities (Ordered by Value) 1. **Implement Temporal Tests** (HIGH VALUE - 25-30 tests) - File: `05-temporal.test.js` - Test 4 temporal tools - Estimated: 4-6 hours - See Phase 2 above for details 2. **Implement Debug Tool Tests** (MEDIUM VALUE - 15-20 tests) - File: `06-debug-tools.test.js` - Test 3 debug tools - Requires DEBUG=true - Estimated: 3-4 hours - See Phase 3 above for details 3. **Implement Performance Tests** (MEDIUM VALUE - 20 tests) - File: `07-performance.test.js` - Large batches, stress tests, concurrent operations - Estimated: 6-8 hours - See Phase 4 above for details 4. **Implement Scenario Tests** (LOW VALUE - 20 tests) - File: `08-scenarios.test.js` - Real-world workflows - Estimated: 4-6 hours - See Phase 5 above for details --- ## Test Implementation Guide ### Setting Up a New Test File ```javascript /** * E2E Tests: [Category Name] * [Description of what this file tests] */ import { strictEqual, ok } from 'node:assert/strict' import { after, before, describe, it } from 'node:test' import { getSharedClient, cleanupAllTestData } from './fixtures/shared-client.js' describe('E2E: [Category Name]', () => { let client let helper before(async () => { const shared = await getSharedClient() client = shared.client helper = shared.helper // Optional: Clean up before tests await cleanupAllTestData() }) after(async () => { // Clean up after tests await cleanupAllTestData() }) describe('[tool_name]', () => { it('should [test description]', async () => { // Arrange const testData = { /* ... */ } // Act const result = await helper.callToolJSON('[tool_name]', testData) // Assert strictEqual(result.property, expectedValue) ok(result.success) }) it('should reject [invalid input]', async () => { await helper.expectToolError( '[tool_name]', { invalidData: true }, 'expected error keyword' ) }) }) }) ``` ### Using Test Helpers ```javascript // Create entities const entities = await helper.createEntities([ { name: 'test1', entityType: 'feature', observations: ['obs1'] } ]) // Call any tool and parse JSON const result = await helper.callToolJSON('tool_name', { args }) // Expect an error await helper.expectToolError('tool_name', { bad_args }, 'error keyword') // Read entire graph const graph = await helper.readGraph() // Search const results = await helper.searchNodes('query') const semantic = await helper.semanticSearch('query', { limit: 5 }) // Delete entities await helper.deleteEntities(['entity1', 'entity2']) ``` --- ## Maintenance ### Updating This Document **When to Update:** - After adding new test files - After completing a test phase - When test counts change significantly - When tools are added/removed/modified - After architecture changes **Update Checklist:** - [ ] Update test counts in tables - [ ] Update completion percentages - [ ] Update status indicators (✅ ⚠️) - [ ] Update "Last Updated" date - [ ] Update test file structure - [ ] Update implementation priority if needed --- **Maintained By:** DevFlow Team **Last Updated:** 2025-10-16 **Next Review:** After temporal tests implementation

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Takin-Profit/devflow-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server