folder-mcp

Overview Schema Related Servers Score Discussions

Phase-9-PRD-MCP-Endpoints-Multi-Folder-Support.md•23.7 KiB

# Phase 9: MCP Endpoints Multi-Folder Support **Status**: 📋 PLANNED **Priority**: HIGH **Start Date**: TBD **Approach**: Endpoint-by-Endpoint Implementation **🚀 Implementation Plan**: [Phase 9 Simple SCRUM Plan](./Phase-9-Implementation-epic.md) ## 🎯 **Overview** Update our existing MCP endpoints to work with the multi-folder indexing system. The core change: **transition from single-folder to multi-folder operations** while maintaining the same excellent endpoint interface. ## 🌟 **The Problem** Our MCP endpoints currently expect a single folder and model. With Task 11.5 complete, we now have: - Multiple folders with different models - Folder lifecycle states (pending → active → error) - Path-aware topic detection (prerequisite) **The endpoints need to be updated to work with this new reality.** ## 🎯 **Vision** Keep our excellent 10-endpoint interface, but make each endpoint **folder-aware** and **state-aware**: - LLMs specify which folder to operate on - Endpoints validate folder states before operations - Responses include folder context and attribution - Model loading happens automatically per folder --- ## 📡 **Endpoint Changes Required** ### 1. **`search` Endpoint** **Current**: Searches the single configured folder **Required Changes**: - **Add `folder` parameter (required)** - folder name or path - **Validate folder is in 'active' state** before searching - **Load correct model for the target folder** - **Include folder attribution in results** - **Return folder-specific error messages** when folder unavailable **New Request Schema**: ```json { "query": "quadratic equations", "folder": "MathCourse", // NEW: Required parameter "mode": "semantic", "limit": 10 } ``` **New Response Addition**: ```json { "folderContext": { "name": "MathCourse", "path": "/Users/bob/MathCourse", "model": "all-MiniLM-L6-v2", "status": "active" } // ... existing results structure } ``` --- ### 2. **`get_folder_info` Endpoint** **Current**: Returns info about the single folder **Required Changes**: - **Return ALL configured folders** with their states and topics - **Include topic clusters** generated by path-aware detection - **Show folder lifecycle status** (pending, indexing, active, error) - **Provide model information** and indexing progress **Enhanced Response**: ```json { "folders": [ { "name": "MathCourse", "path": "/Users/bob/MathCourse", "status": "active", "model": "all-MiniLM-L6-v2", "documentCount": 150, "topics": [ {"weight": 45, "terms": ["algebra", "quadratic", "equations"]}, {"weight": 30, "terms": ["statistics", "probability"]} ], "lastIndexed": "2024-01-15T10:30:00Z" }, { "name": "ChemistryCourse", "status": "indexing", "progress": 65, "model": "sentence-transformers/all-mpnet-base-v2" } ] } ``` --- ### 3. **`get_document_outline` Endpoint** **Current**: Takes document ID from single folder **Required Changes**: - **Accept full document path** OR **folder + relative path** - **Resolve which folder contains the document** - **Validate folder is active** before processing - **Include folder context** in response --- ### 4. **`get_document_data` Endpoint** **Current**: Returns document from single folder **Required Changes**: - **Accept folder-aware document identification** - **Validate folder accessibility** - **Include folder attribution** in metadata --- ### 5. **`list_documents` Endpoint** **Current**: Lists documents in the configured folder **Required Changes**: - **Add optional `folder` parameter** to list specific folder - **Default to listing ALL folders** if no folder specified - **Show folder context** for each document - **Filter by folder status** (only show active folders by default) --- ### 6. **Document Format Endpoints** (`get_sheet_data`, `get_slides`, `get_pages`) **Current**: Process documents from single folder **Required Changes**: - **Accept folder-aware document paths** - **Validate folder states** before processing - **Include folder context** in responses - **Handle folder-specific processing errors** --- ### 7. **`get_status` Endpoint** **Current**: Shows single folder status **Required Changes**: - **Show system-wide status** across all folders - **Include model memory usage** and cache status - **Report folder lifecycle states** and counts - **Show topic detection health** and performance metrics **Enhanced Response**: ```json { "systemStatus": "healthy", "totalFolders": 5, "activeFolders": 3, "indexingFolders": 1, "errorFolders": 1, "modelCache": { "loadedModels": ["all-MiniLM-L6-v2", "sentence-transformers/all-mpnet-base-v2"], "memoryUsage": "2.1GB", "cacheHits": 145, "cacheMisses": 12 }, "topicDetection": { "clustersGenerated": 47, "averageCoherence": 0.82, "recomputesPending": 2 } } ``` --- ## 🔧 **Core Infrastructure Changes** ### 1. **Folder Resolution Service** - **Map folder names to paths** and validate existence - **Check folder lifecycle states** before operations - **Provide clear error messages** when folders unavailable - **Handle folder aliases** and path normalization ### 2. **Model Loading Orchestration** - **Load correct model per folder** for search operations - **Cache 2-3 models** with LRU eviction - **Handle model loading failures** gracefully - **Report model switching performance** ### 3. **State Validation Middleware** - **Block operations on non-active folders** with helpful errors - **Allow read operations** on indexing folders (partial data) - **Prevent operations** on error/pending folders - **Provide status-specific guidance** to LLMs ### 4. **Response Enhancement** - **Add folder context** to all responses - **Include operation metadata** (model used, processing time) - **Provide folder-specific errors** and suggestions - **Enable result attribution** for LLM understanding --- ## 🎭 **Updated User Experience** ### For LLMs (Primary Users) **Step 1: Discover Available Folders** ``` LLM calls: getFolderInfo() LLM sees: 3 active folders with topic hints ``` **Step 2: Select Appropriate Folder** ``` User: "Find quadratic equations" LLM: Sees MathCourse has topic "algebra, equations" LLM calls: search({ folder: "MathCourse", query: "quadratic equations" }) ``` **Step 3: Handle Folder Issues** ``` LLM calls: search({ folder: "ChemistryCourse", query: "..." }) Response: "Folder ChemistryCourse is currently indexing (65% complete)" LLM: "The Chemistry folder is still being indexed. Try again in a few minutes?" ``` ### For End Users - **Transparent folder management** - LLM handles the complexity - **Clear error messages** when folders unavailable - **Automatic folder selection** based on topic matching - **Progress visibility** for folders being indexed --- ## 🎯 **Success Criteria** ### Functional Requirements - [ ] All 10 endpoints work with multi-folder setup - [ ] Folder parameter required where appropriate - [ ] State validation prevents invalid operations - [ ] Model loading works per folder automatically - [ ] Error messages are folder-aware and actionable ### Performance Requirements - [ ] Search latency < 500ms including model loading - [ ] Folder resolution < 10ms per request - [ ] Model switching < 2s between folders - [ ] State validation < 5ms overhead per request ### UX Requirements - [ ] LLMs can successfully navigate multiple folders - [ ] Error messages guide users to resolution - [ ] Progress feedback for long-running operations - [ ] Folder context clear in all responses --- ## 📋 **Implementation Phases** (Easiest to Hardest) ### Phase 1: MCP Server → Daemon Architecture Fix 🚨 *CRITICAL FOUNDATION* **Goal**: Connect MCP server to daemon instead of direct file access **The Problem**: Current MCP server takes folder path as argument and operates independently: ``` BROKEN: Claude → MCP Server → Direct File Access (single folder only) ``` **The Solution**: MCP server becomes a thin client connecting to daemon: ``` FIXED: Claude → MCP Server → WebSocket → Daemon → Multi-Folder System ``` **Specific Tasks**: 1. **Remove folder arguments** from MCP server entry point (`mcp-server.ts`) 2. **Add WebSocket client** to MCP server for daemon communication 3. **Create daemon API endpoints** for MCP operations 4. **Implement protocol translation** - MCP calls ↔ Daemon API calls 5. **Test architecture** - Verify MCP server connects to daemon successfully **Architecture Verification Test**: ```bash # 1. Start daemon (manages all folders) npm run daemon # 2. Start MCP server (no folder arguments!) node dist/mcp-server.js # Connects to daemon via WebSocket # 3. Test connection Agent calls: get_status → MCP Server → Daemon → Returns system info ``` **Success Criteria**: - MCP server starts WITHOUT folder path arguments - WebSocket connection established to daemon - MCP calls successfully forwarded to daemon - Daemon responses properly returned to MCP clients - UI folder changes immediately visible to MCP clients ### Phase 2: "Hello World" - Status Endpoint ⭐ *FIRST REAL TEST* **Goal**: Get a single endpoint working with multi-folder data via daemon **Why `get_status` first**: - No folder parameters required - simplest interface change - No model loading - no complex dependencies - No state validation - just read current system state from daemon - Perfect for testing MCP→Daemon→Multi-Folder flow **Specific Tasks**: 1. **Implement daemon status API** - Return multi-folder system info 2. **Connect MCP endpoint to daemon** - Forward get_status calls 3. **Test with agent** - Verify endpoint responds with multi-folder data via daemon 4. **Validate JSON structure** - Ensure response format is correct **"Hello World" Test Case**: ```bash # Agent calls: get_status # MCP Server → Daemon API → Multi-folder status # Expected response includes: { "systemStatus": "healthy", "totalFolders": 3, "activeFolders": 2, "indexingFolders": 1, "errorFolders": 0 // ... basic multi-folder awareness from daemon } ``` **Success Criteria**: - Agent can successfully call `get_status` endpoint - MCP server forwards call to daemon successfully - Response includes multi-folder system information from daemon - No crashes or MCP protocol errors - Foundation established for other endpoints ### Phase 3: Read-Only Multi-Folder Endpoints - **Enhance `get_folder_info`** - Forward to daemon for all folders with topics - **Update `list_documents`** - Forward to daemon for folder-specific document lists - **Test agent navigation** - Agent can discover folders and their contents via daemon ### Phase 4: Folder-Aware Document Retrieval - **Update `get_document_outline`** - Forward to daemon with folder-aware document paths - **Update `get_document_data`** - Forward to daemon with folder resolution - **Update format endpoints** (`get_sheet_data`, `get_slides`, `get_pages`) - Forward to daemon ### Phase 5: Complex Search Endpoint - **Add folder parameter to `search`** - Forward to daemon with folder specification - **Daemon handles model switching** - Load correct model per folder in daemon - **Daemon handles state validation** - Only search active folders - **Performance optimization** - Model caching and efficient switching in daemon ### Phase 6: End-to-End Integration (Built Throughout) - **Daemon API design** - RESTful/WebSocket API for all MCP operations - **Error handling** - Proper error propagation from daemon to MCP clients - **Performance monitoring** - Track MCP→Daemon→Response latency - **Protocol compliance** - Ensure all MCP responses meet specification --- ## 🧪 **Verified TMOAT Methodology** *Each implementation phase MUST be verified through subagent testing before moving to the next phase.* ### **Testing Infrastructure Setup** **Step 1: Start Daemon + MCP Server Architecture** ```bash # Build the project npm run build # 1. Start daemon first (manages all folders) npm run daemon # Daemon reads folders from ~/.folder-mcp/config.yaml # 2. Configure Claude Code to use folder-mcp (NO folder arguments!) # Add to Claude Code config: { "mcpServers": { "folder-mcp": { "command": "node", "args": [ "/Users/hanan/Projects/folder-mcp/dist/mcp-server.js" // ✅ NO folder paths! MCP server connects to daemon ] } } } # 3. MCP server automatically connects to daemon via WebSocket # All folder configuration managed by daemon ``` **Step 2: Configure Test Folders in Daemon** ```bash # Configure daemon with test folders (not MCP server!) folder-mcp config set folders.list '[ { "path": "tests/fixtures/test-knowledge-base/Sales", "model": "all-MiniLM-L6-v2" }, { "path": "tests/fixtures/test-knowledge-base/Legal", "model": "all-MiniLM-L6-v2" } ]' # Daemon immediately picks up config changes # All MCP clients automatically see new folders ``` **Step 3: Create Testing Subagent** ```typescript // Use Task tool to create a subagent with MCP access const testAgent = await Task({ description: "MCP Endpoint Tester", prompt: "You are testing the folder-mcp endpoints. Use the MCP tools to query the system.", subagent_type: "general-purpose" }); ``` **Step 4: Verification Pattern** - **Act as User**: You know what's in daemon config and test fixtures - **Ask Known Questions**: Query for information you already know exists - **Verify Responses**: Check if agent gets correct answers via MCP→Daemon flow ### **Phase 1 TMOAT: Architecture Connection Verification** **Test Scenario 1: MCP Server Connects to Daemon** ``` USER (You): "Check if the MCP server is working" EXPECTED: Agent calls get_status() VERIFY: - MCP server connects to daemon successfully - No "connection refused" errors - Response comes from daemon (not direct file access) ``` **Test Scenario 2: Daemon Configuration Visibility** ``` USER: "What folders are available?" EXPECTED: Agent calls get_status() or get_folder_info() VERIFY: - Response shows folders from daemon config - NOT hardcoded folder paths from MCP args - Daemon manages folder list ``` ### **Phase 2 TMOAT: Status Endpoint Verification** **Test Scenario 1: Basic System Status** ``` USER (You): "Check the MCP server status" EXPECTED: Agent calls get_status() VERIFY: Response includes systemStatus, totalFolders, activeFolders ``` **Test Scenario 2: Multi-Folder Awareness** ``` USER: "How many folders are configured in the system?" EXPECTED: Agent calls get_status() VERIFY: - totalFolders matches actual config count - activeFolders shows correctly indexed folders - errorFolders shows any failed folders ``` **Known Test Data** (from fixtures): - Folders: `Sales/`, `Engineering/`, `Legal/` - Total documents: ~15-20 files - File types: PDF, DOCX, XLSX, PPTX ### **Phase 3 TMOAT: Folder Discovery (via Daemon)** **Test Scenario 1: List All Folders from Daemon** ``` USER: "What folders are available to search?" EXPECTED: Agent calls get_folder_info() VERIFY: - MCP Server → Daemon API → Folder list - Returns Sales, Legal folders from daemon config - Each folder shows document count from daemon - Topics are displayed for each folder (from daemon's topic detection) ``` **Test Scenario 2: Document Listing via Daemon** ``` USER: "Show me all documents in the Sales folder" EXPECTED: Agent calls list_documents(folder: "Sales") VERIFY: - MCP Server forwards request to Daemon - Daemon returns Sales folder document list - Lists Sales_Pipeline.xlsx, Q4_Board_Deck.pptx - Shows correct file metadata from daemon ``` ### **Phase 4 TMOAT: Document Retrieval (via Daemon)** **Test Scenario 1: Get Specific Document via Daemon** ``` USER: "Get the outline of Q4_Board_Deck.pptx" EXPECTED: Agent calls get_document_outline(document_id: "Sales/Q4_Board_Deck.pptx") VERIFY: - MCP Server forwards to Daemon document service - Daemon processes document using its file parsing service - Returns slide count (45 slides) from daemon - Shows slide titles from daemon processing - File size is correct from daemon metadata ``` **Test Scenario 2: Extract Spreadsheet Data via Daemon** ``` USER: "Show me the data from Sales_Pipeline.xlsx" EXPECTED: Agent calls get_sheet_data(document_id: "Sales/Sales_Pipeline.xlsx") VERIFY: - MCP Server → Daemon API → Document processing - Daemon parses spreadsheet using its services - Returns sheet names from daemon - Shows headers: Month, Revenue, Profit from daemon - Contains Q4 data rows processed by daemon ``` ### **Phase 5 TMOAT: Search Operations (via Daemon)** **Test Scenario 1: Semantic Search Within Folder via Daemon** ``` USER: "Search for Q4 revenue information in the Sales folder" EXPECTED: Agent calls search(query: "Q4 revenue", folder: "Sales", mode: "semantic") VERIFY: - MCP Server forwards to Daemon search service - Daemon loads correct model for Sales folder - Daemon performs semantic search in Sales folder only - Returns Sales_Pipeline.xlsx as top result from daemon - Returns Q4_Board_Deck.pptx as second result from daemon - Similarity scores > 0.8 from daemon's model ``` **Test Scenario 2: Cross-Folder Awareness (Negative Test) via Daemon** ``` USER: "Search for contracts in the Sales folder" EXPECTED: Agent calls search(query: "contracts", folder: "Sales", mode: "semantic") VERIFY: - MCP Server → Daemon search service - Daemon searches ONLY Sales folder as requested - Returns no results from daemon (contracts are in Legal folder) - Agent suggests searching Legal folder instead ``` **Test Scenario 3: Dynamic Configuration Test** ``` USER: "Add a new folder via TUI, then search it" STEPS: 1. Use TUI to add new folder to daemon config 2. Agent calls get_folder_info() immediately after VERIFY: - New folder appears in MCP client immediately - No need to restart MCP server or update Claude config - Daemon configuration changes propagate to all MCP clients ``` ### **Success Criteria for Each Phase** **Phase 1 Success (Architecture Fix)**: ✅ MCP server starts WITHOUT folder path arguments ✅ WebSocket connection established between MCP server and daemon ✅ MCP calls successfully forwarded to daemon ✅ Daemon responses properly returned to MCP clients ✅ UI folder changes immediately visible to MCP clients **Phase 2 Success (Status Endpoint)**: ✅ Subagent successfully connects to MCP server ✅ MCP server forwards get_status calls to daemon ✅ Daemon returns multi-folder information via MCP server ✅ No protocol errors or crashes ✅ Response times < 500ms for MCP→Daemon→Response **Phase 3 Success (Folder Discovery)**: ✅ Agent discovers all folders configured in daemon ✅ Document counts are accurate from daemon ✅ Topics reflect actual folder content from daemon's detection ✅ Can list documents in specific folders via daemon **Phase 4 Success (Document Retrieval)**: ✅ Document outlines return correct structure from daemon ✅ Spreadsheet data is accessible via daemon processing ✅ PDF/PPTX metadata is accurate from daemon parsing ✅ Folder context included in responses from daemon **Phase 5 Success (Search Operations)**: ✅ Semantic search returns relevant results from daemon ✅ Folder filtering works correctly in daemon ✅ Model switching happens transparently in daemon ✅ Performance < 2s for MCP→Daemon→Search→Response ✅ Dynamic folder configuration works without MCP server restart ### **TMOAT Execution Commands** ```bash # 1. Build and start daemon + MCP server architecture npm run build # 2. Start daemon first (manages all folders) npm run daemon # Daemon loads config from ~/.folder-mcp/config.yaml # 3. Configure test folders in daemon folder-mcp config set folders.list '[ { "path": "tests/fixtures/test-knowledge-base/Sales", "model": "all-MiniLM-L6-v2" } ]' # 4. Start MCP server (connects to daemon, NO folder arguments) node dist/mcp-server.js # MCP server connects to daemon via WebSocket # 5. Launch subagent tester npm run test:mcp-agent # 6. Monitor daemon and MCP communication tail -f logs/daemon.log tail -f logs/mcp-server.log # 7. Verify each phase before proceeding npm run verify:phase1 # Architecture connection works npm run verify:phase2 # Status endpoint via daemon works npm run verify:phase3 # Folder discovery via daemon works # etc. ``` ### **Red Flags to Watch For** **Phase 1 Architecture Issues**: ⚠️ **Connection Failures**: MCP server can't connect to daemon WebSocket ⚠️ **Direct File Access**: MCP server bypassing daemon (old behavior) ⚠️ **Hardcoded Paths**: MCP server still expecting folder arguments ⚠️ **Port Conflicts**: Daemon and MCP server WebSocket port issues **Phase 2+ Endpoint Issues**: ⚠️ **Protocol Violations**: Invalid JSON-RPC responses from MCP↔Daemon ⚠️ **Missing Folder Context**: Responses without folder attribution from daemon ⚠️ **Wrong Model Usage**: Daemon using single model for all folders ⚠️ **State Confusion**: Daemon allowing searches on inactive/error folders ⚠️ **Performance Issues**: MCP→Daemon→Response > 5 seconds ⚠️ **Memory Leaks**: Growing memory in daemon with repeated MCP calls ⚠️ **Config Sync Issues**: UI changes not propagating to MCP clients ⚠️ **Daemon Restart Required**: Configuration changes requiring daemon restarts --- ## 🧠 **Path-Aware Topic Detection Approach** *This clustering enhancement is a prerequisite that must be implemented first to enable rich folder metadata for the endpoints.* ### Key Design Decisions **Subfolder Intelligence**: - Documents in same subfolder get 20% similarity boost during clustering - Subfolder names weighted 5x in topic term extraction (e.g., "Algebra/" becomes primary cluster term) - No database hierarchy storage - extract subfolder from document paths on-demand **File Name Pattern Recognition**: - Detect sequences (`HW1`, `HW2`, `HW3`), prefixes (`Algebra_`, `Statistics_`), categories (`exam`, `lab`, `notes`) - Sequential files get 10% similarity boost, same prefix gets 20% boost - Common file patterns become cluster terms with high weight **Database-Powered Evolution**: - Cache cluster centroids and term frequencies for fast incremental updates (~50ms classification) - New topic detection: documents with <40% similarity queue for new cluster creation - Quality monitoring: detect cluster drift, trigger recomputation when coherence drops below 0.7 - Evolution tracking: log when clusters split, merge, or adapt to content changes **Performance Characteristics**: - Document classification: <100ms using cached centroids - New cluster creation: <5s from 5+ queued documents - Topic retrieval: <50ms from database cache - Full reclustering only when >20% of documents change or quality degrades ### Why This Matters for Endpoints This approach generates **intelligent folder topics** that the `get_folder_info` endpoint will return to help LLMs make smart folder selection decisions. Topics like `["algebra", "quadratic", "homework"]` are much more useful than generic content clusters. --- ## ⚠️ **Prerequisites** 1. **Task 11.5 Complete**: Multi-folder indexing system working 2. **Path-Aware Topic Detection**: Implement clustering approach above 3. **Database Schema**: Cluster centroids, topic storage, and evolution tracking available 4. **Model Registry**: System tracks which model each folder uses --- ## 🎯 **Definition of Done** - [ ] All 10 MCP endpoints support multi-folder operations - [ ] Folder parameter added to search and document endpoints - [ ] State validation prevents operations on unavailable folders - [ ] Model loading works automatically per folder - [ ] Responses include comprehensive folder context - [ ] Error messages are folder-aware and actionable - [ ] Performance meets targets with model switching - [ ] LLMs can successfully navigate multi-folder scenarios - [ ] All existing single-folder workflows still work --- **Focus**: Transform existing endpoints to be folder-aware **Deliverable**: Updated MCP endpoints that work seamlessly with multi-folder system **Impact**: Enable LLMs to intelligently work with multiple knowledge bases

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/okets/folder-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

Phase-9-PRD-MCP-Endpoints-Multi-Folder-Support.md•23.7 KiB