Chat Context MCP

PHASE-1-CORE.md

# Phase 1: Core Library - Detailed Roadmap

## Objective

Build the foundational library for reading Cursor's database, extracting sessions, parsing messages, and managing metadata. This is the most critical phase - everything else builds on this.

## Deliverables

- ✅ Working TypeScript/Node.js core library
- ✅ Read Cursor database (read-only, safe)
- ✅ Extract session list with metadata
- ✅ Parse messages (user + assistant)
- ✅ Extract workspace paths from tool results
- ✅ Metadata database for nicknames/tags
- ✅ Unit tests for all core functions

## Technology Stack

- **Language:** TypeScript (for type safety + Node.js ecosystem)
- **Database:** better-sqlite3 (fast, synchronous SQLite)
- **Testing:** Jest or Vitest
- **Build:** tsx for development, tsc for production

---

## Task Breakdown

### 1. Project Setup (5 tasks)

- [ ] **1.1** Initialize Node.js project with TypeScript
  - Create `package.json`
  - Install dependencies: `better-sqlite3`, TypeScript, types
  - Configure `tsconfig.json`
  - Set up directory structure (`src/core/`)
  - **AC:** `npm run build` compiles successfully
- [ ] **1.2** Set up development tooling
  - Install `tsx` for fast dev execution
  - Configure ESLint + Prettier
  - Add `package.json` scripts: `dev`, `build`, `test`
  - **AC:** Can run `npm run dev src/core/test.ts`
- [ ] **1.3** Set up testing framework
  - Install Jest or Vitest
  - Configure test runner
  - Create `tests/` directory structure
  - Add test script to package.json
  - **AC:** `npm test` runs (even with 0 tests)
- [ ] **1.4** Create core module structure
  - `src/core/cursor-db.ts` (placeholder)
  - `src/core/metadata-db.ts` (placeholder)
  - `src/core/message-parser.ts` (placeholder)
  - `src/core/workspace-extractor.ts` (placeholder)
  - `src/core/types.ts` (shared types)
  - **AC:** All files import without errors
- [ ] **1.5** Add platform detection utility
  - Create `src/core/platform.ts`
  - Implement `getCursorDBPath()` for macOS/Windows/Linux
  - Add validation (check if file exists)
  - **AC:** Returns correct path on current platform

---

### 2. Cursor Database Access (8 tasks)

- [ ] **2.1** Implement safe database connection
  - Create `CursorDB` class in `cursor-db.ts`
  - Open database with `readonly: true`
  - Add WAL mode detection
  - Handle SQLITE_BUSY with retry logic (max 3 attempts)
  - **AC:** Can connect to actual Cursor DB without errors
- [ ] **2.2** Add error handling for DB access
  - Custom error types: `DBConnectionError`, `DBLockedError`
  - Try-catch wrapper with meaningful messages
  - Graceful degradation if DB unavailable
  - **AC:** Clean error messages when DB is locked/missing
- [ ] **2.3** Query composer sessions
  - Method: `listComposerIds(limit?: number): string[]`
  - Query: `SELECT key FROM cursorDiskKV WHERE key LIKE 'composerData:%'`
  - Extract UUID from key
  - Order by key (most recent first)
  - **AC:** Returns list of composer UUIDs
- [ ] **2.4** Fetch composer data
  - Method: `getComposerData(composerId: string): ComposerData | null`
  - Query specific composerData entry
  - Parse JSON value
  - Handle missing/corrupted data
  - **AC:** Returns parsed composer object
- [ ] **2.5** Fetch bubble (message) data
  - Method: `getBubbleData(composerId: string, bubbleId: string): BubbleData | null`
  - Query: `SELECT value FROM cursorDiskKV WHERE key = ?`
  - Key format: `bubbleId:{composerId}:{bubbleId}`
  - Parse JSON value
  - **AC:** Returns parsed bubble object
- [ ] **2.6** Fetch all bubbles for a session
  - Method: `getSessionBubbles(composerId: string): BubbleData[]`
  - Get bubbleIds from composer conversation/fullConversationHeadersOnly
  - Fetch each bubble
  - Maintain order
  - **AC:** Returns array of all messages in order
- [ ] **2.7** Add connection pooling/caching
  - Singleton pattern for DB connection
  - Cache frequently accessed sessions
  - LRU cache with size limit (50 sessions max)
  - **AC:** Same session fetched twice = cache hit
- [ ] **2.8** Write tests for database access
  - Mock SQLite database for tests
  - Test connection success/failure
  - Test query methods
  - Test error handling
  - **AC:** All DB tests pass

---

### 3. TypeScript Types & Interfaces (4 tasks)

- [ ] **3.1** Define core types
  - Create `src/core/types.ts`
  - `ComposerData` interface (based on actual schema)
  - `BubbleData` interface (type 1 vs type 2)
  - `ConversationHeader` interface
  - **AC:** Types match actual Cursor DB structure
- [ ] **3.2** Define metadata types
  - `SessionMetadata` interface
  - `SessionWithMessages` interface (full session)
  - `MessageParsed` interface (user/assistant/tool)
  - **AC:** Types support all planned features
- [ ] **3.3** Define search/filter types
  - `SearchOptions` interface (query, project filter, limit)
  - `ListOptions` interface (project, tagged_only, limit)
  - `ProjectFilter` type: `'current' | 'all' | string`
  - **AC:** All API functions use these types
- [ ] **3.4** Add JSDoc documentation
  - Document all public interfaces
  - Add examples to key types
  - Explain non-obvious fields
  - **AC:** Hover in IDE shows helpful docs

---

### 4. Workspace Path Extraction (6 tasks)

- [ ] **4.1** Implement tool result parser
  - Function: `parseToolResult(bubble: BubbleData): WorkspaceResult | null`
  - Check for `toolFormerData` field
  - Parse `result` JSON string
  - Extract `workspaceResults` keys
  - **AC:** Extracts workspace path from grep/read_file results
- [ ] **4.2** Implement workspace extractor
  - Function: `extractWorkspacePath(composerId: string): string | null`
  - Iterate through session bubbles
  - Find first tool result with workspace
  - Return absolute path
  - **AC:** Returns `/Users/...` path or null
- [ ] **4.3** Derive project name from path
  - Function: `getProjectName(workspacePath: string): string`
  - Extract last folder name
  - Handle edge cases (root, trailing slash)
  - **AC:** `/Users/me/projects/my-app` → `my-app`
- [ ] **4.4** Handle multi-workspace sessions
  - Track all unique workspace paths in session
  - Return primary (first/most frequent)
  - Store others in metadata (optional)
  - **AC:** Sessions with multiple workspaces handled
- [ ] **4.5** Detect sessions without projects
  - Sessions with no tool results = no workspace
  - Flag as `has_project = false`
  - Label as "General" conversation
  - **AC:** Correctly identifies project-less sessions
- [ ] **4.6** Write tests for workspace extraction
  - Mock bubbles with tool results
  - Test various tool types (grep, read_file, etc.)
  - Test missing workspace
  - Test multi-workspace
  - **AC:** All workspace extraction tests pass

---

### 5. Message Parsing (10 tasks)

- [ ] **5.1** Parse user messages (type 1)
  - Extract `richText` field
  - Parse Lexical JSON format
  - Extract plain text content
  - **AC:** User message text extracted correctly
- [ ] **5.2** Parse assistant messages (type 2)
  - Extract `text` field (plain text)
  - Handle empty text (tool-only responses)
  - Preserve formatting
  - **AC:** Assistant message text extracted
- [ ] **5.3** Parse Lexical richText format
  - Function: `parseLexicalText(richTextJson: string): string`
  - Traverse nested `children` nodes
  - Extract text from each node
  - Handle all node types (paragraph, text, code)
  - **AC:** Converts rich text to plain text
- [ ] **5.4** Handle code blocks in messages
  - Detect code nodes in richText
  - Extract language and content
  - Preserve formatting in output
  - **AC:** Code blocks preserved or marked
- [ ] **5.5** Handle mentions (@file, @folder)
  - Detect mention nodes
  - Extract referenced files/folders
  - Include in parsed output metadata
  - **AC:** Mentions tracked separately
- [ ] **5.6** Parse tool calls and results
  - Extract `toolFormerData` from bubbles
  - Parse tool name, params, result
  - Format for display (optional: hide by default)
  - **AC:** Tool info available in parsed message
- [ ] **5.7** Create unified message format
  - Type: `ParsedMessage` with fields:
    - `role: 'user' | 'assistant' | 'tool'`
    - `content: string`
    - `bubbleId: string`
    - `timestamp?: string`
    - `toolData?: ToolInfo`
  - **AC:** All messages use this format
- [ ] **5.8** Handle edge cases
  - Empty messages
  - Corrupted JSON in richText
  - Unknown bubble types
  - Missing fields
  - **AC:** Graceful degradation, no crashes
- [ ] **5.9** Add message filtering options
  - Filter out tool results (for cleaner context)
  - Include only user/assistant exchanges
  - Configurable in parse options
  - **AC:** Can get "clean" conversation
- [ ] **5.10** Write tests for message parsing
  - Test user message parsing
  - Test assistant message parsing
  - Test richText parsing
  - Test code blocks, mentions
  - Test edge cases
  - **AC:** All parsing tests pass (>90% coverage)

---

### 6. Metadata Database (9 tasks)

- [ ] **6.1** Design metadata schema
  - Review schema in main doc
  - Finalize fields needed for Phase 1
  - Plan indexes for performance
  - **AC:** Schema documented and agreed
- [ ] **6.2** Create metadata database
  - Create `MetadataDB` class in `metadata-db.ts`
  - Initialize SQLite DB at `~/.cursor-context/metadata.db`
  - Create tables with schema
  - Add indexes
  - **AC:** DB created on first run
- [ ] **6.3** Implement session metadata CRUD
  - `upsertSessionMetadata(metadata: SessionMetadata): void`
  - `getSessionMetadata(sessionId: string): SessionMetadata | null`
  - `deleteSessionMetadata(sessionId: string): void`
  - **AC:** Can create, read, update, delete metadata
- [ ] **6.4** Implement nickname operations
  - `setNickname(sessionId: string, nickname: string): void`
  - `getSessionByNickname(nickname: string): SessionMetadata | null`
  - `listNicknames(): string[]`
  - Enforce unique nicknames
  - **AC:** Nickname lookups work, uniqueness enforced
- [ ] **6.5** Implement tag operations
  - `addTag(sessionId: string, tag: string): void`
  - `removeTag(sessionId: string, tag: string): void`
  - `findByTag(tag: string): SessionMetadata[]`
  - `listAllTags(): string[]`
  - Store tags as JSON array
  - **AC:** Tag queries work correctly
- [ ] **6.6** Implement project filtering
  - `listSessionsByProject(projectPath: string): SessionMetadata[]`
  - `listProjects(): ProjectInfo[]` (with session counts)
  - Filter by project in list/search
  - **AC:** Can list all sessions for a project
- [ ] **6.7** Auto-populate metadata on first access
  - When session fetched, check if metadata exists
  - If not, extract: project, first message preview, created date
  - Store in metadata DB
  - **AC:** Metadata auto-created lazily
- [ ] **6.8** Add migration support
  - Version metadata DB schema
  - Add version check on startup
  - Plan for future schema changes
  - **AC:** DB version tracked, ready for migrations
- [ ] **6.9** Write tests for metadata DB
  - Test CRUD operations
  - Test nickname uniqueness
  - Test tag operations
  - Test project filtering
  - Test auto-population
  - **AC:** All metadata tests pass

---

### 7. Session Formatting (4 tasks)

- [ ] **7.1** Implement Markdown formatter
  - Function: `formatSessionMarkdown(session: SessionWithMessages): string`
  - Format: header with metadata, then messages
  - User messages prefixed with `[USER]`
  - Assistant messages prefixed with `[ASSISTANT]`
  - **AC:** Readable markdown output
- [ ] **7.2** Implement JSON formatter
  - Function: `formatSessionJSON(session: SessionWithMessages): string`
  - Pretty-printed JSON
  - Include all metadata
  - **AC:** Valid, parseable JSON
- [ ] **7.3** Implement compact preview format
  - For list views: `sessionId | nickname | project | preview (50 chars) | date`
  - Function: `formatSessionPreview(metadata: SessionMetadata): string`
  - **AC:** One-line summaries for lists
- [ ] **7.4** Add formatting options
  - Include/exclude tool calls
  - Include/exclude timestamps
  - Max messages limit
  - Message range (e.g., messages 10-20)
  - **AC:** Configurable output

---

### 8. Core API Interface (5 tasks)

- [ ] **8.1** Create main `CursorContext` class
  - High-level API that combines all modules
  - Constructor: initialize both DBs
  - Clean, simple interface for CLI/MCP to use
  - **AC:** Single entry point for all operations
- [ ] **8.2** Implement list sessions API
  - Method: `listSessions(options: ListOptions): SessionMetadata[]`
  - Options: project, limit, tagged_only
  - Returns sorted by date (recent first)
  - **AC:** List works with all filter options
- [ ] **8.3** Implement fetch session API
  - Method: `fetchSessionById(sessionId: string, options?: FetchOptions): SessionWithMessages`
  - Method: `fetchSessionByNickname(nickname: string, options?: FetchOptions): SessionWithMessages`
  - Options: message_limit, include_tools
  - **AC:** Fetch returns full session with messages
- [ ] **8.4** Implement search API
  - Method: `searchSessions(query: string, options: SearchOptions): SearchResult[]`
  - Full-text search across message content
  - Project filtering
  - Return matches with context (surrounding messages)
  - **AC:** Search finds relevant sessions
- [ ] **8.5** Implement tagging API
  - Method: `tagSession(sessionId: string, nickname?: string, tags?: string[]): void`
  - Method: `getSessionId(nickname: string): string | null`
  - Method: `listTags(): TagInfo[]` (tag + count)
  - **AC:** Tagging works from core API

---

### 9. Testing & Validation (4 tasks)

- [ ] **9.1** Create test fixtures
  - Mock Cursor DB with sample data
  - Sample sessions with various formats
  - Edge cases: empty sessions, corrupted data
  - **AC:** Reusable test data for all tests
- [ ] **9.2** Integration tests
  - Test full workflow: connect → list → fetch → parse
  - Test with real Cursor DB (read-only)
  - Verify performance (1000+ sessions)
  - **AC:** Integration tests pass
- [ ] **9.3** Error handling tests
  - Test DB locked scenario
  - Test corrupted data
  - Test missing sessions
  - Test invalid inputs
  - **AC:** All error paths tested
- [ ] **9.4** Performance benchmarks
  - Measure parse time for large session (500+ messages)
  - Measure search time across 1000+ sessions
  - Ensure <100ms for common operations
  - **AC:** Performance is acceptable

---

### 10. Documentation (3 tasks)

- [ ] **10.1** Write API documentation
  - JSDoc for all public methods
  - Examples for each main function
  - **AC:** Auto-generated docs look good
- [ ] **10.2** Create usage examples
  - `examples/list-sessions.ts`
  - `examples/fetch-session.ts`
  - `examples/search.ts`
  - **AC:** Examples run and demonstrate features
- [ ] **10.3** Update README
  - Core library overview
  - Installation instructions
  - Quick start guide
  - **AC:** Someone can use the library

---

## Acceptance Criteria for Phase 1 Complete

- ✅ Can list all Cursor sessions with metadata
- ✅ Can fetch any session by ID with full messages
- ✅ Messages are parsed to plain text (user + assistant)
- ✅ Workspace paths extracted and stored in metadata
- ✅ Nickname/tag system works
- ✅ Project filtering works
- ✅ Search finds sessions by content
- ✅ All tests pass (>80% coverage)
- ✅ Performance is acceptable (<100ms for most ops)
- ✅ Documentation exists and examples work

---

## Dependencies & Blockers

**External Dependencies:**

- Access to Cursor database (user must have Cursor installed)
- Node.js 18+ (for modern TypeScript features)

**Internal Dependencies:**

- Task 2.x blocks 4.x (need DB access for workspace extraction)
- Task 5.x blocks 8.x (need parsing for fetch API)
- Task 6.x blocks 8.x (need metadata for list/search API)

**No Blockers:** Can start immediately with task 1.1

---

## Estimated Timeline

- **Setup (Section 1, 3):** 1-2 days
- **DB Access (Section 2):** 2-3 days
- **Workspace Extraction (Section 4):** 1-2 days
- **Message Parsing (Section 5):** 3-4 days (most complex)
- **Metadata DB (Section 6):** 2-3 days
- **Formatting & API (Section 7, 8):** 2-3 days
- **Testing & Docs (Section 9, 10):** 2-3 days

**Total: ~2-3 weeks of focused work**

---

## Next Steps

1. ✅ Create this roadmap
2. **Start with Task 1.1** - Initialize project
3. Work through tasks sequentially
4. Mark tasks complete as we go
5. Adjust roadmap based on learnings

Let's build this! 🚀
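
As an illustration of task 4.3, here is a minimal sketch of `getProjectName`, using the signature given in the roadmap. The handling of Windows-style separators and the choice to return the input unchanged for a bare root path are assumptions made for this example; the roadmap only says that root and trailing-slash cases must be handled.

```typescript
// Hypothetical sketch of task 4.3; not the project's actual implementation.
function getProjectName(workspacePath: string): string {
  // Normalize Windows separators so both path styles split the same way (assumption).
  const normalized = workspacePath.replace(/\\/g, "/");
  // Dropping empty segments also discards any trailing slashes.
  const segments = normalized.split("/").filter((segment) => segment.length > 0);
  // A bare root ("/" or "") has no folder name; fall back to the input (assumption).
  return segments.pop() ?? workspacePath;
}

console.log(getProjectName("/Users/me/projects/my-app"));  // my-app
console.log(getProjectName("/Users/me/projects/my-app/")); // my-app
```

Splitting on a normalized separator keeps the function independent of the host platform, which may matter if metadata recorded on one OS is read on another.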

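
Tasks 5.3 and 5.8 could be approached along the following lines. This is a sketch only: the `LexicalNode` shape below is a simplified assumption about serialized Lexical editor state (a `root` object whose nested `children` end in leaf nodes carrying a `text` field), and the exact node types Cursor stores in `richText` are not confirmed by the roadmap. Joining top-level blocks with a newline and returning an empty string on corrupted JSON are likewise illustrative choices.

```typescript
// Simplified, assumed shape of a serialized Lexical node; real nodes carry more fields.
interface LexicalNode {
  type?: string;
  text?: string;
  children?: LexicalNode[];
}

// Recursively gather text: leaf nodes contribute their `text`,
// container nodes contribute the concatenation of their children.
function collectText(node: LexicalNode): string {
  if (!node || typeof node !== "object") return "";
  if (typeof node.text === "string") return node.text;
  if (node.type === "linebreak") return "\n";
  return (node.children ?? []).map(collectText).join("");
}

function parseLexicalText(richTextJson: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(richTextJson);
  } catch {
    // Corrupted JSON: degrade to an empty string instead of throwing (task 5.8).
    return "";
  }
  const root = (parsed as { root?: LexicalNode } | null)?.root;
  if (!root || !Array.isArray(root.children)) return "";
  // Treat each top-level child (paragraph, code, ...) as one block of output.
  return root.children.map(collectText).join("\n");
}

const sample = JSON.stringify({
  root: {
    type: "root",
    children: [
      {
        type: "paragraph",
        children: [
          { type: "text", text: "Fix the " },
          { type: "text", text: "login bug" },
        ],
      },
      { type: "paragraph", children: [{ type: "text", text: "Thanks" }] },
    ],
  },
});

// Prints "Fix the login bug" and "Thanks" on separate lines.
console.log(parseLexicalText(sample));
```

Mention nodes and code-block language extraction (tasks 5.4 and 5.5) would need additional branches in `collectText` once the actual node types are known.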