MCP Standards

MIT License

Overview InspectNew Endpoints Schema Related Servers Reviews Score

mcp-standards
docs

SELF-LEARNING-PRD.md•27 kB

# Universal Learning System for AI Coding Assistants - PRD **Project**: MCP Standards - Self-Learning AI Standards System **Version**: 0.1.0 **Status**: Active Development **Last Updated**: 2025-10-14 --- ## Executive Summary MCP Standards is a **universal learning system** that makes ALL AI coding assistants progressively smarter through automatic knowledge capture, pattern learning, and intelligent config management. ### The Problem Current state of AI coding assistant interactions: - ❌ Repetitive instructions every session ("use • bullets not -") - ❌ No automatic learning from corrections - ❌ Manual config file maintenance (CLAUDE.md, .cursorrules, etc.) - ❌ Each AI tool requires separate configuration - ❌ No pattern detection across tools - ❌ Context loss between sessions - ❌ No cross-tool knowledge sharing ### The Solution A universal learning system that: - ✅ **Works across ALL AI coding tools** (Claude, Cursor, Copilot, Windsurf, Cline) - ✅ Automatically captures corrections and patterns - ✅ Learns from 5 detection layers (explicit, implicit, violations, behavioral, semantic) - ✅ Updates ALL relevant config files intelligently - ✅ Shares knowledge across tools (optional) - ✅ Tracks agent performance and suggests improvements - ✅ Validates deliverables against specs - ✅ Optimizes workflows based on patterns ### Scope (Focused) **In Scope:** - ✅ Claude Desktop / Claude Code (primary, native MCP support) - ✅ Claude Projects (export-based integration, manual upload) - ✅ Cross-tool config updates (Cursor, Copilot, Windsurf, Cline) - optional - ✅ Universal config discovery and parsing - ✅ Pattern learning with 5-layer detection **Out of Scope (For Now):** - ❌ ChatGPT Desktop integration (deferred to future release) - ❌ Real-time Claude Projects API sync (waiting for Anthropic API) - ❌ JetBrains IDEs (different architecture) - ❌ VS Code extensions (except those that read config files) --- ## Vision Statement **"ALL AI coding assistants should get smarter with every interaction, learning your preferences, project patterns, and workflow optimizations automatically - eliminating repetitive instructions across all tools and making AI assistance genuinely universal and intelligent."** --- ## Core Principles 1. **Evidence > Assumptions**: Learn from actual user behavior, not guesses 2. **Automatic > Manual**: Capture knowledge without user intervention 3. **Local-First**: All data stays on user's machine 4. **Transparent**: User can inspect, edit, and control all learned knowledge 5. **Non-Intrusive**: Never block or slow down Claude Code 6. **Incremental**: Learn and improve gradually over time --- ## Success Metrics ### Week 1 (Universal Config Discovery) - [ ] Successfully reads and parses all 7+ AI assistant config formats - [ ] Priority-based config merging working correctly - [ ] Can detect CLAUDE.md hierarchy (user → parent → project → nested) - [ ] Config validation prevents malformed updates - [ ] Comprehensive test coverage for all parsers ### Week 2 (Enhanced Pattern Learning) - [ ] 5-layer pattern detection operational - [ ] Explicit corrections captured with 95%+ accuracy - [ ] Implicit rejections detected (user edits within 2 min) - [ ] Rule violations detected by comparing vs config files - [ ] Behavioral patterns tracked across sessions - [ ] Semantic clustering groups similar corrections ### Week 3 (Auto-Update Orchestration) - [ ] Event-driven CLAUDE.md updates working - [ ] Smart file selection (project vs global) accurate - [ ] Cross-tool config updates optional and safe - [ ] Backup-first atomic writes never lose data - [ ] User notifications via MCP functional - [ ] 3+ automatic pattern promotions in real usage ### Week 4 (Integration & Documentation) - [ ] Claude Projects export format validated - [ ] Integration guide complete with examples - [ ] Config standards doc comprehensive - [ ] Test coverage >80% for all components - [ ] Video tutorial recorded - [ ] 80% reduction in repeated instructions (real-world test) ### Long-term (3 months) - [ ] 95%+ accuracy in preference application - [ ] Zero repeated corrections for learned patterns - [ ] Autonomous workflow optimization - [ ] Cross-project pattern sharing --- ## User Stories ### As a Developer **Story 1: Tool Preference Learning** ``` GIVEN I repeatedly correct Claude to use "uv" instead of "pip" WHEN the third correction occurs THEN the system automatically: - Adds "use uv not pip for Python package management" to CLAUDE.md - Applies this preference in future sessions without prompting - Tracks confidence level based on repetition ``` **Story 2: Agent Performance Tracking** ``` GIVEN I use multiple agents for similar tasks WHEN some agents consistently need corrections THEN the system: - Tracks success/failure rates per agent - Suggests better-performing agents for similar tasks - Updates my preferences automatically ``` **Story 3: Validation Gates** ``` GIVEN I finish implementing a feature WHEN I mark the task complete THEN the system automatically: - Compares deliverable vs original spec from memory - Checks if tests were run - Verifies documentation updates - Prompts for missing items before allowing completion ``` **Story 4: CLAUDE.md Management** ``` GIVEN I work on multiple projects WHEN I make project-specific corrections THEN the system: - Updates local CLAUDE.md for project-specific patterns - Promotes to global CLAUDE.md when pattern repeats across projects - Maintains both files automatically ``` **Story 5: Workflow Optimization** ``` GIVEN I always run tests after code changes WHEN this pattern repeats 5+ times THEN the system: - Adds automatic test-running to post-code-change workflow - Asks once: "Always run tests after code changes?" - Remembers and applies the answer going forward ``` **Story 6: Universal Learning Across Tools** ``` GIVEN I use Claude Desktop, Cursor, and GitHub Copilot WHEN I correct Claude: "Use • bullets not - for resume formatting" THEN the system automatically: - Updates CLAUDE.md (Claude Desktop) - Updates .cursor/rules/learned.mdc (Cursor) [if enabled] - Updates .github/copilot-instructions.md (Copilot) [if enabled] - Notifies me of all updated files - Creates timestamped backups - All tools now respect the same preference ``` **Story 7: Resume Bullet Formatting (Real Example)** ``` GIVEN I repeatedly correct resume bullet formatting WHEN I say "Use • bullets not -" three times THEN the system: - Detects pattern via explicit correction layer - Promotes to high-confidence preference (95%+) - Updates CLAUDE.md with formatting rule - Optionally updates .editorconfig for cross-tool standard - Never needs to be told again across all projects ``` --- ## Technical Architecture ### Component Overview ``` ┌─────────────────────────────────────────────────────────┐ │ Claude Code │ └────────────────┬────────────────────────────────────────┘ │ │ PostToolUse Hook ▼ ┌─────────────────────────────────────────────────────────┐ │ Hooks Manager │ │ • capture_hook.py (entry point) │ │ • significance_scorer.py (scoring) │ │ • pattern_extractor.py (learning) │ └────────────────┬────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Intelligence Layer │ │ • temporal_graph.py (knowledge tracking) │ │ • pattern_learner.py (correction detection) │ │ • claudemd_manager.py (file management) │ │ • validation_engine.py (quality gates) │ │ • agent_tracker.py (performance analytics) │ └────────────────┬────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ Enhanced MCP Server │ │ • New tools: learn_preference, validate_spec │ │ • Automatic pattern suggestions │ │ • Agent performance queries │ │ • CLAUDE.md generation │ └────────────────┬────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ SQLite Database │ │ • tool_executions (raw captures) │ │ • tool_preferences (learned rules) │ │ • agent_performance (success tracking) │ │ • validation_gates (quality checks) │ │ • pattern_frequency (repetition tracking) │ └─────────────────────────────────────────────────────────┘ ``` ### Data Flow #### 1. Automatic Capture Flow ``` Tool Execution ↓ PostToolUse Hook Triggered ↓ Significance Scoring (0.0-1.0) ↓ If score ≥ 0.3: Store in DB ↓ If score ≥ 0.6: Extract Patterns ↓ If pattern repeats 3+: Update CLAUDE.md ``` #### 2. Learning Flow ``` User Correction ↓ Pattern Extractor Detects Correction ↓ Check Repetition Frequency ↓ If 3+ occurrences: ↓ ├─ Add to tool_preferences ├─ Update CLAUDE.md └─ Apply in future sessions ``` #### 3. Validation Flow ``` Task Completion ↓ Load Original Spec from Memory ↓ Compare Deliverable vs Spec ↓ Check Quality Gates (tests, docs) ↓ If gaps detected: ↓ ├─ Prevent completion ├─ Show missing items └─ Learn from resolution ``` --- ## Database Schema ### Extended Schema Design ```sql -- Tool executions (raw capture) CREATE TABLE tool_executions ( id INTEGER PRIMARY KEY, tool_name TEXT NOT NULL, args TEXT NOT NULL, -- JSON result TEXT, -- JSON significance REAL NOT NULL, project_path TEXT, timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP, session_id TEXT ); CREATE INDEX idx_tool_exec_sig ON tool_executions(significance DESC); CREATE INDEX idx_tool_exec_time ON tool_executions(timestamp DESC); -- Learned preferences CREATE TABLE tool_preferences ( id INTEGER PRIMARY KEY, category TEXT NOT NULL, -- "python-package", "git-workflow", etc context TEXT NOT NULL, -- When to apply preference TEXT NOT NULL, -- The actual rule confidence REAL DEFAULT 0.5, -- 0.0-1.0 based on repetition examples TEXT, -- JSON array of example executions learned_from TEXT, -- Episode IDs that led to learning last_validated TIMESTAMP, last_applied TIMESTAMP, apply_count INTEGER DEFAULT 0, project_specific BOOLEAN DEFAULT FALSE, project_path TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); CREATE INDEX idx_pref_confidence ON tool_preferences(confidence DESC); CREATE INDEX idx_pref_context ON tool_preferences(context); -- Agent performance tracking CREATE TABLE agent_performance ( id INTEGER PRIMARY KEY, agent_type TEXT NOT NULL, task_category TEXT, -- "code-review", "debugging", etc success BOOLEAN, correction_count INTEGER DEFAULT 0, execution_time REAL, -- seconds context TEXT, -- JSON with task details project_path TEXT, timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); CREATE INDEX idx_agent_perf_type ON agent_performance(agent_type, success); CREATE INDEX idx_agent_perf_time ON agent_performance(timestamp DESC); -- Validation gates CREATE TABLE validation_gates ( id INTEGER PRIMARY KEY, project_path TEXT NOT NULL, gate_type TEXT NOT NULL, -- "tests", "docs", "spec", "build" required BOOLEAN DEFAULT TRUE, auto_check BOOLEAN DEFAULT TRUE, check_command TEXT, -- Shell command to verify failure_count INTEGER DEFAULT 0, last_checked TIMESTAMP, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); CREATE UNIQUE INDEX idx_gate_project_type ON validation_gates(project_path, gate_type); -- Pattern frequency (repetition tracking) CREATE TABLE pattern_frequency ( id INTEGER PRIMARY KEY, pattern_key TEXT NOT NULL UNIQUE, -- "tool:args_fingerprint" tool_name TEXT NOT NULL, pattern_type TEXT, -- "correction", "workflow", "preference" occurrence_count INTEGER DEFAULT 1, first_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP, last_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP, promoted_to_preference BOOLEAN DEFAULT FALSE, confidence REAL DEFAULT 0.0 ); CREATE INDEX idx_pattern_freq ON pattern_frequency(occurrence_count DESC); CREATE INDEX idx_pattern_tool ON pattern_frequency(tool_name); -- Temporal knowledge graph (for preference evolution) CREATE TABLE knowledge_evolution ( id INTEGER PRIMARY KEY, preference_id INTEGER, change_type TEXT, -- "created", "updated", "deprecated" old_value TEXT, new_value TEXT, reason TEXT, event_time TIMESTAMP, -- When the event actually occurred record_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- When we recorded it FOREIGN KEY(preference_id) REFERENCES tool_preferences(id) ); CREATE INDEX idx_knowledge_evo_pref ON knowledge_evolution(preference_id); CREATE INDEX idx_knowledge_evo_time ON knowledge_evolution(event_time); -- Spec validation tracking CREATE TABLE spec_validations ( id INTEGER PRIMARY KEY, task_description TEXT NOT NULL, spec_captured TEXT NOT NULL, -- Original specification deliverable_summary TEXT, -- What was actually built validation_status TEXT, -- "matched", "gaps", "exceeded" gaps_found TEXT, -- JSON array of missing items project_path TEXT, timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); CREATE INDEX idx_spec_val_status ON spec_validations(validation_status); ``` --- ## Implementation Phases ### Phase 1: Foundation (Week 1) **Objective**: Get hooks working and capturing tool executions #### Deliverables 1. **Hooks Manager** (`hooks/`) - [x] `capture_hook.py` - Main entry point - [x] `significance_scorer.py` - Scoring system - [ ] `pattern_extractor.py` - Pattern detection - [ ] Hook configuration for Claude Code 2. **Database Schema** - [ ] Extend existing schema with new tables - [ ] Migration script for existing data - [ ] Indexes for performance 3. **Testing & Validation** - [ ] Unit tests for scorer - [ ] Integration tests for hooks - [ ] Validation that hooks don't block Claude #### Success Criteria - PostToolUse hook captures 80%+ significant executions - Significance scoring accurately identifies important actions - No performance impact on Claude Code - Database correctly stores captured data --- ### Phase 2: Intelligence (Week 2) **Objective**: Build learning and pattern extraction #### Deliverables 1. **Intelligence Layer** (`intelligence/`) - [ ] `temporal_graph.py` - Preference evolution tracking - [ ] `pattern_learner.py` - Correction detection - [ ] `claudemd_manager.py` - File management - [ ] `agent_tracker.py` - Performance analytics 2. **Pattern Extraction** - [ ] Detect repeated corrections - [ ] Extract tool preferences - [ ] Track agent performance - [ ] Build confidence scoring 3. **CLAUDE.md Management** - [ ] Auto-generate updates - [ ] Merge with existing content - [ ] Project vs global distinction - [ ] Template system #### Success Criteria - Correctly identifies repeated patterns (3+ occurrences) - Auto-generates CLAUDE.md updates for learned preferences - Tracks agent performance with 90%+ accuracy - Temporal graph captures preference evolution --- ### Phase 3: Validation (Week 3) **Objective**: Implement quality gates and spec validation #### Deliverables 1. **Validation Engine** (`intelligence/validation_engine.py`) - [ ] Spec capture and storage - [ ] Deliverable comparison - [ ] Gap detection - [ ] Quality gate checking 2. **Quality Gates** - [ ] Test execution verification - [ ] Documentation update checking - [ ] Build success validation - [ ] Custom gate support 3. **MCP Tool Integration** - [ ] `validate_spec` tool - [ ] `check_quality_gates` tool - [ ] `get_agent_suggestions` tool #### Success Criteria - Spec validation catches 90%+ gaps - Quality gates prevent incomplete work - User can define custom validation rules - Agent suggestions based on historical data --- ### Phase 4: Optimization (Week 4) **Objective**: Proactive intelligence and workflow optimization #### Deliverables 1. **Proactive Systems** - [ ] Workflow pattern detection - [ ] Automatic suggestion generation - [ ] Context-aware preferences - [ ] Cross-project learning 2. **MCP Tools** - [ ] `learn_preference` - Manual preference addition - [ ] `suggest_claudemd_update` - Show suggested updates - [ ] `get_workflow_optimization` - Workflow suggestions - [ ] `query_agent_performance` - Performance analytics 3. **Intelligence Features** - [ ] Proactive workflow suggestions - [ ] Context-aware agent selection - [ ] Auto-apply high-confidence preferences - [ ] Learn from validation failures #### Success Criteria - Proactively suggests workflow improvements - Context-aware preferences applied correctly - Agent suggestions improve task completion - 80%+ reduction in repeated instructions --- ## MCP Tool Specifications ### New Tools #### 1. `learn_preference` ```json { "name": "learn_preference", "description": "Manually add a learned preference", "inputSchema": { "type": "object", "properties": { "category": {"type": "string", "description": "Category (e.g., 'python-package')"}, "context": {"type": "string", "description": "When to apply"}, "preference": {"type": "string", "description": "The rule to apply"}, "project_specific": {"type": "boolean", "default": false} }, "required": ["category", "context", "preference"] } } ``` #### 2. `suggest_claudemd_update` ```json { "name": "suggest_claudemd_update", "description": "Get suggested CLAUDE.md updates based on learned patterns", "inputSchema": { "type": "object", "properties": { "project_path": {"type": "string", "description": "Project path (optional)"}, "min_confidence": {"type": "number", "default": 0.7} } } } ``` #### 3. `validate_spec` ```json { "name": "validate_spec", "description": "Validate deliverable against original specification", "inputSchema": { "type": "object", "properties": { "task_description": {"type": "string", "description": "Original task"}, "deliverable_summary": {"type": "string", "description": "What was built"} }, "required": ["task_description", "deliverable_summary"] } } ``` #### 4. `check_quality_gates` ```json { "name": "check_quality_gates", "description": "Check if quality gates are satisfied", "inputSchema": { "type": "object", "properties": { "project_path": {"type": "string"}, "gate_types": {"type": "array", "items": {"type": "string"}} }, "required": ["project_path"] } } ``` #### 5. `get_agent_suggestions` ```json { "name": "get_agent_suggestions", "description": "Get agent recommendations based on task and performance history", "inputSchema": { "type": "object", "properties": { "task_category": {"type": "string", "description": "Type of task"}, "context": {"type": "string", "description": "Additional context"} }, "required": ["task_category"] } } ``` #### 6. `query_agent_performance` ```json { "name": "query_agent_performance", "description": "Query agent performance statistics", "inputSchema": { "type": "object", "properties": { "agent_type": {"type": "string", "description": "Agent to query (optional)"}, "task_category": {"type": "string", "description": "Task type (optional)"}, "time_range": {"type": "string", "description": "Time range (e.g., 'last_week')"} } } } ``` --- ## Configuration ### Claude Code Hooks Configuration Create/update `~/.claude/hooks.json`: ```json { "hooks": { "PostToolUse": [ { "matcher": "*", "hooks": [ { "type": "command", "command": "/path/to/mcp-standards/hooks/capture_hook.py", "timeout": 5 } ] } ] } } ``` ### MCP Server Configuration Update `.mcp.json`: ```json { "mcpServers": { "mcp-standards": { "command": "uv", "args": [ "--directory", "/path/to/mcp-standards", "run", "run_server.py" ], "env": { "AUTO_LEARNING": "true", "SUGGEST_CLAUDEMD": "true", "VALIDATION_GATES": "tests,docs,spec" } } } } ``` --- ## Testing Strategy ### Unit Tests - [ ] Significance scorer edge cases - [ ] Pattern extractor correctness - [ ] CLAUDE.md manager merging logic - [ ] Validation engine gap detection - [ ] Agent performance calculations ### Integration Tests - [ ] Hook → Database → MCP server flow - [ ] Pattern learning across multiple executions - [ ] CLAUDE.md file generation and updates - [ ] Quality gate enforcement - [ ] Cross-project pattern promotion ### System Tests - [ ] Full workflow: capture → learn → apply - [ ] Performance impact on Claude Code - [ ] Multi-project learning - [ ] Temporal graph consistency - [ ] Agent recommendation accuracy --- ## Migration Plan ### From Current System 1. **Database Migration** - Extend existing schema (non-breaking) - Migrate existing episodes to new structure - Add indexes for new tables 2. **Backwards Compatibility** - Existing MCP tools work unchanged - New features opt-in via configuration - Graceful fallback if hooks unavailable 3. **User Communication** - Clear documentation of new features - Migration guide for existing users - Opt-out mechanism for auto-learning --- ## Risk Mitigation ### Potential Risks 1. **Performance Impact** - **Risk**: Hooks slow down Claude Code - **Mitigation**: 5s timeout, async processing, significance filtering - **Monitoring**: Log hook execution times 2. **False Learning** - **Risk**: System learns incorrect patterns - **Mitigation**: Confidence thresholds, manual override, transparent logs - **Monitoring**: Track preference application failures 3. **Privacy Concerns** - **Risk**: Sensitive data captured in logs - **Mitigation**: Local-only storage, sensitive file filtering, clear docs - **Monitoring**: User can inspect and delete data 4. **CLAUDE.md Conflicts** - **Risk**: Auto-updates conflict with manual edits - **Mitigation**: Smart merging, comment-based updates, version control - **Monitoring**: Detect and report conflicts 5. **Hook Failures** - **Risk**: Hook errors break Claude Code - **Mitigation**: Always exit 0, error handling, fallback mode - **Monitoring**: Error logs, automatic disable on repeated failures --- ## Documentation Requirements ### User Documentation - [ ] Quick start guide for self-learning features - [ ] Configuration reference - [ ] CLAUDE.md management guide - [ ] Agent performance dashboard - [ ] Troubleshooting guide ### Developer Documentation - [ ] Architecture overview - [ ] Hook system design - [ ] Database schema documentation - [ ] Pattern extraction algorithms - [ ] Contributing guide --- ## Future Enhancements (Post-MVP) ### Phase 5: Advanced Intelligence - Cross-project pattern sharing - Team knowledge graphs - Workflow analytics dashboard - ML-based pattern prediction - Natural language preference input ### Phase 6: Enterprise Features - Multi-user learning - Role-based preferences - Audit trails - Compliance validation - Team best practices --- ## Appendix ### Inspiration & Prior Art 1. **A-MEM (2025)**: Agentic memory with dynamic organization 2. **Zep/Graphiti**: Temporal knowledge graphs for AI agents 3. **ReasoningBank (Google)**: Strategy-level memory framework 4. **MemGPT**: Memory management for LLM agents 5. **Claude Code Hooks**: Native automation support ### Key Research Papers - "A-MEM: Agentic Memory for LLM Agents" (2025) - "Zep: A Temporal Knowledge Graph Architecture" (2025) - "ReasoningBank: Self-Evolving Agents" (Google AI, 2025) --- **Status Legend** - [ ] Not Started - [x] Complete - 🚧 In Progress - ⚠️ Blocked - ❌ Deprecated --- **Last Updated**: 2025-10-12 **Next Review**: Weekly during development **Stakeholders**: User (mattstrautmann), Contributors **Repo**: https://github.com/airmcp-com/mcp-standards ## Implementation Progress (Auto-Updated) ### 2025-10-12: Phase 1 Complete ✓ **Completed:** - [x] Comprehensive PRD document created - [x] Hooks manager with PostToolUse capture - [x] Significance scoring system (multi-factor) - [x] Pattern extraction (corrections, workflows, preferences) - [x] Database schema migration system (7 new tables) - [x] CLAUDE.md manager with smart merging - [x] Pattern frequency tracking with confidence scoring **Files Created:** - `docs/SELF-LEARNING-PRD.md` - This PRD - `hooks/capture_hook.py` - Main hook entry point - `hooks/significance_scorer.py` - Significance calculation - `hooks/pattern_extractor.py` - Pattern detection - `src/mcp_standards/schema_migration.py` - Database migrations - `intelligence/claudemd_manager.py` - CLAUDE.md management **Next Steps:** - [ ] Complete remaining intelligence modules (temporal_graph, validation_engine, agent_tracker) - [ ] Integrate hooks with MCP server - [ ] Add new MCP tools (learn_preference, validate_spec, etc) - [ ] Write comprehensive tests - [ ] Create hooks configuration for Claude Code

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/airmcp-com/mcp-standards'

If you have feedback or need assistance with the MCP directory API, please join our Discord server