# Package Manager Intelligence System

**Comprehensive Design Specification**

**Date**: 2025-10-20
**Version**: 2.0

---

## Executive Summary

This document specifies a comprehensive package manager detection and preference learning system that reduces user corrections from 3-5 repetitions to 1-2 through intelligent detection, semantic clustering, and Bayesian confidence scoring.

**Key Innovations**:

- **Intelligent Detection**: Automatic project file analysis (uv.lock, poetry.lock, package.json)
- **Semantic Clustering**: AgentDB embeddings cluster similar corrections ("use uv" + "prefer uv" = same pattern)
- **Bayesian Confidence**: Success/failure tracking with probabilistic updates
- **Cross-Project Learning**: Share preferences across similar project types
- **Proactive Application**: Predict and apply before corrections are needed

**Expected Impact**:

- Corrections reduced: 3-5 → 1-2 (60-70% reduction)
- Learning speed: 2-3 days → same session
- Confidence accuracy: 75% → 90%+
- Context pollution: -2,000 tokens (fewer repeated instructions)

---

## 1. Problem Analysis

### 1.1 Current Pain Point

**Real-world scenario** (documented in v2-system-analysis.md):

```
Conversation 1:
User:   "install pytest"
Claude: "I'll use pip install pytest"
User:   "Actually, use uv not pip"        ← Correction #1

Conversation 2 (same session):
User:   "install requests"
Claude: "I'll use pip install requests"
User:   "Use uv not pip!"                 ← Correction #2

Conversation 3:
User:   "install pandas"
Claude: "I'll use pip install pandas"
User:   "USE UV NOT PIP!!!"               ← Correction #3
[Pattern detected at 3 occurrences, promoted to preference]

Conversation 4:
User:   "install numpy"
Claude: "I'll use uv pip install numpy"   ← Finally learned!
```

**Root Causes**:

1. **Keyword-only matching**: "use uv" ≠ "prefer uv" ≠ "always use uv" (treated as different patterns; see the sketch below)
2. **No context detection**: The project has `uv.lock`, but the system ignores it
3. **Fixed threshold**: Requires exactly 3 corrections, with no semantic understanding
4. **No cross-project learning**: A preference learned in Project A doesn't apply to Project B
5. **Reactive only**: Never predicts; always waits for a user correction
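To make root cause #1 concrete, here is a minimal sketch of the difference between exact-string bucketing and embedding similarity. It assumes the `sentence-transformers` package and the same `all-MiniLM-L6-v2` model referenced elsewhere in this spec; the exact similarity values are illustrative:

```python
# Minimal sketch: why keyword matching sees three patterns where
# semantic clustering sees one. Assumes sentence-transformers is
# installed; all-MiniLM-L6-v2 produces 384-dim vectors as in §2.1.
from sentence_transformers import SentenceTransformer, util

corrections = [
    "use uv not pip",
    "prefer uv over pip",
    "always use uv for packages",
]

# Keyword-only matching: three distinct strings → three "patterns"
print(len(set(corrections)))  # 3

# Semantic matching: cosine similarity over normalized embeddings
model = SentenceTransformer("all-MiniLM-L6-v2")
vecs = model.encode(corrections, normalize_embeddings=True)
sims = util.cos_sim(vecs, vecs)

# Pairs above the 0.7 threshold cluster into one pattern
print(bool(sims[0][1] > 0.7), bool(sims[0][2] > 0.7))  # expected: True True
```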
### 1.2 Desired Outcome

```
v2 Workflow with Package Manager Intelligence:

Conversation 1 (new project):
User: "install pytest"
[System detects uv.lock in project root]
[System searches AgentDB: "python package installation" → finds 0 patterns]
Claude: "I'll use pip install pytest"
User: "Actually, use uv not pip"              ← Correction #1
[System stores embedding: "prefer uv over pip for python packages"]
[System updates confidence: 0.0 → 0.4 (Bayesian prior)]

Conversation 2 (same project, 5 min later):
User: "install requests"
[System detects uv.lock → confidence boost +0.3]
[System searches AgentDB: "python package" → finds "prefer uv" (0.7 confidence)]
Claude: "I'll use uv pip install requests"    ← Learned after 1 correction!
[No user correction → success signal → confidence 0.7 → 0.85]

Conversation 3 (different project, same machine):
[System detects poetry.lock → different manager]
User: "install fastapi"
[System searches: "python package" → finds "prefer uv" (0.85), "prefer poetry" (0.2)]
[System cross-references: poetry.lock detected → suggests poetry]
Claude: "I'll use poetry add fastapi"         ← Zero corrections, inferred from project context!
```

**Reduction Achieved**:

- Corrections: 3 → 1 (67% reduction)
- Learning time: multiple conversations → same conversation
- Cross-project: manual repetition → automatic inference

---

## 2. System Architecture

### 2.1 Component Overview

```
              Package Manager Intelligence System (PMIS)

   ┌───────────────────────────────────────────────────────────┐
   │ 1. Project File Detector                                  │
   │  • Scans for uv.lock, poetry.lock, Pipfile, package.json  │
   │  • Reads pyproject.toml [tool.*] sections                 │
   │  • Caches results for 5min (avoid repeated scans)         │
   │  • Confidence boost: +0.3 if lock file found              │
   └─────────────────┬─────────────────────────────────────────┘
                     │
                     ▼
   ┌───────────────────────────────────────────────────────────┐
   │ 2. Semantic Preference Clustering (AgentDB)               │
   │  • Embeds corrections: "use uv" → 384-dim vector          │
   │  • Clusters similar: "prefer uv" + "always uv" → same     │
   │  • HNSW search: <1ms for "package management" query       │
   │  • Threshold: 0.7 cosine similarity = same pattern        │
   └─────────────────┬─────────────────────────────────────────┘
                     │
                     ▼
   ┌───────────────────────────────────────────────────────────┐
   │ 3. Bayesian Confidence Scoring                            │
   │  • Prior: 0.4 (first correction)                          │
   │  • Success: confidence × 1.2 (capped at 0.95)             │
   │  • Failure: confidence × 0.6 (min 0.1)                    │
   │  • Decay: -0.05/week if unused                            │
   └─────────────────┬─────────────────────────────────────────┘
                     │
                     ▼
   ┌───────────────────────────────────────────────────────────┐
   │ 4. Cross-Project Learning Engine                          │
   │  • Detects similar projects via embeddings                │
   │  • Shares patterns: 3+ projects → global preference       │
   │  • Project-type specific: Django → pytest not unittest    │
   └─────────────────┬─────────────────────────────────────────┘
                     │
                     ▼
   ┌───────────────────────────────────────────────────────────┐
   │ 5. Proactive Application Layer                            │
   │  • Pre-command prediction: "install X" → check patterns   │
   │  • Context injection: Add to Claude prompt if >0.7 conf   │
   │  • Suggestion mode: "Did you mean 'uv pip install'?"      │
   └───────────────────────────────────────────────────────────┘
              ▼                          ▼
   ┌─────────────────┐        ┌──────────────────────┐
   │     AgentDB     │        │        SQLite        │
   │ (Vector Store)  │        │    (Audit Trail)     │
   │                 │        │                      │
   │ • Embeddings    │        │ • Full corrections   │
   │ • HNSW graph    │        │ • Timestamps         │
   │ • <1ms search   │        │ • Success/fail log   │
   │ • Semantic      │        │ • Compliance         │
   └─────────────────┘        └──────────────────────┘
```

### 2.2 Data Flow

```
┌─────────────────────────────────────────────────────────────────┐
│                       User Command Flow                         │
└─────────────────────────────────────────────────────────────────┘

1. User Input: "install pytest"
   │
   ▼
2. PMIS Pre-Processing
   ├─► Project File Detector
   │    ├─ Scan: uv.lock found ✓
   │    ├─ Read: pyproject.toml [tool.uv] ✓
   │    └─ Confidence Boost: +0.3
   │
   ├─► Semantic Search (AgentDB)
   │    ├─ Query: "python package installation pytest"
   │    ├─ Embedding: [0.234, -0.567, ...] (384-dim)
   │    ├─ HNSW Search: <1ms
   │    └─ Results: [
   │         {pattern: "prefer uv over pip", confidence: 0.85, similarity: 0.92},
   │         {pattern: "use poetry for deps", confidence: 0.45, similarity: 0.73}
   │       ]
   │
   ├─► Cross-Project Check
   │    ├─ Similar projects: 4 found (all use uv)
   │    ├─ Global preference: "uv for Python projects" (0.9 confidence)
   │    └─ Context boost: +0.1
   │
   └─► Final Decision
        ├─ Pattern: "prefer uv over pip"
        ├─ Confidence: 0.85 + 0.3 (file) + 0.1 (cross-project) = 1.25 → capped at 0.95
        ├─ Threshold: 0.7 → APPLY
        └─ Inject to prompt: "Use 'uv pip install' for Python packages"
3. Claude Execution
   ├─ Reads injected context
   └─ Executes: "uv pip install pytest"

4. Outcome Tracking
   ├─ Wait 30s for user correction
   ├─ No correction → SUCCESS
   ├─ Update confidence: 0.85 × 1.2 = 1.02 → capped at 0.95
   └─ Store in ReasoningBank: {
        pattern_id: "pkg_mgr_uv_python",
        outcome: "success",
        confidence_before: 0.85,
        confidence_after: 0.95,
        timestamp: "2025-10-20T22:43:00Z"
      }

5. Learning Update
   ├─ AgentDB: Update vector metadata (confidence: 0.95)
   ├─ SQLite: Append audit log
   └─ CLAUDE.md: Auto-update if confidence crossed 0.9 threshold
```
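The Final Decision step above is an additive combination of signal boosts with a hard cap. A minimal sketch of that rule, using the constants defined in this spec (the function name is illustrative):

```python
# Sketch of the Final Decision rule from the data flow above:
# additive boosts, capped at MAX_CONFIDENCE, compared to APPLY_THRESHOLD.
MAX_CONFIDENCE = 0.95   # never fully certain (see §3.3)
APPLY_THRESHOLD = 0.70  # auto-apply at or above this

def combined_confidence(base: float, file_boost: float = 0.0,
                        cross_project_boost: float = 0.0) -> float:
    """Combine pattern confidence with project-file and cross-project boosts."""
    return min(MAX_CONFIDENCE, base + file_boost + cross_project_boost)

conf = combined_confidence(0.85, file_boost=0.3, cross_project_boost=0.1)
print(f"{conf:.2f}", conf >= APPLY_THRESHOLD)  # 0.95 True → APPLY
```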
---

## 3. Algorithm Specifications

### 3.1 Project File Detection Algorithm

```python
"""
Project File Analysis for Package Manager Detection
"""

import json
import time
import tomllib  # Python 3.11+
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Optional, Tuple


@dataclass
class PackageManagerSignal:
    """Detected package manager with confidence score"""
    manager: str            # "uv", "pip", "poetry", "npm", "pnpm", "yarn"
    confidence: float       # 0.0-1.0
    evidence: list[str]     # Files that support this detection
    metadata: dict          # Additional context (version, config)


class ProjectFileDetector:
    """
    Detects preferred package manager from project files.

    Detection Strategy:
    1. Lock files (highest confidence)
    2. Config files (medium confidence)
    3. Executable scripts (low confidence)

    Confidence Scoring:
    - Lock file present: 0.9
    - Config section present: 0.7
    - Script reference: 0.5
    - Multiple signals: max(signals) + 0.1
    """

    # Detection patterns (priority order)
    DETECTION_PATTERNS = {
        # Python ecosystem
        "uv": {
            "lock_files": ["uv.lock"],
            "config_files": ["pyproject.toml"],
            "config_sections": ["tool.uv"],
            "confidence_boost": 0.9
        },
        "poetry": {
            "lock_files": ["poetry.lock"],
            "config_files": ["pyproject.toml"],
            "config_sections": ["tool.poetry"],
            "confidence_boost": 0.9
        },
        "pipenv": {
            "lock_files": ["Pipfile.lock"],
            "config_files": ["Pipfile"],
            "config_sections": [],
            "confidence_boost": 0.9
        },
        "pip": {
            "lock_files": ["requirements.txt.lock"],  # Rare but exists
            "config_files": ["requirements.txt", "requirements-dev.txt"],
            "config_sections": [],
            "confidence_boost": 0.5  # Lower, as it's the default
        },
        # JavaScript ecosystem
        "pnpm": {
            "lock_files": ["pnpm-lock.yaml"],
            "config_files": ["pnpm-workspace.yaml"],
            "config_sections": [],
            "confidence_boost": 0.9
        },
        "yarn": {
            "lock_files": ["yarn.lock"],
            "config_files": [".yarnrc.yml", ".yarnrc"],
            "config_sections": [],
            "confidence_boost": 0.9
        },
        "npm": {
            "lock_files": ["package-lock.json"],
            "config_files": ["package.json"],
            "config_sections": [],
            "confidence_boost": 0.7
        },
    }

    def __init__(self, cache_ttl_seconds: int = 300):
        """
        Initialize detector with caching.

        Args:
            cache_ttl_seconds: Cache detection results for this long
        """
        self._cache: Dict[str, Tuple[PackageManagerSignal, float]] = {}
        self._cache_ttl = cache_ttl_seconds

    async def detect(self, project_path: Path) -> Optional[PackageManagerSignal]:
        """
        Detect package manager for a project.

        Args:
            project_path: Root directory of the project

        Returns:
            PackageManagerSignal if detected, None otherwise
        """
        # Check cache
        cache_key = str(project_path.resolve())
        if cache_key in self._cache:
            signal, cached_at = self._cache[cache_key]
            if time.time() - cached_at < self._cache_ttl:
                return signal

        # Scan for signals
        detected_signals = []
        for manager, patterns in self.DETECTION_PATTERNS.items():
            evidence = []
            confidence = 0.0
            metadata = {}

            # Check lock files (highest confidence)
            for lock_file in patterns["lock_files"]:
                lock_path = project_path / lock_file
                if lock_path.exists():
                    evidence.append(f"lock:{lock_file}")
                    confidence = max(confidence, patterns["confidence_boost"])
                    metadata["lock_file"] = str(lock_path)

            # Check config files
            for config_file in patterns["config_files"]:
                config_path = project_path / config_file
                if config_path.exists():
                    evidence.append(f"config:{config_file}")
                    confidence = max(confidence, patterns["confidence_boost"] - 0.2)

                    # Parse config for additional metadata
                    if config_file.endswith(".toml"):
                        metadata.update(self._parse_toml_config(
                            config_path, patterns["config_sections"]
                        ))
                    elif config_file == "package.json":
                        metadata.update(self._parse_package_json(config_path))

            # Check config sections (for tools in shared files)
            if patterns["config_sections"]:
                for section in patterns["config_sections"]:
                    if section in metadata:
                        confidence = max(confidence, patterns["confidence_boost"])
                        evidence.append(f"section:{section}")

            # Multiple signals boost confidence
            if len(evidence) > 1:
                confidence = min(1.0, confidence + 0.1)

            if evidence:
                detected_signals.append(PackageManagerSignal(
                    manager=manager,
                    confidence=confidence,
                    evidence=evidence,
                    metadata=metadata
                ))

        # Return highest confidence signal
        if detected_signals:
            best_signal = max(detected_signals, key=lambda s: s.confidence)
            self._cache[cache_key] = (best_signal, time.time())
            return best_signal

        return None

    def _parse_toml_config(self, path: Path, sections: list[str]) -> dict:
        """Parse pyproject.toml and extract relevant sections"""
        try:
            with path.open("rb") as f:
                data = tomllib.load(f)
            metadata = {}
            for section in sections:
                current = data
                for part in section.split("."):
                    if part not in current:
                        break
                    current = current[part]
                else:
                    # Only record the section if the full dotted path resolved
                    metadata[section] = current
            return metadata
        except Exception:
            return {}

    def _parse_package_json(self, path: Path) -> dict:
        """Parse package.json for package manager hints"""
        try:
            with path.open("r") as f:
                data = json.load(f)
            return {
                "packageManager": data.get("packageManager"),
                "engines": data.get("engines", {}),
                "scripts": data.get("scripts", {})
            }
        except Exception:
            return {}


# Example usage:
detector = ProjectFileDetector()
signal = await detector.detect(Path("/path/to/project"))
if signal:
    print(f"Detected: {signal.manager}")
    print(f"Confidence: {signal.confidence:.2f}")
    print(f"Evidence: {signal.evidence}")

# Output:
# Detected: uv
# Confidence: 1.00
# Evidence: ['lock:uv.lock', 'config:pyproject.toml', 'section:tool.uv']
```
### 3.2 Semantic Clustering Algorithm

```python
"""
Semantic Preference Clustering using AgentDB embeddings
"""

import time
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class CorrectionPattern:
    """A user correction pattern with embedding"""
    id: str
    text: str               # "use uv not pip"
    embedding: np.ndarray   # 384-dim vector
    category: str           # "package-manager"
    confidence: float
    occurrence_count: int
    last_seen: float        # timestamp


class SemanticClusterer:
    """
    Clusters similar corrections using vector embeddings.

    Approach:
    1. Embed each correction: "use uv not pip" → [0.234, -0.567, ...]
    2. Compute pairwise similarities (cosine distance)
    3. Cluster if similarity > 0.7 (same pattern)
    4. Merge clusters: average embeddings, sum occurrences

    Benefits:
    - "use uv" + "prefer uv" + "always use uv" → 1 pattern
    - Reduces corrections needed: 3 → 1 (same semantic intent)
    - Cross-language support (if using multilingual embeddings)
    """

    SIMILARITY_THRESHOLD = 0.70     # Cosine similarity threshold
    CLUSTER_MERGE_THRESHOLD = 0.85  # Very similar → merge

    def __init__(self, embedding_manager, agentdb_store):
        """
        Initialize clusterer.

        Args:
            embedding_manager: EmbeddingManager instance (from embeddings.py)
            agentdb_store: AgentDB VectorStore instance
        """
        self.embedder = embedding_manager
        self.agentdb = agentdb_store

    async def add_correction(
        self,
        correction_text: str,
        category: str = "package-manager"
    ) -> Tuple[str, bool]:
        """
        Add a correction and cluster with similar patterns.

        Args:
            correction_text: The correction (e.g., "use uv not pip")
            category: Category for filtering

        Returns:
            (pattern_id, is_new) - ID of pattern, whether it's new or merged
        """
        # Generate embedding
        embedding = self.embedder.encode(correction_text, normalize=True)

        # Search for similar patterns in AgentDB
        similar_patterns = await self.agentdb.search(
            query_vector=embedding,
            k=5,
            filter={"category": category},
            threshold=self.SIMILARITY_THRESHOLD
        )

        if similar_patterns:
            # Found similar pattern(s)
            best_match = similar_patterns[0]

            if best_match["similarity"] >= self.CLUSTER_MERGE_THRESHOLD:
                # Very similar → merge into existing pattern
                pattern_id = best_match["id"]
                await self._merge_into_pattern(pattern_id, correction_text, embedding)
                return (pattern_id, False)  # Merged into existing
            else:
                # Somewhat similar → increment occurrence of closest match
                pattern_id = best_match["id"]
                await self._increment_pattern_occurrence(pattern_id)
                return (pattern_id, False)

        # No similar patterns → create new
        pattern_id = await self._create_new_pattern(correction_text, embedding, category)
        return (pattern_id, True)
    async def find_similar(
        self,
        query_text: str,
        category: str = None,
        min_confidence: float = 0.0,
        top_k: int = 5
    ) -> List[CorrectionPattern]:
        """
        Find patterns similar to a query.

        Args:
            query_text: Query string (e.g., "python package installation")
            category: Optional category filter
            min_confidence: Minimum confidence threshold
            top_k: Maximum results

        Returns:
            List of similar patterns, sorted by similarity
        """
        # Generate query embedding
        query_embedding = self.embedder.encode(query_text, normalize=True)

        # Search AgentDB
        filters = {}
        if category:
            filters["category"] = category
        if min_confidence > 0:
            filters["confidence"] = {"$gte": min_confidence}

        results = await self.agentdb.search(
            query_vector=query_embedding,
            k=top_k,
            filter=filters if filters else None
        )

        # Convert to CorrectionPattern objects
        patterns = []
        for result in results:
            patterns.append(CorrectionPattern(
                id=result["id"],
                text=result["metadata"]["text"],
                embedding=np.array(result["vector"]),
                category=result["metadata"]["category"],
                confidence=result["metadata"]["confidence"],
                occurrence_count=result["metadata"]["occurrence_count"],
                last_seen=result["metadata"]["last_seen"]
            ))

        return patterns

    async def _merge_into_pattern(
        self,
        pattern_id: str,
        new_text: str,
        new_embedding: np.ndarray
    ):
        """Merge a new correction into an existing pattern"""
        # Retrieve existing pattern
        pattern = await self.agentdb.get(pattern_id)

        # Update metadata
        old_count = pattern["metadata"]["occurrence_count"]
        new_count = old_count + 1

        # Average embeddings (simple approach)
        # More sophisticated: weighted by confidence
        old_embedding = np.array(pattern["vector"])
        merged_embedding = (old_embedding * old_count + new_embedding) / new_count
        merged_embedding = merged_embedding / np.linalg.norm(merged_embedding)  # Normalize

        # Update in AgentDB
        await self.agentdb.update(
            id=pattern_id,
            vector=merged_embedding.tolist(),
            metadata={
                **pattern["metadata"],
                "occurrence_count": new_count,
                "last_seen": time.time(),
                "variations": pattern["metadata"].get("variations", []) + [new_text]
            }
        )

    async def _increment_pattern_occurrence(self, pattern_id: str):
        """Increment occurrence count for a pattern"""
        pattern = await self.agentdb.get(pattern_id)
        await self.agentdb.update(
            id=pattern_id,
            metadata={
                **pattern["metadata"],
                "occurrence_count": pattern["metadata"]["occurrence_count"] + 1,
                "last_seen": time.time()
            }
        )

    async def _create_new_pattern(
        self,
        text: str,
        embedding: np.ndarray,
        category: str
    ) -> str:
        """Create a new pattern"""
        import hashlib
        pattern_id = hashlib.sha256(text.encode()).hexdigest()[:16]

        await self.agentdb.add(
            id=pattern_id,
            vector=embedding.tolist(),
            metadata={
                "text": text,
                "category": category,
                "confidence": 0.4,  # Initial Bayesian prior
                "occurrence_count": 1,
                "created_at": time.time(),
                "last_seen": time.time(),
                "variations": [text]
            }
        )

        return pattern_id


# Example usage:
clusterer = SemanticClusterer(embedding_manager, agentdb_store)

# User correction 1
pattern_id_1, is_new = await clusterer.add_correction("use uv not pip")
# Output: ("a3f2e1b4", True) - New pattern created

# User correction 2 (semantically similar)
pattern_id_2, is_new = await clusterer.add_correction("prefer uv over pip")
# Output: ("a3f2e1b4", False) - Merged into existing pattern!

# User correction 3 (different phrasing)
pattern_id_3, is_new = await clusterer.add_correction("always use uv for packages")
# Output: ("a3f2e1b4", False) - Same pattern again!
# Search for similar patterns
patterns = await clusterer.find_similar("python package management")
# Output: [CorrectionPattern(text="use uv not pip", confidence=0.7, occurrence_count=3)]
```

### 3.3 Bayesian Confidence Scoring

```python
"""
Bayesian Confidence Scoring based on success/failure outcomes
"""

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    """Possible outcomes of applying a pattern"""
    SUCCESS = "success"   # Applied, no user correction
    FAILURE = "failure"   # Applied, user corrected again
    PARTIAL = "partial"   # Applied, user modified slightly
    IGNORED = "ignored"   # Not applied (low confidence)


@dataclass
class ConfidenceUpdate:
    """Result of a confidence update"""
    old_confidence: float
    new_confidence: float
    reason: str
    outcome: Outcome


class BayesianConfidenceScorer:
    """
    Updates pattern confidence based on Bayesian inference.

    Confidence Interpretation:
    - 0.0-0.3: Low (don't apply automatically)
    - 0.3-0.7: Medium (suggest to user)
    - 0.7-0.9: High (apply automatically)
    - 0.9-1.0: Very High (apply + promote to CLAUDE.md)

    Update Rules:
    - SUCCESS: confidence × 1.2 (capped at 0.95)
    - FAILURE: confidence × 0.6 (floor at 0.1)
    - PARTIAL: confidence × 0.95 (slight decrease)
    - UNUSED:  -0.05 per week (decay)

    Bayesian Reasoning:
        P(pattern_correct | outcome) =
            P(outcome | pattern_correct) × P(pattern_correct) / P(outcome)

    Simplified:
    - Prior: 0.4 (first correction is informative but not definitive)
    - Likelihood: success = 0.9, failure = 0.1
    - Posterior: updated confidence
    """

    # Confidence thresholds
    APPLY_THRESHOLD = 0.7    # Auto-apply if confidence >= this
    SUGGEST_THRESHOLD = 0.5  # Suggest to user if >= this

    # Update multipliers
    SUCCESS_BOOST = 1.2
    FAILURE_PENALTY = 0.6
    PARTIAL_PENALTY = 0.95
    DECAY_PER_WEEK = 0.05

    # Bounds
    MIN_CONFIDENCE = 0.1
    MAX_CONFIDENCE = 0.95  # Never 1.0 (leave room for doubt)

    # Bayesian priors
    INITIAL_PRIOR = 0.4        # First correction
    LIKELIHOOD_SUCCESS = 0.9   # P(success | pattern_correct)
    LIKELIHOOD_FAILURE = 0.1   # P(failure | pattern_correct)

    def __init__(self):
        """Initialize scorer"""
        pass

    def update_confidence(
        self,
        current_confidence: float,
        outcome: Outcome,
        context: Optional[dict] = None
    ) -> ConfidenceUpdate:
        """
        Update confidence based on outcome.

        Args:
            current_confidence: Current confidence score
            outcome: Outcome of pattern application
            context: Additional context (e.g., project type, time since last use)

        Returns:
            ConfidenceUpdate with new confidence and reasoning
        """
        old_confidence = current_confidence
        new_confidence = current_confidence
        reason = ""

        if outcome == Outcome.SUCCESS:
            # Pattern applied successfully, no user correction
            # Bayesian update: posterior ∝ likelihood × prior
            new_confidence = current_confidence * self.SUCCESS_BOOST
            reason = f"Applied successfully, boosting by {self.SUCCESS_BOOST}x"

            # Extra boost if multiple successes
            if context and context.get("consecutive_successes", 0) >= 3:
                new_confidence *= 1.1
                reason += " (3+ consecutive successes)"

        elif outcome == Outcome.FAILURE:
            # User corrected again - pattern was wrong
            new_confidence = current_confidence * self.FAILURE_PENALTY
            reason = f"User corrected, penalizing by {self.FAILURE_PENALTY}x"

            # Extra penalty if high-confidence failure (worse than low-confidence failure)
            if current_confidence > 0.8:
                new_confidence *= 0.9
                reason += " (high-confidence failure)"

        elif outcome == Outcome.PARTIAL:
            # User modified slightly - pattern was close but not perfect
            new_confidence = current_confidence * self.PARTIAL_PENALTY
            reason = f"User modified, slight penalty {self.PARTIAL_PENALTY}x"

        elif outcome == Outcome.IGNORED:
            # Pattern not applied (confidence too low)
            # No change, but track as missed opportunity
            reason = "Pattern not applied (confidence too low)"

        # Apply bounds
        new_confidence = max(self.MIN_CONFIDENCE, min(self.MAX_CONFIDENCE, new_confidence))

        # Apply decay if pattern is old
        if context and "days_since_last_use" in context:
            days_old = context["days_since_last_use"]
            weeks_old = days_old / 7.0
            decay = self.DECAY_PER_WEEK * weeks_old
            new_confidence = max(self.MIN_CONFIDENCE, new_confidence - decay)
            if decay > 0.01:
                reason += f" (decayed by {decay:.2f} due to {weeks_old:.1f} weeks of non-use)"

        return ConfidenceUpdate(
            old_confidence=old_confidence,
            new_confidence=new_confidence,
            reason=reason,
            outcome=outcome
        )

    def should_apply(self, confidence: float) -> bool:
        """Determine if pattern should be auto-applied"""
        return confidence >= self.APPLY_THRESHOLD

    def should_suggest(self, confidence: float) -> bool:
        """Determine if pattern should be suggested to user"""
        return confidence >= self.SUGGEST_THRESHOLD

    def calculate_initial_confidence(
        self,
        correction_text: str,
        project_signals: Optional[list] = None
    ) -> float:
        """
        Calculate initial confidence for a new pattern.

        Args:
            correction_text: The correction text
            project_signals: List of signals from ProjectFileDetector

        Returns:
            Initial confidence (Bayesian prior)
        """
        confidence = self.INITIAL_PRIOR

        # Boost if project files support this
        if project_signals:
            # Example: "use uv" + uv.lock detected → boost confidence
            for signal in project_signals:
                if signal.manager.lower() in correction_text.lower():
                    confidence += signal.confidence * 0.3

        # Boost if correction is very specific
        if len(correction_text.split()) > 5:
            # Longer corrections are more informative
            confidence += 0.1

        # Cap at initial maximum (kept at APPLY_THRESHOLD so a single
        # correction alone never crosses the auto-apply line)
        return min(0.7, confidence)


# Example usage:
scorer = BayesianConfidenceScorer()

# Initial correction: "use uv not pip"
initial_confidence = scorer.calculate_initial_confidence(
    "use uv not pip",
    project_signals=[PackageManagerSignal(manager="uv", confidence=0.9, ...)]
)
# Output: 0.67 (0.4 base + 0.27 from uv.lock detection)

# User makes same command → pattern applied successfully
update = scorer.update_confidence(initial_confidence, Outcome.SUCCESS)
print(f"Confidence: {update.old_confidence:.2f} → {update.new_confidence:.2f}")
print(f"Reason: {update.reason}")
# Output:
# Confidence: 0.67 → 0.80
# Reason: Applied successfully, boosting by 1.2x

# Should we auto-apply now?
if scorer.should_apply(update.new_confidence):
    print("Auto-applying pattern from now on")
# Output: Auto-applying pattern from now on
```

### 3.4 Cross-Project Learning

```python
"""
Cross-Project Learning Engine - Share patterns across similar projects
"""

import json
import tomllib  # Python 3.11+
from collections import Counter
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional

import numpy as np


@dataclass
class ProjectProfile:
    """Profile of a project for similarity matching"""
    path: str
    project_type: str      # "python-django", "python-fastapi", "node-express"
    package_manager: str
    embedding: np.ndarray  # Embedding of project characteristics
    patterns: List[str]    # Pattern IDs used in this project


class CrossProjectLearner:
    """
    Learn patterns across similar projects.

    Strategy:
    1. Embed project characteristics: dependencies, file structure, config
    2. Find similar projects via AgentDB vector search
    3. If pattern appears in 3+ similar projects → promote to global
    4. Apply project-type-specific patterns (e.g., Django → pytest not unittest)

    Benefits:
    - New Django project → automatically uses patterns from other Django projects
    - Reduces "cold start" for new projects
    - Discovers cross-project conventions (e.g., all FastAPI projects use uvicorn)
    """

    SIMILAR_PROJECT_THRESHOLD = 0.75  # Cosine similarity
    GLOBAL_PROMOTION_THRESHOLD = 3    # Appear in N projects → global

    def __init__(self, embedding_manager, agentdb_store, sqlite_db):
        """
        Initialize cross-project learner.

        Args:
            embedding_manager: EmbeddingManager instance
            agentdb_store: AgentDB VectorStore for project embeddings
            sqlite_db: SQLite connection for pattern tracking
        """
        self.embedder = embedding_manager
        self.agentdb = agentdb_store
        self.db = sqlite_db

    async def create_project_profile(self, project_path: Path) -> ProjectProfile:
        """
        Create a semantic profile of a project.

        Args:
            project_path: Root directory of project

        Returns:
            ProjectProfile with embedding
        """
        # Extract project characteristics
        characteristics = []

        # 1. Package manager
        detector = ProjectFileDetector()
        pm_signal = await detector.detect(project_path)
        if pm_signal:
            characteristics.append(f"package-manager:{pm_signal.manager}")
        # 2. Project type (detect from dependencies, file structure)
        project_type = self._detect_project_type(project_path, pm_signal)
        characteristics.append(f"type:{project_type}")

        # 3. Dependencies (top 10 most important)
        dependencies = self._extract_key_dependencies(project_path, pm_signal)
        characteristics.extend([f"dep:{dep}" for dep in dependencies[:10]])

        # 4. File structure (presence of key directories)
        structure = self._analyze_structure(project_path)
        characteristics.extend([f"structure:{s}" for s in structure])

        # Create embedding from characteristics
        profile_text = " ".join(characteristics)
        embedding = self.embedder.encode(profile_text, normalize=True)

        # Get patterns used in this project
        patterns = await self._get_project_patterns(str(project_path))

        return ProjectProfile(
            path=str(project_path),
            project_type=project_type,
            package_manager=pm_signal.manager if pm_signal else "unknown",
            embedding=embedding,
            patterns=patterns
        )

    async def find_similar_projects(
        self,
        project_profile: ProjectProfile,
        top_k: int = 10
    ) -> List[ProjectProfile]:
        """
        Find projects similar to the given one.

        Args:
            project_profile: Profile of current project
            top_k: Maximum number of similar projects

        Returns:
            List of similar project profiles
        """
        # Search AgentDB for similar project embeddings
        results = await self.agentdb.search(
            query_vector=project_profile.embedding,
            k=top_k,
            filter={"type": "project_profile"},
            threshold=self.SIMILAR_PROJECT_THRESHOLD
        )

        # Convert to ProjectProfile objects
        similar_projects = []
        for result in results:
            if result["metadata"]["path"] != project_profile.path:  # Exclude self
                similar_projects.append(ProjectProfile(
                    path=result["metadata"]["path"],
                    project_type=result["metadata"]["project_type"],
                    package_manager=result["metadata"]["package_manager"],
                    embedding=np.array(result["vector"]),
                    patterns=result["metadata"]["patterns"]
                ))

        return similar_projects

    async def get_recommended_patterns(
        self,
        project_profile: ProjectProfile
    ) -> List[Dict]:
        """
        Get recommended patterns for a project based on similar projects.

        Args:
            project_profile: Profile of current project

        Returns:
            List of recommended patterns with confidence scores
        """
        # Find similar projects
        similar_projects = await self.find_similar_projects(project_profile)
        if not similar_projects:
            return []

        # Count pattern occurrences across similar projects
        pattern_counts = Counter()
        for project in similar_projects:
            for pattern_id in project.patterns:
                pattern_counts[pattern_id] += 1

        # Calculate confidence based on prevalence
        total_projects = len(similar_projects)
        recommendations = []
        for pattern_id, count in pattern_counts.most_common():
            prevalence = count / total_projects

            # Only recommend if pattern is common enough
            if prevalence >= 0.3:  # Present in 30%+ of similar projects
                # Fetch pattern details from AgentDB
                pattern = await self.agentdb.get(pattern_id)
                recommendations.append({
                    "pattern_id": pattern_id,
                    "text": pattern["metadata"]["text"],
                    "confidence": prevalence,  # Confidence based on prevalence
                    "occurrence_in_similar": count,
                    "total_similar_projects": total_projects,
                    "reason": f"Used in {count}/{total_projects} similar "
                              f"{project_profile.project_type} projects"
                })

        return recommendations

    async def check_global_promotion(self, pattern_id: str) -> bool:
        """
        Check if a pattern should be promoted to global (all projects).

        Args:
            pattern_id: Pattern to check

        Returns:
            True if should be promoted to global CLAUDE.md
        """
        # Query SQLite: how many distinct projects use this pattern?
        cursor = self.db.execute("""
            SELECT COUNT(DISTINCT project_path) as project_count
            FROM pattern_usage
            WHERE pattern_id = ?
        """, (pattern_id,))
        result = cursor.fetchone()
        project_count = result[0] if result else 0

        # Promote if used in 3+ projects
        return project_count >= self.GLOBAL_PROMOTION_THRESHOLD

    def _detect_project_type(
        self,
        project_path: Path,
        pm_signal: Optional[PackageManagerSignal]
    ) -> str:
        """Detect project type from files and dependencies"""
        # Python projects
        if pm_signal and pm_signal.manager in ["uv", "pip", "poetry", "pipenv"]:
            # Check for framework-specific files
            if (project_path / "manage.py").exists():
                return "python-django"
            elif (project_path / "main.py").exists() or (project_path / "app" / "main.py").exists():
                # FastAPI is common with main.py
                return "python-fastapi"
            elif (project_path / "setup.py").exists():
                return "python-library"
            else:
                return "python-application"

        # JavaScript projects
        elif pm_signal and pm_signal.manager in ["npm", "yarn", "pnpm"]:
            package_json = project_path / "package.json"
            if package_json.exists():
                with package_json.open() as f:
                    data = json.load(f)
                deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
                if "next" in deps:
                    return "node-nextjs"
                elif "react" in deps:
                    return "node-react"
                elif "express" in deps:
                    return "node-express"
            return "node-application"

        return "unknown"

    def _extract_key_dependencies(
        self,
        project_path: Path,
        pm_signal: Optional[PackageManagerSignal]
    ) -> List[str]:
        """Extract top dependencies from project"""
        dependencies = []

        if pm_signal and pm_signal.manager in ["uv", "poetry"]:
            # Parse pyproject.toml
            pyproject = project_path / "pyproject.toml"
            if pyproject.exists():
                with pyproject.open("rb") as f:
                    data = tomllib.load(f)
                deps = data.get("project", {}).get("dependencies", [])
                # Extract package names (before version specifiers)
                dependencies = [dep.split(">=")[0].split("==")[0].strip() for dep in deps]

        elif pm_signal and pm_signal.manager in ["npm", "yarn", "pnpm"]:
            # Parse package.json
            package_json = project_path / "package.json"
            if package_json.exists():
                with package_json.open() as f:
                    data = json.load(f)
                dependencies = list(data.get("dependencies", {}).keys())

        return dependencies

    def _analyze_structure(self, project_path: Path) -> List[str]:
        """Analyze project directory structure"""
        structure_signals = []

        # Common directory patterns
        key_dirs = ["src", "tests", "docs", "app", "lib", "components", "api"]
        for dir_name in key_dirs:
            if (project_path / dir_name).is_dir():
                structure_signals.append(dir_name)

        return structure_signals

    async def _get_project_patterns(self, project_path: str) -> List[str]:
        """Get all patterns used in a project from SQLite"""
        cursor = self.db.execute("""
            SELECT DISTINCT pattern_id
            FROM pattern_usage
            WHERE project_path = ?
        """, (project_path,))
        return [row[0] for row in cursor.fetchall()]


# Example usage:
learner = CrossProjectLearner(embedding_manager, agentdb_store, sqlite_db)

# New Django project
project_profile = await learner.create_project_profile(Path("/path/to/new-django-app"))
# Output: ProjectProfile(type="python-django", package_manager="uv", ...)

# Get recommended patterns from similar Django projects
recommendations = await learner.get_recommended_patterns(project_profile)
# Output: [
#   {
#     "pattern_id": "test_framework_pytest",
#     "text": "use pytest not unittest for testing",
#     "confidence": 0.85,
#     "occurrence_in_similar": 17,
#     "total_similar_projects": 20,
#     "reason": "Used in 17/20 similar python-django projects"
#   },
#   ...
# ]

# Check if a pattern should be global
is_global = await learner.check_global_promotion("pkg_mgr_uv_python")
if is_global:
    print("Promoting to global CLAUDE.md")
```

### 3.5 Proactive Application

```python
"""
Proactive Application Layer - Predict and apply patterns before user corrections
"""

from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Dict, List, Optional


class ApplicationMode(Enum):
    """How to apply a pattern"""
    AUTO_APPLY = "auto_apply"  # Silently apply (high confidence)
    SUGGEST = "suggest"        # Ask user first (medium confidence)
    PASSIVE = "passive"        # Don't apply, wait for correction (low confidence)


@dataclass
class PatternApplication:
    """Represents applying a pattern to a command"""
    pattern_id: str
    pattern_text: str
    confidence: float
    mode: ApplicationMode
    transformation: str  # What to change
    reason: str          # Why this pattern applies


class ProactiveApplicator:
    """
    Proactively apply patterns before user corrections.

    Workflow:
    1. Intercept command: "install pytest"
    2. Search for relevant patterns: "python package installation"
    3. Find: "use uv not pip" (confidence 0.85)
    4. Transform: "install pytest" → "uv pip install pytest"
    5. Inject into Claude's context or auto-execute

    Benefits:
    - Zero corrections for high-confidence patterns
    - Faster workflow (no back-and-forth)
    - Learning happens in background
    """

    def __init__(
        self,
        semantic_clusterer,
        confidence_scorer,
        project_file_detector,
        cross_project_learner
    ):
        """
        Initialize proactive applicator.

        Args:
            semantic_clusterer: SemanticClusterer instance
            confidence_scorer: BayesianConfidenceScorer instance
            project_file_detector: ProjectFileDetector instance
            cross_project_learner: CrossProjectLearner instance
        """
        self.clusterer = semantic_clusterer
        self.scorer = confidence_scorer
        self.detector = project_file_detector
        self.cross_project = cross_project_learner

    async def analyze_command(
        self,
        command: str,
        project_path: Optional[Path] = None
    ) -> List[PatternApplication]:
        """
        Analyze a command and find applicable patterns.

        Args:
            command: User's command (e.g., "install pytest")
            project_path: Current project path (for context)

        Returns:
            List of applicable patterns with modes
        """
        applications = []

        # 1. Detect project context
        project_signal = None
        if project_path:
            project_signal = await self.detector.detect(project_path)

        # 2. Semantic search for relevant patterns
        patterns = await self.clusterer.find_similar(
            query_text=command,
            min_confidence=0.3,  # Low threshold for discovery
            top_k=10
        )
        # 3. Boost confidence with project context
        for pattern in patterns:
            boosted_confidence = pattern.confidence
            boost_reasons = []

            # Boost if project files support this pattern
            if project_signal and project_signal.manager in pattern.text.lower():
                boosted_confidence += 0.3
                boost_reasons.append(f"{project_signal.manager} detected in project")

            # Boost from cross-project learning
            if project_path:
                project_profile = await self.cross_project.create_project_profile(project_path)
                if pattern.id in project_profile.patterns:
                    boosted_confidence += 0.2
                    boost_reasons.append("used in this project before")

            # Determine application mode
            if boosted_confidence >= self.scorer.APPLY_THRESHOLD:
                mode = ApplicationMode.AUTO_APPLY
            elif boosted_confidence >= self.scorer.SUGGEST_THRESHOLD:
                mode = ApplicationMode.SUGGEST
            else:
                mode = ApplicationMode.PASSIVE

            # Create transformation
            transformation = self._create_transformation(command, pattern)

            applications.append(PatternApplication(
                pattern_id=pattern.id,
                pattern_text=pattern.text,
                confidence=boosted_confidence,
                mode=mode,
                transformation=transformation,
                reason=" + ".join([f"Confidence: {pattern.confidence:.2f}"] + boost_reasons)
            ))

        # Sort by confidence
        applications.sort(key=lambda a: a.confidence, reverse=True)
        return applications

    def _create_transformation(self, command: str, pattern: CorrectionPattern) -> str:
        """
        Create the transformation to apply pattern to command.

        Args:
            command: Original command
            pattern: Pattern to apply

        Returns:
            Transformed command
        """
        # Simple heuristics for package manager transformations
        # In production, this would use LLM or rule-based system
        pattern_lower = pattern.text.lower()

        # "use uv not pip" pattern
        if "uv" in pattern_lower and "pip" in pattern_lower:
            if command.startswith("pip install"):
                return command.replace("pip install", "uv pip install")
            elif "install" in command and "pip" not in command:
                return f"uv pip {command}"

        # "use poetry" pattern
        elif "poetry" in pattern_lower:
            if "install" in command:
                pkg = command.split("install")[-1].strip()
                return f"poetry add {pkg}"

        # "use npm" vs "use yarn" vs "use pnpm"
        elif any(pm in pattern_lower for pm in ["npm", "yarn", "pnpm"]):
            # Check longest names first: "npm" also matches inside "pnpm"
            for pm in ["pnpm", "yarn", "npm"]:
                if pm in pattern_lower:
                    # Replace package manager at the start of the command
                    for old_pm in ["pnpm", "yarn", "npm"]:
                        if command.startswith(old_pm + " ") and old_pm != pm:
                            return command.replace(old_pm, pm, 1)
                    break

        # No transformation possible
        return command

    async def inject_into_context(
        self,
        applications: List[PatternApplication]
    ) -> str:
        """
        Create context injection for Claude's prompt.

        Args:
            applications: List of pattern applications

        Returns:
            Text to inject into system prompt
        """
        if not applications:
            return ""

        # Filter to only auto-apply patterns
        auto_apply = [a for a in applications if a.mode == ApplicationMode.AUTO_APPLY]
        if not auto_apply:
            return ""

        # Build context injection
        lines = ["**Learned Preferences (apply automatically)**:"]
        for app in auto_apply:
            lines.append(f"- {app.pattern_text} (confidence: {app.confidence:.0%})")
            if app.transformation:
                lines.append(f"  Example: `{app.transformation}`")

        return "\n".join(lines)

    async def suggest_to_user(
        self,
        applications: List[PatternApplication]
    ) -> Optional[str]:
        """
        Create suggestion prompt for user (medium confidence patterns).

        Args:
            applications: List of pattern applications

        Returns:
            Suggestion text or None
        """
        # Filter to suggest-mode patterns
        suggest = [a for a in applications if a.mode == ApplicationMode.SUGGEST]
        if not suggest:
            return None

        # Pick top suggestion
        top = suggest[0]
        return (
            f"I notice you might prefer '{top.transformation}' "
            f"(based on: {top.reason}). "
            f"Should I use this from now on?"
        )


# Example usage:
applicator = ProactiveApplicator(
    semantic_clusterer,
    confidence_scorer,
    project_file_detector,
    cross_project_learner
)

# User command: "install pytest"
command = "install pytest"
project_path = Path("/path/to/project")

# Analyze command
applications = await applicator.analyze_command(command, project_path)

# Check what to do
for app in applications:
    if app.mode == ApplicationMode.AUTO_APPLY:
        print(f"Auto-applying: {app.transformation}")
        # Execute: uv pip install pytest
    elif app.mode == ApplicationMode.SUGGEST:
        suggestion = await applicator.suggest_to_user([app])
        print(f"Suggestion: {suggestion}")

# Output:
# Auto-applying: uv pip install pytest
# (No user correction needed!)
```

---

## 4. Integration with Existing System

### 4.1 Modified Pattern Extractor

```python
"""
Enhanced pattern_extractor.py with package manager intelligence
"""

import sqlite3
import time
from pathlib import Path
from typing import Optional


class EnhancedPatternExtractor:
    """
    Enhanced pattern extractor with package manager intelligence.

    Integrates:
    - ProjectFileDetector
    - SemanticClusterer (AgentDB)
    - BayesianConfidenceScorer
    - CrossProjectLearner
    - ProactiveApplicator
    """

    def __init__(self, db_path: Path, agentdb_path: Path):
        """Initialize with both SQLite and AgentDB"""
        # Existing SQLite for audit trail
        self.sqlite_db = sqlite3.connect(db_path)

        # New AgentDB for semantic search
        from src.intelligence.memory.embeddings import EmbeddingManager
        from src.intelligence.memory.persistence import PersistentMemory

        self.embedder = EmbeddingManager()
        self.agentdb = PersistentMemory(
            db_path=agentdb_path,
            embedding_model="all-MiniLM-L6-v2"
        )

        # Initialize components
        self.project_detector = ProjectFileDetector()
        self.semantic_clusterer = SemanticClusterer(self.embedder, self.agentdb)
        self.confidence_scorer = BayesianConfidenceScorer()
        self.cross_project_learner = CrossProjectLearner(
            self.embedder, self.agentdb, self.sqlite_db
        )
        self.applicator = ProactiveApplicator(
            self.semantic_clusterer,
            self.confidence_scorer,
            self.project_detector,
            self.cross_project_learner
        )

    async def process_correction(
        self,
        correction_text: str,
        category: str = "package-manager",
        project_path: Optional[Path] = None
    ) -> dict:
        """
        Process a user correction with full intelligence stack.

        Args:
            correction_text: The correction (e.g., "use uv not pip")
            category: Category of correction
            project_path: Current project path

        Returns:
            Processing result with confidence and recommendations
        """
        # 1. Detect project context
        project_signal = None
        if project_path:
            project_signal = await self.project_detector.detect(project_path)

        # 2. Add to semantic cluster
        pattern_id, is_new = await self.semantic_clusterer.add_correction(
            correction_text, category
        )

        # 3. Calculate/update confidence
        if is_new:
            confidence = self.confidence_scorer.calculate_initial_confidence(
                correction_text,
                project_signals=[project_signal] if project_signal else None
            )
        else:
            # Retrieve existing confidence
            pattern = await self.agentdb.get(pattern_id)
            confidence = pattern["metadata"]["confidence"]
        # 4. Store in SQLite for audit
        self.sqlite_db.execute("""
            INSERT INTO pattern_corrections (
                pattern_id, correction_text, category,
                project_path, confidence, timestamp
            ) VALUES (?, ?, ?, ?, ?, datetime('now'))
        """, (pattern_id, correction_text, category,
              str(project_path) if project_path else None, confidence))
        self.sqlite_db.commit()

        # 5. Check for cross-project patterns
        if project_path:
            profile = await self.cross_project_learner.create_project_profile(project_path)
            similar_projects = await self.cross_project_learner.find_similar_projects(profile)

            # Check if should promote to global
            should_promote = await self.cross_project_learner.check_global_promotion(pattern_id)
        else:
            similar_projects = []
            should_promote = False

        # 6. Update CLAUDE.md if confidence crossed threshold
        if confidence >= 0.9 or should_promote:
            from src.intelligence.claudemd_manager import ClaudeMdManager
            manager = ClaudeMdManager()
            await manager.add_preference(
                category=category,
                preference=correction_text,
                confidence=confidence,
                scope="global" if should_promote else "project"
            )

        return {
            "pattern_id": pattern_id,
            "is_new": is_new,
            "confidence": confidence,
            "project_signal": project_signal,
            "similar_projects_count": len(similar_projects),
            "promoted_to_global": should_promote,
            "should_auto_apply": self.confidence_scorer.should_apply(confidence)
        }

    async def predict_for_command(
        self,
        command: str,
        project_path: Optional[Path] = None
    ) -> dict:
        """
        Predict which patterns apply to a command before execution.

        Args:
            command: User's command
            project_path: Current project path

        Returns:
            Prediction result with transformations
        """
        applications = await self.applicator.analyze_command(command, project_path)

        # Get top application
        if applications:
            top = applications[0]
            return {
                "should_transform": top.mode == ApplicationMode.AUTO_APPLY,
                "transformation": top.transformation
                    if top.mode == ApplicationMode.AUTO_APPLY else None,
                "suggestion": await self.applicator.suggest_to_user(applications)
                    if top.mode == ApplicationMode.SUGGEST else None,
                "confidence": top.confidence,
                "reason": top.reason,
                "all_patterns": [
                    {
                        "pattern": a.pattern_text,
                        "confidence": a.confidence,
                        "mode": a.mode.value
                    }
                    for a in applications
                ]
            }

        return {
            "should_transform": False,
            "transformation": None,
            "suggestion": None,
            "all_patterns": []
        }

    async def record_outcome(
        self,
        pattern_id: str,
        outcome: Outcome,
        context: Optional[dict] = None
    ):
        """
        Record the outcome of applying a pattern.

        Args:
            pattern_id: Pattern that was applied
            outcome: Result (success/failure/partial)
            context: Additional context
        """
        # Retrieve current confidence
        pattern = await self.agentdb.get(pattern_id)
        current_confidence = pattern["metadata"]["confidence"]

        # Update confidence
        update = self.confidence_scorer.update_confidence(
            current_confidence, outcome, context
        )

        # Store outcome in SQLite (ReasoningBank)
        self.sqlite_db.execute("""
            INSERT INTO reasoning_episodes (
                pattern_id, outcome, confidence_before,
                confidence_after, reason, timestamp
            ) VALUES (?, ?, ?, ?, ?, datetime('now'))
        """, (
            pattern_id,
            outcome.value,
            update.old_confidence,
            update.new_confidence,
            update.reason
        ))
        self.sqlite_db.commit()

        # Update AgentDB with new confidence
        await self.agentdb.update(
            id=pattern_id,
            metadata={
                **pattern["metadata"],
                "confidence": update.new_confidence,
                "last_outcome": outcome.value,
                "last_updated": time.time()
            }
        )
```

### 4.2 Database Schema Extensions

```sql
-- Add to existing SQLite schema

-- Pattern corrections (audit trail)
CREATE TABLE IF NOT EXISTS pattern_corrections (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    pattern_id TEXT NOT NULL,
    correction_text TEXT NOT NULL,
    category TEXT NOT NULL,
    project_path TEXT,
    confidence REAL,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_pattern_corrections_pattern ON pattern_corrections(pattern_id);
CREATE INDEX idx_pattern_corrections_project ON pattern_corrections(project_path);

-- Reasoning episodes (outcome tracking)
CREATE TABLE IF NOT EXISTS reasoning_episodes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    pattern_id TEXT NOT NULL,
    outcome TEXT NOT NULL CHECK(outcome IN ('success', 'failure', 'partial', 'ignored')),
    confidence_before REAL,
    confidence_after REAL,
    reason TEXT,
    context TEXT,  -- JSON
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_reasoning_episodes_pattern ON reasoning_episodes(pattern_id);
CREATE INDEX idx_reasoning_episodes_outcome ON reasoning_episodes(outcome);

-- Pattern usage (cross-project tracking)
CREATE TABLE IF NOT EXISTS pattern_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    pattern_id TEXT NOT NULL,
    project_path TEXT NOT NULL,
    usage_count INTEGER DEFAULT 1,
    first_used TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_used TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(pattern_id, project_path)
);
CREATE INDEX idx_pattern_usage_pattern ON pattern_usage(pattern_id);
CREATE INDEX idx_pattern_usage_project ON pattern_usage(project_path);

-- Project profiles (for cross-project learning)
CREATE TABLE IF NOT EXISTS project_profiles (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    project_path TEXT NOT NULL UNIQUE,
    project_type TEXT NOT NULL,
    package_manager TEXT,
    dependencies TEXT,  -- JSON array
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_project_profiles_type ON project_profiles(project_type);
```
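The `UNIQUE(pattern_id, project_path)` constraint on `pattern_usage` is what keeps the cross-project counts honest: each (pattern, project) pair is one row, and repeat uses bump the counter. A minimal sketch of the upsert the tracking code would issue (SQLite ≥3.24 `ON CONFLICT` syntax; the helper name is illustrative):

```python
import sqlite3

def record_pattern_usage(db: sqlite3.Connection,
                         pattern_id: str, project_path: str) -> None:
    # One row per (pattern, project); repeated uses increment the
    # counter and refresh last_used instead of inserting duplicates.
    db.execute("""
        INSERT INTO pattern_usage (pattern_id, project_path)
        VALUES (?, ?)
        ON CONFLICT(pattern_id, project_path) DO UPDATE SET
            usage_count = usage_count + 1,
            last_used = CURRENT_TIMESTAMP
    """, (pattern_id, project_path))
    db.commit()
```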
---

## 5. Test Scenarios

### 5.1 Unit Tests

```python
"""
Unit tests for package manager intelligence system
"""

import json
import tempfile
from pathlib import Path

import numpy as np
import pytest


class TestProjectFileDetector:
    """Test project file detection"""

    async def test_detect_uv_from_lock_file(self):
        """Should detect uv from uv.lock file"""
        with tempfile.TemporaryDirectory() as tmpdir:
            project_path = Path(tmpdir)
            (project_path / "uv.lock").touch()

            detector = ProjectFileDetector()
            signal = await detector.detect(project_path)

            assert signal is not None
            assert signal.manager == "uv"
            assert signal.confidence >= 0.9
            assert "lock:uv.lock" in signal.evidence

    async def test_detect_poetry_from_config(self):
        """Should detect poetry from pyproject.toml"""
        with tempfile.TemporaryDirectory() as tmpdir:
            project_path = Path(tmpdir)

            # Create pyproject.toml with [tool.poetry]
            pyproject = project_path / "pyproject.toml"
            pyproject.write_text("""
[tool.poetry]
name = "test-project"
version = "1.0.0"
""")

            detector = ProjectFileDetector()
            signal = await detector.detect(project_path)

            assert signal is not None
            assert signal.manager == "poetry"
            assert "tool.poetry" in signal.metadata

    async def test_cache_detection_results(self):
        """Should cache detection results for performance"""
        with tempfile.TemporaryDirectory() as tmpdir:
            project_path = Path(tmpdir)
            (project_path / "uv.lock").touch()

            detector = ProjectFileDetector(cache_ttl_seconds=60)

            # First call - should scan filesystem
            signal1 = await detector.detect(project_path)

            # Delete lock file
            (project_path / "uv.lock").unlink()

            # Second call - should return cached result
            signal2 = await detector.detect(project_path)

            assert signal1.manager == signal2.manager == "uv"


class TestSemanticClusterer:
    """Test semantic clustering"""

    async def test_cluster_similar_corrections(self):
        """Should cluster semantically similar corrections"""
        clusterer = SemanticClusterer(embedding_manager, agentdb_store)

        # Add similar corrections
        id1, is_new1 = await clusterer.add_correction("use uv not pip")
        id2, is_new2 = await clusterer.add_correction("prefer uv over pip")
        id3, is_new3 = await clusterer.add_correction("always use uv for packages")

        # Should merge into same pattern
        assert id1 == id2 == id3
        assert is_new1 is True
        assert is_new2 is False  # Merged
        assert is_new3 is False  # Merged

    async def test_find_similar_patterns(self):
        """Should find patterns via semantic search"""
        clusterer = SemanticClusterer(embedding_manager, agentdb_store)

        # Add pattern
        await clusterer.add_correction("use uv not pip", category="package-manager")

        # Search with different wording
        patterns = await clusterer.find_similar("python package management tools")

        assert len(patterns) > 0
        assert any("uv" in p.text.lower() for p in patterns)


class TestBayesianConfidenceScorer:
    """Test confidence scoring"""

    def test_success_boosts_confidence(self):
        """Successful application should increase confidence"""
        scorer = BayesianConfidenceScorer()

        initial = 0.5
        update = scorer.update_confidence(initial, Outcome.SUCCESS)

        assert update.new_confidence > initial
        assert update.new_confidence <= scorer.MAX_CONFIDENCE

    def test_failure_reduces_confidence(self):
        """Failed application should decrease confidence"""
        scorer = BayesianConfidenceScorer()

        initial = 0.8
        update = scorer.update_confidence(initial, Outcome.FAILURE)

        assert update.new_confidence < initial
        assert update.new_confidence >= scorer.MIN_CONFIDENCE

    def test_confidence_decay_over_time(self):
        """Unused patterns should decay in confidence"""
        scorer = BayesianConfidenceScorer()

        initial = 0.9
{"days_since_last_use": 30} # 30 days old update = scorer.update_confidence(initial, Outcome.IGNORED, context) # Should decay (30 days = ~4 weeks = 0.2 decay) assert update.new_confidence < initial class TestCrossProjectLearner: """Test cross-project learning""" async def test_find_similar_projects(self): """Should find projects with similar characteristics""" learner = CrossProjectLearner(embedding_manager, agentdb_store, sqlite_db) # Create Django project profile profile = ProjectProfile( path="/path/to/django-app", project_type="python-django", package_manager="uv", embedding=np.random.rand(384), # Mock patterns=["pkg_mgr_uv", "test_framework_pytest"] ) # Store in AgentDB (mock) # ... store profile ... # Find similar similar = await learner.find_similar_projects(profile) # Should find other Django projects assert all(p.project_type == "python-django" for p in similar) async def test_global_promotion_threshold(self): """Should promote patterns used in 3+ projects""" learner = CrossProjectLearner(embedding_manager, agentdb_store, sqlite_db) # Add pattern usage in 3 projects for i in range(3): sqlite_db.execute(""" INSERT INTO pattern_usage (pattern_id, project_path) VALUES (?, ?) """, ("pkg_mgr_uv", f"/project{i}")) sqlite_db.commit() # Check promotion should_promote = await learner.check_global_promotion("pkg_mgr_uv") assert should_promote == True class TestProactiveApplicator: """Test proactive application""" async def test_auto_apply_high_confidence(self): """High confidence patterns should auto-apply""" applicator = ProactiveApplicator( semantic_clusterer, confidence_scorer, project_detector, cross_project_learner ) # Mock high-confidence pattern in AgentDB # ... command = "install pytest" applications = await applicator.analyze_command(command) assert len(applications) > 0 top = applications[0] if top.confidence >= 0.7: assert top.mode == ApplicationMode.AUTO_APPLY assert "uv" in top.transformation.lower() async def test_suggest_medium_confidence(self): """Medium confidence patterns should suggest""" applicator = ProactiveApplicator(...) # Mock medium-confidence pattern # ... 
command = "install pytest" applications = await applicator.analyze_command(command) if applications: medium_conf = [a for a in applications if 0.5 <= a.confidence < 0.7] if medium_conf: assert medium_conf[0].mode == ApplicationMode.SUGGEST ``` ### 5.2 Integration Tests ```python """ Integration tests for full workflow """ class TestPackageManagerIntelligence: """End-to-end integration tests""" async def test_first_correction_learning(self): """First correction should establish pattern with boosted confidence""" with tempfile.TemporaryDirectory() as tmpdir: project_path = Path(tmpdir) (project_path / "uv.lock").touch() # Signal: project uses uv extractor = EnhancedPatternExtractor(sqlite_db, agentdb_path) # User corrects: "use uv not pip" result = await extractor.process_correction( "use uv not pip", category="package-manager", project_path=project_path ) # Should detect uv.lock and boost confidence assert result["confidence"] >= 0.7 # Base 0.4 + boost 0.3 assert result["project_signal"].manager == "uv" async def test_second_correction_merges(self): """Second similar correction should merge into same pattern""" extractor = EnhancedPatternExtractor(sqlite_db, agentdb_path) # First correction result1 = await extractor.process_correction("use uv not pip") pattern_id_1 = result1["pattern_id"] # Second correction (different wording) result2 = await extractor.process_correction("prefer uv over pip") pattern_id_2 = result2["pattern_id"] # Should merge into same pattern assert pattern_id_1 == pattern_id_2 assert result2["is_new"] == False async def test_prediction_applies_pattern(self): """After learning, should predict and apply pattern""" with tempfile.TemporaryDirectory() as tmpdir: project_path = Path(tmpdir) (project_path / "uv.lock").touch() extractor = EnhancedPatternExtractor(sqlite_db, agentdb_path) # Learn pattern await extractor.process_correction( "use uv not pip", project_path=project_path ) # Predict for command prediction = await extractor.predict_for_command( "install pytest", project_path ) # Should auto-apply assert prediction["should_transform"] == True assert "uv" in prediction["transformation"].lower() async def test_cross_project_learning(self): """Pattern should transfer to similar projects""" extractor = EnhancedPatternExtractor(sqlite_db, agentdb_path) # Project 1: Learn pattern project1 = Path("/tmp/django-app-1") project1.mkdir(exist_ok=True) (project1 / "manage.py").touch() # Django signal (project1 / "uv.lock").touch() await extractor.process_correction( "use pytest not unittest", category="test-framework", project_path=project1 ) # Project 2: Similar Django project project2 = Path("/tmp/django-app-2") project2.mkdir(exist_ok=True) (project2 / "manage.py").touch() # Django signal # Should recommend pytest (from similar project) prediction = await extractor.predict_for_command( "run tests", project_path=project2 ) # Should suggest pytest assert len(prediction["all_patterns"]) > 0 assert any("pytest" in p["pattern"].lower() for p in prediction["all_patterns"]) async def test_outcome_tracking_adjusts_confidence(self): """Success/failure outcomes should adjust confidence""" extractor = EnhancedPatternExtractor(sqlite_db, agentdb_path) # Learn pattern result = await extractor.process_correction("use uv not pip") pattern_id = result["pattern_id"] initial_confidence = result["confidence"] # Record success await extractor.record_outcome(pattern_id, Outcome.SUCCESS) # Check confidence increased pattern = await extractor.agentdb.get(pattern_id) new_confidence = 
pattern["metadata"]["confidence"] assert new_confidence > initial_confidence ``` ### 5.3 Performance Tests ```python """ Performance benchmarks """ import time class TestPerformance: """Performance benchmarks""" async def test_project_detection_speed(self): """Project detection should be <10ms""" detector = ProjectFileDetector() with tempfile.TemporaryDirectory() as tmpdir: project_path = Path(tmpdir) (project_path / "uv.lock").touch() start = time.time() signal = await detector.detect(project_path) duration_ms = (time.time() - start) * 1000 assert duration_ms < 10 async def test_semantic_search_speed(self): """Semantic search should be <5ms""" clusterer = SemanticClusterer(embedding_manager, agentdb_store) # Pre-populate with 1000 patterns for i in range(1000): await clusterer.add_correction(f"pattern {i}") # Search start = time.time() patterns = await clusterer.find_similar("test query") duration_ms = (time.time() - start) * 1000 assert duration_ms < 5 async def test_end_to_end_latency(self): """Full workflow should be <50ms""" extractor = EnhancedPatternExtractor(sqlite_db, agentdb_path) start = time.time() prediction = await extractor.predict_for_command("install pytest") duration_ms = (time.time() - start) * 1000 assert duration_ms < 50 ``` --- ## 6. Success Metrics ### 6.1 Quantitative Metrics | Metric | Baseline (Current) | Target (v2) | Measurement Method | |--------|-------------------|-------------|-------------------| | **Corrections to Learn** | 3-5 | 1-2 | Count corrections until pattern confidence >0.7 | | **Learning Time** | 2-3 days | Same session | Time from first to last correction | | **Prediction Accuracy** | N/A (no prediction) | >85% | Correct predictions / total commands | | **False Positive Rate** | N/A | <5% | Incorrect auto-applies / total auto-applies | | **Context Token Reduction** | 0 | -2,000 | Fewer repeated instructions in CLAUDE.md | | **Detection Latency** | N/A | <10ms | Time to detect project package manager | | **Search Latency** | 50ms+ (FTS5) | <5ms | Time to search similar patterns (AgentDB) | ### 6.2 Qualitative Metrics | Aspect | Success Criteria | |--------|-----------------| | **User Experience** | Users report "Claude learned my preference after 1-2 corrections" | | **Transparency** | Users understand why pattern was applied (clear reasoning) | | **Accuracy** | Auto-applied patterns match user's actual preferences >90% of time | | **Adaptability** | System adjusts confidence when user changes preferences | | **Cross-Project** | Patterns learned in one project apply to similar projects | ### 6.3 A/B Test Design ```python """ A/B test to measure impact of package manager intelligence """ # Control Group (current system): # - Keyword-based pattern matching # - Fixed threshold (3 occurrences) # - No project context detection # - No semantic clustering # Treatment Group (new system): # - Semantic pattern clustering # - Bayesian confidence scoring # - Project file detection # - Cross-project learning # - Proactive application # Metrics to track: metrics = { "corrections_to_learn": [], # Per pattern "time_to_learn_hours": [], # Time from first to confident "false_positives": [], # Incorrect auto-applies "user_satisfaction_rating": [], # 1-5 scale "context_tokens_saved": [], # Tokens not sent repeatedly } # Minimum sample size: 50 users per group (100 total) # Test duration: 2 weeks # Success criteria: # - 50%+ reduction in corrections_to_learn # - 80%+ reduction in time_to_learn # - <5% false_positive_rate # - >4.0 user_satisfaction_rating ``` --- ## 
---

## 7. Implementation Roadmap

### 7.1 Phase 1: Foundation (Week 1)

**Goal**: Set up core infrastructure

Tasks:

1. Install dependencies

   ```toml
   # Add to pyproject.toml
   dependencies = [
       "sentence-transformers>=2.2.0",
       "faiss-cpu>=1.7.4",
       "numpy>=1.24.0"
   ]
   ```

2. Implement `ProjectFileDetector`
   - File: `/src/intelligence/package_mgr/detector.py`
   - Tests: `/tests/test_detector.py`

3. Extend database schema
   - Add tables: `pattern_corrections`, `reasoning_episodes`, `pattern_usage`, `project_profiles`
   - Migration script: `/src/mcp_standards/schema_migration.py`

4. Set up AgentDB integration
   - Initialize PersistentMemory (already exists in `/src/intelligence/memory/persistence.py`)
   - Configure it for package manager patterns

**Deliverables**:
- Working project file detection (<10ms)
- Database schema extended
- 80%+ test coverage

### 7.2 Phase 2: Semantic Clustering (Week 2)

**Goal**: Enable semantic pattern matching

Tasks:

1. Implement `SemanticClusterer`
   - File: `/src/intelligence/package_mgr/clusterer.py`
   - Integrate with the existing `EmbeddingManager`

2. Implement `BayesianConfidenceScorer`
   - File: `/src/intelligence/package_mgr/scorer.py`
   - Bayesian update logic (see the sketch after this roadmap)

3. Update `pattern_extractor.py`
   - Add semantic clustering calls
   - Store patterns in AgentDB

4. Integration tests
   - Test merging of similar corrections
   - Test confidence updates

**Deliverables**:
- Semantic clustering working (<5ms search)
- Corrections reduced: 3 → 2 (intermediate milestone)
- Integration tests passing

### 7.3 Phase 3: Cross-Project & Proactive (Week 3)

**Goal**: Enable cross-project learning and proactive application

Tasks:

1. Implement `CrossProjectLearner`
   - File: `/src/intelligence/package_mgr/cross_project.py`
   - Project profile embeddings

2. Implement `ProactiveApplicator`
   - File: `/src/intelligence/package_mgr/applicator.py`
   - Command analysis and transformation

3. Integrate with the CLAUDE.md manager
   - Auto-update when patterns are promoted
   - Event-driven updates

4. End-to-end tests
   - Full workflow tests
   - Performance benchmarks

**Deliverables**:
- Cross-project learning working
- Proactive prediction >85% accuracy
- Corrections reduced: 3 → 1 (final goal)

### 7.4 Phase 4: Polish & Deploy (Week 4)

**Goal**: Production-ready system

Tasks:

1. Performance optimization
   - Cache tuning
   - Batch operations
   - Memory profiling

2. Error handling
   - Graceful degradation (if AgentDB fails, fall back to keyword matching)
   - User-friendly error messages

3. Documentation
   - API documentation
   - User guide
   - Architecture diagrams

4. A/B test setup
   - Metrics collection
   - Control vs. treatment groups

**Deliverables**:
- Production-ready code
- <50ms end-to-end latency
- Complete documentation
- A/B test running
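As a reference point for Phase 2, here is a minimal sketch of the Bayesian update logic. The specific weights and the decay rate are illustrative assumptions chosen to match the behavior described in the tests above; the real `BayesianConfidenceScorer` may differ:

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"   # pattern applied, no user correction
    FAILURE = "failure"   # pattern applied, user corrected it
    IGNORED = "ignored"   # pattern available but unused


def update_confidence(prior: float, outcome: Outcome,
                      days_since_last_use: int = 0) -> float:
    """Beta-style update: successes pull confidence up, failures pull it
    down, and long disuse decays it (weights are illustrative)."""
    if outcome == Outcome.SUCCESS:
        posterior = prior + (1 - prior) * 0.3   # move 30% toward certainty
    elif outcome == Outcome.FAILURE:
        posterior = prior * 0.5                  # halve on contradiction
    else:  # IGNORED: decay ~0.05 per week of disuse
        posterior = prior - 0.05 * (days_since_last_use / 7)
    return min(max(posterior, 0.0), 1.0)


# Consistent with the decay asserted in the unit test above:
# 30 days ≈ 4 weeks → ~0.2 decay for an ignored pattern
```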
---

## 8. Risk Mitigation

### Risk 1: Embedding Generation Latency

**Risk**: Generating embeddings for every correction adds latency
**Impact**: Medium (user experience)
**Mitigation**:
- Use a fast local model (all-MiniLM-L6-v2: ~50ms per embedding)
- Cache embeddings for common patterns
- Batch-embed corrections (if multiple arrive in one session)
- Fall back to keyword matching if embedding fails (see the sketch at the end of this section)

**Status**: Low concern (the existing EmbeddingManager is fast)

### Risk 2: False Positives

**Risk**: Auto-applying the wrong pattern frustrates users
**Impact**: High (user trust)
**Mitigation**:
- Conservative confidence threshold (0.7 for auto-apply)
- Suggest mode for medium confidence (0.5-0.7)
- Track false positives and demote offending patterns
- Allow users to disable auto-apply

**Status**: Mitigated through Bayesian scoring

### Risk 3: Storage Bloat

**Risk**: AgentDB + SQLite = 2x storage
**Impact**: Low (disk space is cheap)
**Mitigation**:
- Prune old low-confidence patterns (monthly)
- Compress embeddings (float16 instead of float32)
- Limit AgentDB to 100K patterns (sufficient for most users)

**Status**: Acceptable tradeoff

### Risk 4: Semantic Clustering Errors

**Risk**: Distinct patterns merged incorrectly (e.g., "use uv" + "use poetry")
**Impact**: Medium (learning accuracy)
**Mitigation**:
- High similarity threshold (0.85 for merging)
- Manual review for promoted patterns
- Users can "unmerge" patterns via a tool

**Status**: Low concern (threshold tuned conservatively)
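The graceful-degradation path mentioned under Risk 1 (and again in Phase 4) could look roughly like this. `keyword_index.search` is an assumed legacy interface, not an actual API from the codebase; only `find_similar` appears in the spec:

```python
import logging

logger = logging.getLogger(__name__)


async def find_patterns(query: str, clusterer, keyword_index) -> list:
    """Search for matching patterns, degrading gracefully.

    Tries semantic search first (AgentDB embeddings); if the embedding
    backend is unavailable, falls back to the legacy keyword index so the
    system keeps working, just with coarser matching.
    """
    try:
        return await clusterer.find_similar(query)
    except Exception as exc:  # embedding model or AgentDB failure
        logger.warning("Semantic search failed (%s); falling back to keywords", exc)
        return keyword_index.search(query)
```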
---

## 9. Conclusion

This package manager intelligence system reduces user corrections from 3-5 to 1-2 through:

1. **Intelligent Detection**: Automatic project file analysis (uv.lock, poetry.lock, etc.) provides immediate context
2. **Semantic Clustering**: AgentDB embeddings cluster similar corrections ("use uv" + "prefer uv" = same pattern)
3. **Bayesian Confidence**: Success/failure tracking adjusts confidence probabilistically
4. **Cross-Project Learning**: Patterns transfer across similar projects
5. **Proactive Application**: Predict and apply before corrections are needed

**Expected Impact**:
- 60-70% reduction in repetitive corrections
- Learning time: days → same session
- Context pollution: -2,000 tokens
- User satisfaction: "Finally, it learns!"

**Implementation**: 4 weeks, phased rollout, A/B tested

This system transforms the frustrating "use uv not pip" loop into a one-time learning experience, delivering on the core promise of mcp-standards: **learn once, apply forever**.