Session Buddy

Overview Schema Related Servers Score Discussions

SKILL_METRICS_AGGREGATION.md•23.1 KiB

# Cross-Project Skill Metrics Aggregation Design **Status:** Design Document **Author:** Multi-Agent Coordinator **Created:** 2025-02-10 **Related:** Mahavishnu workflow orchestration, Crackerjack skill metrics ## Executive Summary This document defines a privacy-first, cross-project skill metrics aggregation system for Mahavishnu. The design enables workflow intelligence while maintaining strict data locality and privacy guarantees. ## Architecture Overview ``` ┌─────────────────────────────────────────────────────────────────┐ │ Mahavishnu Orchestrator │ │ (session-buddy) │ └─────────────────────────────────────────────────────────────────┘ │ ├─── Project Registry │ └── Track all known projects │ ├─── Metrics Collector (NEW) │ ├── Collect from .session-buddy/skill_metrics.json │ ├── Normalize and validate │ └── Store in aggregated DuckDB │ ├─── Aggregation Engine (NEW) │ ├── Cross-project rollups │ ├── Time-series analysis │ └── Pattern detection │ └── Workflow Telemetry (ENHANCED) ├── Skill usage per workflow ├── Effectiveness correlation └── Optimization recommendations ``` ## Design Principles ### 1. Privacy-First Architecture **NO data ever leaves the local environment.** - All aggregation performed locally via DuckDB - No external API calls or telemetry transmission - Project paths are the only identifiers (no PII) - Opt-in per-project (projects explicitly registered) ### 2. Separation of Concerns - **Collection**: Gather metrics from project-specific JSON files - **Aggregation**: Compute cross-project statistics - **Analysis**: Generate insights and recommendations - **Workflow Integration**: Use metrics to optimize orchestration ### 3. Minimal Intrusion - Projects remain autonomous (metrics are local first) - Mahavishnu is optional read-only consumer - No impact on crackerjack's existing workflow - Zero network dependencies ## Data Model ### 1. Project Registry ```python # ~/.claude/data/mahavishnu/projects.duckdb CREATE TABLE projects ( project_id TEXT PRIMARY KEY, # Path hash or user-defined ID project_path TEXT UNIQUE NOT NULL, project_name TEXT, project_type TEXT, # 'web', 'cli', 'library', etc. metadata JSON, # User-defined tags, categories registered_at TIMESTAMP DEFAULT NOW(), last_sync_at TIMESTAMP, is_active BOOLEAN DEFAULT TRUE ); ``` ### 2. Aggregated Metrics ```python # ~/.claude/data/mahavishnu/skill_metrics.duckdb CREATE TABLE skill_usage ( id INTEGER PRIMARY KEY, project_id TEXT REFERENCES projects(project_id), skill_name TEXT NOT NULL, invoked_at TIMESTAMP NOT NULL, completed_at TIMESTAMP, duration_ms REAL, success BOOLEAN, followup_actions INTEGER, session_id TEXT, metadata JSON # Workflow context, agent type, etc. ); CREATE TABLE skill_aggregates ( skill_name TEXT NOT NULL, project_id TEXT, -- NULL for cross-project aggregate time_window TEXT NOT NULL, -- 'daily', 'weekly', 'monthly', 'all' total_invocations INTEGER, successful_completions INTEGER, avg_duration_ms REAL, avg_followup_actions REAL, first_used TIMESTAMP, last_used TIMESTAMP, UNIQUE(skill_name, project_id, time_window) ); CREATE INDEX idx_skill_usage_project_time ON skill_usage(project_id, invoked_at); CREATE INDEX idx_skill_usage_skill_time ON skill_usage(skill_name, invoked_at); ``` ### 3. Workflow Telemetry ```python CREATE TABLE workflow_executions ( workflow_id TEXT PRIMARY KEY, project_id TEXT REFERENCES projects(project_id), workflow_name TEXT NOT NULL, started_at TIMESTAMP NOT NULL, completed_at TIMESTAMP, status TEXT, -- 'running', 'completed', 'failed', 'cancelled' skills_used JSON, -- Array of skill names total_duration_ms REAL, metadata JSON ); CREATE TABLE workflow_skill_correlation ( workflow_name TEXT NOT NULL, skill_name TEXT NOT NULL, usage_count INTEGER, success_rate REAL, avg_duration_ms REAL, UNIQUE(workflow_name, skill_name) ); ``` ## Implementation Components ### Phase 1: Metrics Collection **File:** `session_buddy/mahavishnu/metrics_collector.py` ```python """ Collects skill metrics from registered projects. """ from pathlib import Path from typing import Optional import duckdb import json from datetime import datetime class MetricsCollector: """Collects skill metrics from project-specific JSON files.""" def __init__(self, db_path: Path = Path.home() / ".claude/data/mahavishnu/skill_metrics.duckdb"): self.db_path = db_path self._ensure_db() def collect_from_project(self, project_path: Path) -> dict: """Collect metrics from a single project's skill_metrics.json. Returns: {'collected': int, 'skipped': int, 'errors': int} """ metrics_file = project_path / ".session-buddy/skill_metrics.json" if not metrics_file.exists(): return {'collected': 0, 'skipped': 0, 'errors': 0} with open(metrics_file) as f: data = json.load(f) collected = 0 for skill_name, records in data.get('skills', {}).items(): for record in records: self._insert_record(project_path, skill_name, record) collected += 1 return {'collected': collected, 'skipped': 0, 'errors': 0} def collect_all_projects(self) -> dict[str, dict]: """Collect metrics from all registered projects. Returns: {project_path: {'collected': int, 'skipped': int, 'errors': int}} """ results = {} with duckdb.connect(str(self.db_path)) as conn: projects = conn.execute("SELECT project_path FROM projects WHERE is_active = TRUE").fetchall() for (project_path,) in projects: results[project_path] = self.collect_from_project(Path(project_path)) return results def _insert_record(self, project_path: Path, skill_name: str, record: dict) -> None: """Insert a single skill usage record into the database.""" with duckdb.connect(str(self.db_path)) as conn: conn.execute(""" INSERT INTO skill_usage ( project_id, skill_name, invoked_at, completed_at, duration_ms, success, followup_actions, session_id, metadata ) VALUES ( (SELECT project_id FROM projects WHERE project_path = ?), ?, ?, ?, ?, ?, ?, ?, ? ) """, ( str(project_path), skill_name, record.get('invoked_at'), record.get('completed_at'), record.get('duration_ms'), record.get('success'), record.get('followup_actions', 0), record.get('session_id'), json.dumps(record.get('metadata', {})) )) ``` ### Phase 2: Aggregation Engine **File:** `session_buddy/mahavishnu/aggregation_engine.py` ```python """ Computes cross-project skill metrics aggregations. """ import duckdb from pathlib import Path from typing import Literal TimeWindow = Literal['daily', 'weekly', 'monthly', 'all'] class AggregationEngine: """Computes aggregated skill metrics across projects.""" def __init__(self, db_path: Path = Path.home() / ".claude/data/mahavishnu/skill_metrics.duckdb"): self.db_path = db_path def compute_aggregates( self, project_id: Optional[str] = None, time_window: TimeWindow = 'all' ) -> int: """Compute skill usage aggregates. Args: project_id: Specific project, or None for cross-project time_window: Time aggregation granularity Returns: Number of aggregates computed """ with duckdb.connect(str(self.db_path)) as conn: # Determine time grouping if time_window == 'daily': time_trunc = "DATE_TRUNC('day', invoked_at)" elif time_window == 'weekly': time_trunc = "DATE_TRUNC('week', invoked_at)" elif time_window == 'monthly': time_trunc = "DATE_TRUNC('month', invoked_at)" else: # 'all' time_trunc = "'1970-01-01'::TIMESTAMP" # Compute aggregates result = conn.execute(f""" INSERT OR REPLACE INTO skill_aggregates ( skill_name, project_id, time_window, total_invocations, successful_completions, avg_duration_ms, avg_followup_actions, first_used, last_used ) SELECT skill_name, ?, ?, COUNT(*) as total_invocations, SUM(CAST(success AS INTEGER)) as successful_completions, AVG(duration_ms) as avg_duration_ms, AVG(followup_actions) as avg_followup_actions, MIN(invoked_at) as first_used, MAX(invoked_at) as last_used FROM skill_usage WHERE (? IS NULL OR project_id = ?) GROUP BY skill_name """, (project_id, time_window, project_id, project_id)) return result.rowcount def get_top_skills( self, limit: int = 10, project_id: Optional[str] = None, time_window: TimeWindow = 'all' ) -> list[dict]: """Get most used skills. Returns: List of {'skill_name': str, 'invocations': int, 'success_rate': float} """ with duckdb.connect(str(self.db_path)) as conn: query = """ SELECT skill_name, total_invocations, successful_completions, CAST(successful_completions AS FLOAT) / total_invocations as success_rate FROM skill_aggregates WHERE (? IS NULL OR project_id = ?) AND time_window = ? ORDER BY total_invocations DESC LIMIT ? """ results = conn.execute(query, (project_id, project_id, time_window, limit)).fetchall() return [ { 'skill_name': row[0], 'invocations': row[1], 'success_rate': row[3] } for row in results ] def get_skill_effectiveness( self, skill_name: str, project_id: Optional[str] = None ) -> dict: """Analyze skill effectiveness patterns. Returns: {'avg_duration': float, 'success_rate': float, 'followup_rate': float} """ with duckdb.connect(str(self.db_path)) as conn: row = conn.execute(""" SELECT AVG(duration_ms) as avg_duration, SUM(CAST(success AS INTEGER)) / CAST(COUNT(*) AS FLOAT) as success_rate, AVG(followup_actions) as avg_followup FROM skill_usage WHERE skill_name = ? AND (? IS NULL OR project_id = ?) """, (skill_name, project_id, project_id)).fetchone() return { 'avg_duration_ms': row[0], 'success_rate': row[1], 'avg_followup_actions': row[2] } ``` ### Phase 3: MCP Tools Integration **File:** `session_buddy/mahavishnu/mcp_tools.py` ```python """ MCP tools for cross-project skill metrics. """ from mcp.server.fastmcp import FastMCP from pathlib import Path mcp = FastMCP("mahavishnu-skill-metrics") @mcp.tool() def register_project( project_path: str, project_name: str, project_type: str = "unknown" ) -> str: """Register a project for cross-project metrics collection. Args: project_path: Absolute path to project directory project_name: Human-readable project name project_type: Project category (web, cli, library, etc.) """ # Implementation: Insert into projects table pass @mcp.tool() def collect_skill_metrics(project_path: str | None = None) -> str: """Collect skill metrics from registered projects. Args: project_path: Specific project, or None for all projects Returns: Collection results summary """ collector = MetricsCollector() if project_path: results = collector.collect_from_project(Path(project_path)) else: results = collector.collect_all_projects() return f"Collected metrics from {len(results)} projects" @mcp.tool() def get_top_skills( limit: int = 10, project_id: str | None = None, time_window: str = "all" ) -> str: """Get most used skills across projects. Args: limit: Maximum number of skills to return project_id: Filter by project, or None for all projects time_window: 'daily', 'weekly', 'monthly', or 'all' Returns: JSON array of top skills with usage statistics """ engine = AggregationEngine() skills = engine.get_top_skills( limit=limit, project_id=project_id, time_window=time_window ) return json.dumps(skills, indent=2) @mcp.tool() def analyze_skill_effectiveness( skill_name: str, project_id: str | None = None ) -> str: """Analyze skill effectiveness patterns. Args: skill_name: Name of skill to analyze project_id: Filter by project, or None for cross-project analysis Returns: JSON with effectiveness metrics and insights """ engine = AggregationEngine() effectiveness = engine.get_skill_effectiveness(skill_name, project_id) # Generate insights insights = [] if effectiveness['success_rate'] < 0.7: insights.append(f"Low success rate ({effectiveness['success_rate']:.1%}) - consider reviewing skill implementation") if effectiveness['avg_followup_actions'] > 2: insights.append(f"High follow-up rate ({effectiveness['avg_followup_actions']:.1f}) - skill may need refinement") return json.dumps({ 'metrics': effectiveness, 'insights': insights }, indent=2) @mcp.tool() def get_workflow_recommendations(project_id: str | None = None) -> str: """Generate workflow optimization recommendations. Analyzes skill usage patterns to suggest workflow improvements. Args: project_id: Filter by project, or None for cross-project analysis Returns: JSON with actionable recommendations """ engine = AggregationEngine() # Get top skills top_skills = engine.get_top_skills(limit=20, project_id=project_id) # Analyze patterns recommendations = [] # Check for frequently failing skills for skill in top_skills: if skill['success_rate'] < 0.6: recommendations.append({ 'type': 'skill_improvement', 'priority': 'high', 'skill': skill['skill_name'], 'issue': f"Low success rate ({skill['success_rate']:.1%})", 'action': "Review skill implementation and update prompts" }) # Check for underutilized skills # (implementation specific to your skill catalog) return json.dumps(recommendations, indent=2) ``` ## Integration with Workflow Orchestration ### Workflow Telemetry Enhancement When Mahavishnu orchestrates workflows, it should track: 1. **Which skills are used** in each workflow phase 2. **Workflow outcomes** (success/failure, duration) 3. **Skill-workflow correlation** (which skills work best in which contexts) **Example:** ```python class WorkflowOrchestrator: def execute_workflow(self, workflow_name: str, project_id: str): """Execute workflow with skill telemetry.""" workflow_id = str(uuid.uuid4()) started_at = datetime.now() # Track workflow start self._record_workflow_start(workflow_id, project_id, workflow_name, started_at) skills_used = [] try: # Execute workflow phases for phase in self.workflow_phases(workflow_name): # Track which skill is used skill_name = phase.skill skills_used.append(skill_name) # Execute phase result = self._execute_phase(phase) # Record successful completion self._record_workflow_complete( workflow_id, status='completed', skills_used=skills_used, completed_at=datetime.now() ) except Exception as e: # Record failure self._record_workflow_complete( workflow_id, status='failed', skills_used=skills_used, completed_at=datetime.now(), error=str(e) ) raise ``` ### Recommendation System Use aggregated metrics to optimize workflow orchestration: ```python class WorkflowOptimizer: def recommend_skill_sequence(self, workflow_name: str, project_context: dict) -> list[str]: """Recommend optimal skill sequence based on historical data.""" # Query historical workflow executions with duckdb.connect(self.db_path) as conn: successful_runs = conn.execute(""" SELECT skills_used, total_duration_ms FROM workflow_executions WHERE workflow_name = ? AND status = 'completed' ORDER BY total_duration_ms ASC LIMIT 10 """, (workflow_name,)).fetchall() # Analyze patterns skill_sequences = [run[0] for run in successful_runs] # Return most common successful sequence # (implementation depends on your skill representation) return recommended_sequence ``` ## Privacy Guarantees ### Data Locality ✅ **All data stored locally** in `~/.claude/data/mahavishnu/` ✅ **No external API calls** - aggregation uses DuckDB locally ✅ **No PII** - project paths are only identifiers, can be hashed ### Opt-In Model ✅ **Projects explicitly registered** via `register_project` tool ✅ **Read-only collection** - metrics JSON files never modified ✅ **Unregister anytime** - delete project from registry ### Transparency ✅ **Raw metrics preserved** - always available in project's `.session-buddy/` ✅ **Audit trail** - all collection operations logged ✅ **User controls** - can disable collection, clear aggregated data ## Usage Examples ### Example 1: Cross-Project Skill Analysis ```bash # Register multiple projects /register_project project_path="/Users/les/Projects/crackerjack" project_name="Crackerjack" project_type="cli" /register_project project_path="/Users/les/Projects/session-buddy" project_name="Session Buddy" project_type="mcp-server" # Collect metrics from all projects /collect_skill_metrics # Get top skills across all projects /get_top_skills limit=20 time_window="monthly" # Analyze effectiveness of a specific skill /analyze_skill_effectiveness skill_name="code-review" ``` ### Example 2: Workflow Optimization ```bash # Get recommendations for a specific project /get_workflow_recommendations project_id="crackerjack" # View which skills work best in "feature-dev" workflow # (requires workflow telemetry integration) ``` ### Example 3: Team-Level Insights ```bash # Most used skills across all team projects /get_top_skills limit=50 time_window="all" # Compare skill usage between projects /analyze_skill_effectiveness skill_name="test-generation" project_id="crackerjack" /analyze_skill_effectiveness skill_name="test-generation" project_id="session-buddy" ``` ## Implementation Roadmap ### Phase 1: Foundation (Week 1) - [ ] Create DuckDB schema for skill metrics - [ ] Implement `MetricsCollector` class - [ ] Implement `AggregationEngine` class - [ ] Create project registry table - [ ] Unit tests for collection and aggregation ### Phase 2: MCP Tools (Week 2) - [ ] Implement `register_project` tool - [ ] Implement `collect_skill_metrics` tool - [ ] Implement `get_top_skills` tool - [ ] Implement `analyze_skill_effectiveness` tool - [ ] Integration tests ### Phase 3: Workflow Telemetry (Week 3) - [ ] Add workflow execution tracking - [ ] Implement skill-workflow correlation - [ ] Create `get_workflow_recommendations` tool - [ ] End-to-end workflow tests ### Phase 4: Analysis & Insights (Week 4) - [ ] Implement pattern detection algorithms - [ ] Add time-series trend analysis - [ ] Create visualization data exports - [ ] Documentation and examples ## Key Design Decisions ### Why DuckDB? ✅ **Local-first** - no server required ✅ **Fast analytics** - columnar storage optimized for aggregations ✅ **SQL interface** - familiar query language ✅ **Zero dependencies** - single binary, no external services ✅ **Already in use** - Session Buddy uses DuckDB for reflections ### Why Opt-In Registration? ✅ **Respects project autonomy** - projects are independent ✅ **Privacy control** - user decides what to include ✅ **Scalability** - can aggregate across any number of projects ✅ **Flexibility** - can add/remove projects dynamically ### Why Separate Aggregates Table? ✅ **Performance** - pre-computed aggregations are fast ✅ **Time windows** - supports daily/weekly/monthly views ✅ **Cross-project queries** - unified view without complex joins ✅ **Historical tracking** - preserves trends over time ## Future Enhancements 1. **Skill Relationship Analysis** - Which skills are commonly used together? - Skill co-occurrence patterns - Optimal skill sequences 2. **Predictive Recommendations** - Predict which skills will be needed next - Suggest skill improvements based on failure patterns - Workflow optimization based on historical performance 3. **Team Collaboration** - Share skill metrics across team members (optional) - Aggregate team-level insights - Collaborative skill refinement 4. **Visualization** - Export data for external visualization tools - Built-in charts for skill usage trends - Workflow performance dashboards ## References - **Session Buddy:** `/Users/les/Projects/session-buddy/` - **Crackerjack Skill Metrics:** `crackerjack/skill_metrics.py` (assumed location) - **DuckDB Documentation:** https://duckdb.org/docs/ - **Mahavishnu Architecture:** See Session Buddy README for workflow orchestration details

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/lesleslie/session-buddy'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

SKILL_METRICS_AGGREGATION.md•23.1 KiB