# Quality Scoring Algorithm V2
## Problem Statement
The current quality scoring system (V1) has critical flaws:
1. **Measures the Wrong Things**: 60% of the score comes from permissions, session management, and tool availability, not code quality
1. **Penalizes Modern Tooling**: A missing requirements.txt reduces the score even when the project uses UV
1. **Gameable**: Trusting 4 operations yields an instant +20 points regardless of code quality
1. **Binary Indicators**: One missing file costs 5.7 points (40/7), with no gradation
1. **Ignores Actual Quality**: Doesn't check test coverage, lint scores, or code health
### Current Algorithm (Flawed)
```text
# V1 Scoring (Total: 100 points)
project_health = (indicators / 7) * 40 # 40% - Binary file checks
permissions = min(trusted_ops * 5, 20) # 20% - Gameable, not quality
session_mgmt = 20 if available else 5 # 20% - Infrastructure, not quality
tools = 20 if uv_available else 10 # 20% - Tool presence, not usage
# Result: 73/100 for excellent projects missing requirements.txt
```
## New Algorithm Design (V2)
### Core Principle
**Measure what matters: actual code quality, not file existence or tool availability.**
### New Formula (Total: 100 points)
```text
# V2 Scoring - Actual Code Quality
code_quality = 40% # Test coverage, lint scores, type checking
project_health = 30% # Structure, documentation, CI/CD
dev_velocity = 20% # Commit patterns, issue tracking, productivity
security = 10% # Dependency audits, secret scanning, vulnerabilities
# Permissions/tools moved to separate "Trust Score" metric (not in quality)
```
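The composition is a plain weighted sum; a minimal Python sketch (the function and parameter names here are illustrative, not the final API):

```python
# Minimal sketch of the V2 top-level composition. Each component arrives
# already scaled to its maximum (40/30/20/10), so the total is a simple
# sum capped at 100.
def total_quality_score(
    code_quality: float,    # 0-40
    project_health: float,  # 0-30
    dev_velocity: float,    # 0-20
    security: float,        # 0-10
) -> float:
    return round(min(code_quality + project_health + dev_velocity + security, 100.0), 1)
```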
### Detailed Breakdown
#### 1. Code Quality (40 points) - PRIMARY METRIC
Integrates with Crackerjack quality metrics:
```text
# Sub-components:
test_coverage = 15 points # From Crackerjack CODE_COVERAGE
lint_score = 10 points # From Crackerjack LINT_SCORE
type_coverage = 10 points # From pyright/mypy via Crackerjack
complexity = 5 points # From Crackerjack COMPLEXITY (inverse)
# Calculation:
code_quality_score = (
    (test_coverage_pct / 100) * 15 +     # 85% coverage = 12.75 pts
    (lint_score / 100) * 10 +            # Crackerjack score, normalized 0-100
    (type_coverage_pct / 100) * 10 +     # Type-hint coverage
    (1 - (avg_complexity / 15)) * 5      # Lower complexity = higher score
)
```
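A runnable sketch of the same calculation; the clamp on the complexity term is an addition here, so very complex code (average above 15) scores zero rather than negative:

```python
# Hedged sketch of the code-quality component; the metric names mirror the
# Crackerjack fields above, but the function itself is illustrative.
def calculate_code_quality(
    test_coverage_pct: float,  # 0-100, Crackerjack CODE_COVERAGE
    lint_score: float,         # 0-100, Crackerjack LINT_SCORE
    type_coverage_pct: float,  # 0-100, from pyright/mypy
    avg_complexity: float,     # mean cyclomatic complexity
) -> float:
    complexity_factor = max(0.0, 1 - avg_complexity / 15)  # clamp at 0 (assumption)
    return (
        (test_coverage_pct / 100) * 15
        + (lint_score / 100) * 10
        + (type_coverage_pct / 100) * 10
        + complexity_factor * 5
    )

# Matches Example 1 below: calculate_code_quality(31.6, 100, 90, 4) ≈ 27.4
```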
#### 2. Project Health (30 points) - STRUCTURAL QUALITY
Modernized indicators with smart detection:
```text
# Modern tooling (15 points):
package_management = 5 pts # pyproject.toml + (uv.lock OR requirements.txt)
version_control = 5 pts # .git + meaningful commit history
dependency_mgmt = 5 pts # Lockfile present + recent updates
# Project maturity (15 points):
testing_infra = 5 pts # tests/ + conftest.py + >10 tests
documentation = 5 pts # README + docs/ OR inline docstrings >50%
ci_cd_pipeline = 5 pts # .github/workflows + passing status
# Calculation:
project_health_score = (
    _calculate_tooling_score() +     # 0-15 points
    _calculate_maturity_score()      # 0-15 points
)
```
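One way `_calculate_tooling_score()` could look, assuming plain filesystem checks (the commit-history and lockfile-recency refinements are omitted from this sketch):

```python
from pathlib import Path

def _calculate_tooling_score(root: Path) -> int:
    score = 0
    # Package management: modern (uv.lock) OR classic (requirements.txt), never penalized
    if (root / "pyproject.toml").exists() and (
        (root / "uv.lock").exists() or (root / "requirements.txt").exists()
    ):
        score += 5
    # Version control present (meaningful-history check omitted here)
    if (root / ".git").exists():
        score += 5
    # Dependency lockfile present (recent-updates check omitted here)
    if (root / "uv.lock").exists() or (root / "poetry.lock").exists():
        score += 5
    return score
```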
#### 3. Development Velocity (20 points) - PRODUCTIVITY METRICS
Measures development health:
```text
# Git activity (10 points):
commit_frequency = 5 pts # Regular commits (not too sparse, not spam)
commit_quality = 5 pts # Conventional commits, meaningful messages
# Development patterns (10 points):
issue_tracking = 5 pts # Issues referenced in commits
branch_strategy = 5 pts # Feature branches, PR workflow
# Calculation:
velocity_score = (
    _analyze_git_activity() +     # 0-10 points
    _analyze_dev_patterns()       # 0-10 points
)
```
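As one concrete example, the commit-quality half can sample recent subjects and score the share that follow the Conventional Commits format (a sketch; the sample size and prefix list are assumptions):

```python
import subprocess

CONVENTIONAL_PREFIXES = ("feat", "fix", "docs", "refactor", "test", "chore", "perf", "ci")

def commit_quality_points(repo: str, sample: int = 50) -> float:
    # Subjects of the last `sample` commits, one per line
    subjects = subprocess.run(
        ["git", "-C", repo, "log", f"-{sample}", "--pretty=%s"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    if not subjects:
        return 0.0
    # "feat(api): add x" -> "feat"; "fix: typo" -> "fix"
    conventional = sum(
        1 for s in subjects
        if s.split(":")[0].split("(")[0].strip() in CONVENTIONAL_PREFIXES
    )
    return round(5 * conventional / len(subjects), 1)  # 0-5 points
```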
#### 4. Security (10 points) - SAFETY METRICS
Critical security indicators:
```text
# Security tooling (5 points):
dependency_scan = 3 pts # Bandit/safety checks via Crackerjack
secret_detection = 2 pts # No hardcoded secrets/keys
# Security hygiene (5 points):
vuln_free_deps = 3 pts # No critical CVEs in dependencies
secure_patterns = 2 pts # No eval(), exec(), SQL injection patterns
# Calculation:
security_score = (
    _run_security_checks() +      # 0-5 points
    _check_security_hygiene()     # 0-5 points
)
```
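For the `secure_patterns` item, a fallback heuristic sketch; real scanning is delegated to bandit via Crackerjack, and these regexes are assumptions:

```python
import re
from pathlib import Path

RISKY_PATTERNS = [
    re.compile(r"\beval\s*\("),                              # eval() usage
    re.compile(r"\bexec\s*\("),                              # exec() usage
    re.compile(r"execute\s*\(\s*[\"'].*%s", re.IGNORECASE),  # string-interpolated SQL
]

def secure_patterns_points(root: Path) -> int:
    for py_file in root.rglob("*.py"):
        text = py_file.read_text(errors="ignore")
        if any(p.search(text) for p in RISKY_PATTERNS):
            return 0  # any hit forfeits the 2 points
    return 2
```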
### Separate: Trust Score (Not in Quality)
Permissions and session management are moved to a separate metric:
```text
# Trust Score (separate from quality, 0-100)
trusted_operations = 40 pts # Number of trusted operations
session_availability = 30 pts # Session management features
tool_ecosystem = 30 pts # Available MCP tools and integrations
# This measures "how much the system trusts your environment"
# NOT "how good your code is"
```
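A hypothetical sketch of the trust calculation; only the 40/30/30 split comes from the breakdown above, and the per-item multipliers are assumptions:

```python
def calculate_trust_score(trusted_ops: int, session_available: bool, tool_count: int) -> int:
    ops_points = min(trusted_ops * 4, 40)             # assumed 4 pts per trusted operation
    session_points = 30 if session_available else 10  # assumed floor of 10
    tool_points = min(tool_count * 3, 30)             # assumed 3 pts per MCP tool
    return ops_points + session_points + tool_points
```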
## Implementation Strategy
### Phase 1: Crackerjack Integration
1. Use existing `CrackerjackIntegration.get_quality_metrics_history()`
1. Extract: test_pass_rate, code_coverage, lint_score, security_score, complexity_score
1. Cache results for 5 minutes to avoid re-running Crackerjack on every checkpoint (see the caching sketch below)
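One possible shape for that cache; whether `get_quality_metrics_history()` is async and what arguments it takes is assumed here, not confirmed by this plan:

```python
import time
from typing import Any

_TTL_SECONDS = 300  # 5 minutes, per Phase 1
_cache: dict[str, tuple[float, Any]] = {}

async def cached_quality_metrics(integration: Any, cache_key: str) -> Any:
    now = time.monotonic()
    entry = _cache.get(cache_key)
    if entry and now - entry[0] < _TTL_SECONDS:
        return entry[1]  # fresh enough, skip re-running Crackerjack
    metrics = await integration.get_quality_metrics_history()
    _cache[cache_key] = (now, metrics)
    return metrics
```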
### Phase 2: Git Analysis
1. Parse git log for commit patterns
1. Analyze commit messages for conventional commits format
1. Check for branch strategy (feature branches vs. main-only), as sketched below
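The branch-strategy check can stay cheap; one sketch treats recent merge commits as evidence of a feature-branch workflow (the heuristic itself is an assumption):

```python
import subprocess

def uses_feature_branches(repo: str) -> bool:
    # Any merge commit among the last 20 suggests work landed via branches/PRs
    merges = subprocess.run(
        ["git", "-C", repo, "log", "--merges", "-20", "--pretty=%s"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return bool(merges)
```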
### Phase 3: Project Structure Analysis
1. Smart detection: accept UV (uv.lock) OR requirements.txt, never penalizing modern tools
1. Documentation quality: count docstrings (sketched below), not just README existence
1. CI/CD: check workflow status via GitHub API or file timestamps
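A sketch of the docstring counter behind the documentation check (the >50% threshold comes from the project-health breakdown above):

```python
import ast
from pathlib import Path

def docstring_coverage(root: Path) -> float:
    documented = total = 0
    for py_file in root.rglob("*.py"):
        try:
            tree = ast.parse(py_file.read_text(errors="ignore"))
        except SyntaxError:
            continue  # skip files that don't parse
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                total += 1
                if ast.get_docstring(node):
                    documented += 1
    return documented / total if total else 0.0  # compare against the 50% bar
```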
### Phase 4: Security Integration
1. Run bandit/safety via Crackerjack if not cached
1. Check for .env in .gitignore
1. Scan for hardcoded secret patterns (sketched below)
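The two checks that need no Crackerjack run are plain file scans; a sketch, with an assumed credential regex:

```python
import re
from pathlib import Path

SECRET_PATTERN = re.compile(
    r"(api[_-]?key|secret|token|password)\s*=\s*[\"'][^\"']{8,}[\"']",
    re.IGNORECASE,
)

def basic_security_hygiene(root: Path) -> dict[str, bool]:
    gitignore = root / ".gitignore"
    env_ignored = gitignore.exists() and ".env" in gitignore.read_text()
    has_secrets = any(
        SECRET_PATTERN.search(f.read_text(errors="ignore"))
        for f in root.rglob("*.py")
    )
    return {"env_ignored": env_ignored, "no_hardcoded_secrets": not has_secrets}
```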
## Scoring Examples
### Example 1: Modern Python Project (This Repository)
```text
# Code Quality (40 points)
test_coverage: 31.6% → 4.7 points # Low coverage hurts
lint_score: 100 → 10 points # Clean code
type_coverage: 90% → 9 points # Good type hints
complexity: avg 4 → 3.7 points # Low complexity
# Project Health (30 points)
tooling: 15 points # pyproject.toml + uv.lock + git
maturity: 13 points # tests/ + docs/ + README; partial CI/CD credit (-2)
# Development Velocity (20 points)
git_activity: 9 points # Regular commits, good messages
dev_patterns: 7 points # Feature branches, some issue refs
# Security (10 points)
security_tools: 5 points # Crackerjack security checks
hygiene: 4 points # Clean patterns
# TOTAL: 4.7 + 10 + 9 + 3.7 + 15 + 13 + 9 + 7 + 5 + 4 = 80.4 ≈ 80/100
# (Up from 73 under V1; the score now reflects actual quality, with low test coverage penalized)
```
### Example 2: Excellent Project
```text
# Code Quality: 38/40 (95% coverage, perfect lint, 100% types, low complexity)
# Project Health: 28/30 (full tooling, CI/CD, excellent docs)
# Dev Velocity: 18/20 (active development, good patterns)
# Security: 10/10 (all checks pass)
# TOTAL: 94/100 (Achievable with actual quality, not by gaming permissions)
```
### Example 3: Quick Script (Low Quality Expected)
```text
# Code Quality: 8/40 (no tests, some lint issues)
# Project Health: 8/30 (just .py files, no structure)
# Dev Velocity: 5/20 (few commits)
# Security: 6/10 (some basic checks)
# TOTAL: 27/100 (Honest assessment of code quality)
```
## Migration Plan
1. **Implement new algorithm** as `calculate_quality_score_v2()`
1. **Run both algorithms** in parallel for comparison (see the sketch below)
1. **Log differences** to analyze scoring changes
1. **Switch default** after validation period
1. **Deprecate V1** after 1 month
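A sketch of the parallel-run step; the two scoring functions are named in this plan, while the wrapper and logging shape are illustrative:

```python
import logging

logger = logging.getLogger("quality_migration")

async def score_with_comparison() -> dict:
    v1 = await calculate_quality_score()     # existing V1 entry point
    v2 = await calculate_quality_score_v2()  # new V2 entry point
    logger.info(
        "quality score v1=%s v2=%s delta=%+.1f",
        v1["total_score"], v2["total_score"],
        v2["total_score"] - v1["total_score"],
    )
    return v1  # V1 stays the default until the validation period ends
```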
## Benefits
✅ **Measures Real Quality**: Test coverage, lint scores, actual code metrics
✅ **Modern Tooling**: UV-first, not penalized for skipping requirements.txt
✅ **Not Gameable**: Can't boost score by just trusting operations
✅ **Actionable**: Score directly correlates to code improvements
✅ **Integrated**: Leverages existing Crackerjack infrastructure
✅ **Honest**: Low score means low quality, high score earned through quality
## API Changes
```text
# Old API (deprecated)
quality_data = await calculate_quality_score()
# Returns: {"total_score": 73, "breakdown": {...}}
# New API
quality_data = await calculate_quality_score_v2()
# Returns:
{
    "total_score": 80,
    "version": "2.0",
    "breakdown": {
        "code_quality": {"score": 27.4, "max": 40, "details": {...}},
        "project_health": {"score": 28, "max": 30, "details": {...}},
        "dev_velocity": {"score": 16, "max": 20, "details": {...}},
        "security": {"score": 9, "max": 10, "details": {...}},
    },
    "trust_score": {
        "score": 85,
        "breakdown": {
            "trusted_operations": 32,
            "session_availability": 28,
            "tool_ecosystem": 25,
        },
    },
    "recommendations": [...],
}
```
## Next Steps
1. Implement `calculate_quality_score_v2()` in `session_buddy/utils/quality_utils_v2.py`
1. Add Crackerjack integration with caching
1. Implement git analysis utilities
1. Create comprehensive test suite
1. Update MCP tools to use V2