Session Buddy

Overview Schema Related Servers Score Discussions

session-buddy
docs
initialization

TAXONOMY_INITIALIZATION.md

TAXONOMY_INITIALIZATION.md•12.2 KiB

# Skills Taxonomy Initialization Guide ## Overview Phase 4 of Session-Buddy introduced three new taxonomy tables for organizing and understanding skills: 1. **skill_categories** - Hierarchical categorization of skills 2. **skill_modalities** - Multi-modal skill type definitions 3. **skill_dependencies** - Co-occurrence relationships between skills The `initialize_taxonomy.py` script populates these tables with predefined seed data. ## Prerequisites Before running the taxonomy initialization: 1. **V4 Migration Must Be Applied** The V4 migration creates the taxonomy tables. Run: ```bash cd /Users/les/Projects/session-buddy # Run migrations (includes V4) python -m session_buddy.storage.migrations migrate ``` 2. **Database Must Exist** The script expects `.session-buddy/skills.db` to exist. ## Usage ### Basic Initialization ```bash cd /Users/les/Projects/session-buddy python scripts/initialize_taxonomy.py ``` ### Expected Output ``` Checking migration status... ✓ V4 migration verified Initializing skills taxonomy... Step 1: Initializing categories... ✓ Category: Code Quality (code) ✓ Category: Testing (testing) ✓ Category: Documentation (documentation) ✓ Category: Build & Deploy (deployment) ✓ Category: Git & Version Control (code) ✓ Category: Linting & Formatting (code) Categories initialized: 6 Step 2: Initializing modalities... ✓ Modality: ruff-check (code: python_source → diagnostics) ✓ Modality: pytest-run (testing: python_tests → test_results) ✓ Modality: sphinx-build (documentation: rst_docs → html_docs) ✓ Modality: docker-build (deployment: dockerfile → docker_image) Modalities initialized: 4 Step 3: Initializing dependencies... ✓ Dependency: ruff-check ↔ black-format (lift: 3.5) ✓ Dependency: pytest-run ↔ coverage-report (lift: 2.8) ✓ Dependency: git-commit ↔ git-push (lift: 4.2) ✓ Dependency: docker-build ↔ k8s-deploy (lift: 2.1) Dependencies initialized: 4 Verifying initialization... ============================================================ TAXONOMY INITIALIZATION SUMMARY ============================================================ Categories: 6 records Modalities: 4 records Dependencies: 4 records ============================================================ Categories initialized: • Build & Deploy [deployment] Build and deployment tools • Code Quality [code] Tools for checking and improving code quality • Documentation [documentation] Documentation generation and checking • Git & Version Control [code] Git and version control operations • Linting & Formatting [code] Code linting and auto-formatting • Testing [testing] Test execution and coverage tools Modalities initialized: • docker-build [deployment] dockerfile → docker_image • pytest-run [testing] python_tests → test_results • ruff-check [code] python_source → diagnostics • sphinx-build [documentation] [requires review] rst_docs → html_docs Dependencies initialized: • git-commit ↔ git-push (lift: 4.2) • ruff-check ↔ black-format (lift: 3.5) • pytest-run ↔ coverage-report (lift: 2.8) • docker-build ↔ k8s-deploy (lift: 2.1) ✓ Taxonomy initialization complete! ``` ## Idempotent Operation The script is **idempotent** - it can be run multiple times safely. It uses `INSERT OR IGNORE` to skip existing records: ```bash # Run once python scripts/initialize_taxonomy.py # Run again - will skip existing records python scripts/initialize_taxonomy.py ``` Second run output: ``` Step 1: Initializing categories... Categories initialized: 0 # All already exist Step 2: Initializing modalities... Modalities initialized: 0 # All already exist Step 3: Initializing dependencies... Dependencies initialized: 0 # All already exist ``` ## Initial Data ### Categories | Category Name | Domain | Description | |--------------|--------|-------------| | Code Quality | code | Tools for checking and improving code quality | | Testing | testing | Test execution and coverage tools | | Documentation | documentation | Documentation generation and checking | | Build & Deploy | deployment | Build and deployment tools | | Git & Version Control | code | Git and version control operations | | Linting & Formatting | code | Code linting and auto-formatting | ### Modalities | Skill Name | Type | Input | Output | Human Review | |-----------|------|-------|--------|--------------| | ruff-check | code | python_source | diagnostics | No | | pytest-run | testing | python_tests | test_results | No | | sphinx-build | documentation | rst_docs | html_docs | Yes | | docker-build | deployment | dockerfile | docker_image | No | ### Dependencies | Skill A | Skill B | Lift Score | Meaning | |---------|---------|-----------|---------| | ruff-check | black-format | 3.5 | Strong co-occurrence | | pytest-run | coverage-report | 2.8 | High co-occurrence | | git-commit | git-push | 4.2 | Very strong co-occurrence | | docker-build | k8s-deploy | 2.1 | Moderate co-occurrence | **Lift Score Interpretation**: - `> 2.0` - Strong positive relationship (used together much more than expected) - `1.5 - 2.0` - Moderate positive relationship - `1.0` - Independent usage - `< 1.0` - Negative relationship (rarely used together) ## Database Schema ### skill_categories Table ```sql CREATE TABLE skill_categories ( category_id INTEGER PRIMARY KEY AUTOINCREMENT, category_name TEXT NOT NULL UNIQUE, parent_category_id INTEGER, description TEXT, domain TEXT, -- 'code', 'documentation', 'testing', 'deployment' created_at TEXT NOT NULL DEFAULT (datetime('now')), FOREIGN KEY (parent_category_id) REFERENCES skill_categories(category_id) ); ``` ### skill_modalities Table ```sql CREATE TABLE skill_modalities ( skill_name TEXT PRIMARY KEY, modality_type TEXT NOT NULL, -- 'code', 'documentation', 'testing', 'deployment' input_format TEXT, -- e.g., 'python_code', 'markdown', 'yaml' output_format TEXT, requires_human_review BOOLEAN DEFAULT 0, created_at TEXT NOT NULL DEFAULT (datetime('now')), FOREIGN KEY (skill_name) REFERENCES skill_invocation(skill_name) ); ``` ### skill_dependencies Table ```sql CREATE TABLE skill_dependencies ( skill_a TEXT NOT NULL, skill_b TEXT NOT NULL, co_occurrence_count INTEGER DEFAULT 1, lift_score REAL, -- >1 means skills used together more than expected last_updated TEXT NOT NULL DEFAULT (datetime('now')), PRIMARY KEY (skill_a, skill_b), FOREIGN KEY (skill_a) REFERENCES skill_invocation(skill_name), FOREIGN KEY (skill_b) REFERENCES skill_invocation(skill_name) ); ``` ## Post-Initialization Queries After initialization, you can query the taxonomy: ### View All Categories ```sql SELECT category_name, domain, description, (SELECT COUNT(*) FROM skill_category_mapping scm WHERE scm.category_id = sc.category_id) as skill_count FROM skill_categories sc ORDER BY domain, category_name; ``` ### View Multi-Modal Skills ```sql SELECT skill_name, modality_type, input_format, output_format, requires_human_review FROM skill_modalities ORDER BY modality_type, skill_name; ``` ### View Skill Dependencies Network ```sql SELECT skill_a, skill_b, co_occurrence_count, lift_score, CASE WHEN lift_score > 2.0 THEN 'strong_positive' WHEN lift_score > 1.5 THEN 'moderate_positive' WHEN lift_score > 1.0 THEN 'weak_positive' WHEN lift_score = 1.0 THEN 'independent' ELSE 'weak_negative' END as relationship_type FROM skill_dependencies WHERE lift_score IS NOT NULL ORDER BY co_occurrence_count DESC; ``` ### Get Skills by Category ```sql SELECT sc.category_name, si.skill_name, COUNT(*) as invocation_count, AVG(CASE WHEN si.completed = 1 THEN 1.0 ELSE 0.0 END) as completion_rate FROM skill_invocation si JOIN skill_category_mapping scm ON si.skill_name = scm.skill_name JOIN skill_categories sc ON scm.category_id = sc.category_id GROUP BY sc.category_name, si.skill_name ORDER BY sc.category_name, invocation_count DESC; ``` ## Extending the Taxonomy ### Adding New Categories Edit `scripts/initialize_taxonomy.py` and add to the `CATEGORIES` list: ```python { "category_name": "Security", "description": "Security scanning and vulnerability detection", "domain": "security", "examples": ["security-scan", "sast-check", "dependency-audit"], }, ``` ### Adding New Modalities Edit the `MODALITIES` list: ```python { "skill_name": "security-scan", "modality_type": "security", "input_format": "codebase", "output_format": "vulnerability_report", "requires_human_review": True, }, ``` ### Adding New Dependencies Edit the `DEPENDENCIES` list: ```python { "skill_a": "security-scan", "skill_b": "sast-check", "expected_lift": 3.2, }, ``` Then re-run the script: ```bash python scripts/initialize_taxonomy.py ``` ## Dynamic Updates The taxonomy can also be updated dynamically via the SkillsStorage API: ### Update Skill Dependencies Automatically ```python from pathlib import Path from session_buddy.storage.skills_storage import SkillsStorage storage = SkillsStorage(Path(".session-buddy/skills.db")) # Update dependencies based on actual co-occurrence data result = storage.update_skill_dependencies(min_co_occurrence=5) print(f"Created {result['dependencies_created']} dependencies") ``` ### Add Category Programmatically ```python import sqlite3 from pathlib import Path db_path = Path(".session-buddy/skills.db") with sqlite3.connect(db_path) as conn: conn.execute( """ INSERT INTO skill_categories (category_name, description, domain, created_at) VALUES (?, ?, ?, datetime('now')) """, ("Security", "Security scanning tools", "security") ) conn.commit() ``` ## Troubleshooting ### Error: "V4 migration not applied" **Cause**: The database schema doesn't include Phase 4 tables. **Solution**: Run migrations first: ```bash python -m session_buddy.storage.migrations migrate ``` ### Error: "Database not found" **Cause**: The `.session-buddy/skills.db` file doesn't exist. **Solution**: Initialize Session-Buddy storage: ```bash python -c "from session_buddy.storage.skills_storage import get_storage; get_storage()" ``` ### No Records Inserted **Cause**: Taxonomy data already exists (script is idempotent). **Solution**: This is expected behavior. To re-initialize, delete existing records first: ```sql DELETE FROM skill_dependencies; DELETE FROM skill_category_mapping; DELETE FROM skill_modalities; DELETE FROM skill_categories; ``` ## Integration with Session-Buddy The taxonomy integrates with Session-Buddy's core features: ### Workflow-Aware Recommendations Categories help organize skills by workflow phase: ```python # Find skills in "Testing" category for execution phase results = storage.search_by_query_workflow_aware( query_embedding, workflow_phase="execution", ) ``` ### Dependency Graph Dependencies enable workflow suggestions: ```python # After running ruff-check, suggest black-format dependencies = storage.get_skill_dependencies("ruff-check") ``` ### Multi-Modal Search Modalities enable type-specific skill discovery: ```python # Find documentation skills cursor.execute( """ SELECT skill_name, input_format, output_format FROM skill_modalities WHERE modality_type = 'documentation' """ ) ``` ## References - **V4 Migration**: `/Users/les/Projects/session-buddy/session_buddy/storage/migrations/V4__phase4_extensions__up.sql` - **SkillsStorage**: `/Users/les/Projects/session-buddy/session_buddy/storage/skills_storage.py` - **Initialization Script**: `/Users/les/Projects/session-buddy/scripts/initialize_taxonomy.py` ## Summary The taxonomy initialization script provides: - ✅ **6 predefined categories** covering common skill domains - ✅ **4 multi-modal type definitions** for different skill types - ✅ **4 dependency relationships** with lift scores - ✅ **Idempotent execution** - safe to run multiple times - ✅ **Extensible design** - easy to add new taxonomy data - ✅ **Integration with SkillsStorage** - works with existing APIs Run the script after V4 migration to seed your skills taxonomy!

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/lesleslie/session-buddy'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

TAXONOMY_INITIALIZATION.md•12.2 KiB