Git Commit Video Walkthrough Generator

IMPLEMENTATION_PLAN.md•17.1 KiB

# Implementation Plan: Git Commit Video Walkthrough CLI Application

## Executive Summary

This document outlines the plan to build a **CLI application** that generates narrated video walkthroughs of git commits, changes, or entire codebases. The application will use LLM APIs (Anthropic Claude, Google Gemini, etc.) to analyze code and generate natural narration scripts, then compile these into video presentations.

## Current State Analysis

### Existing Implementation

The current codebase is a partially implemented MCP server. It should be converted to a standalone CLI application.

### Target Architecture

The application should be a **standalone CLI tool** with these properties:

- **Direct LLM integration**: Calls LLM APIs directly (Anthropic, Google AI, etc.)
- **Self-contained**: No MCP protocol dependency
- **Command-line interface**: Simple `git-commit-video` command
- **Local execution**: Runs on the user's machine; only the LLM API calls leave it

## Architecture Design

### Core Pattern: Direct LLM API Integration

The CLI application directly calls LLM APIs to generate analysis and scripts:

```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Join the text blocks of a response (content may also hold non-text blocks)
const textOf = (message: Anthropic.Message): string =>
  message.content
    .map((block) => (block.type === 'text' ? block.text : ''))
    .join('');

// CLI calls LLM for analysis
const analysisResult = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 3000,
  messages: [{
    role: "user",
    content: "Analyze commit abc123. Read diffs, explain changes..."
  }],
});

// Parse analysis response
const analysis = JSON.parse(textOf(analysisResult));

// CLI calls LLM for narration script
const scriptResult = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 2000,
  messages: [{
    role: "user",
    content: `Generate narration script based on: ${JSON.stringify(analysis)}`
  }],
});

// CLI generates video using LLM's outputs
// (generateVideo stands for the native video generation stage described below)
const video = generateVideo(analysis, textOf(scriptResult));
```

### CLI Interface Design

**Command-Line Interface**:

```bash
git-commit-video <commit-hash> [options]
git-commit-video --staged [options]
git-commit-video --unstaged [options]
git-commit-video --codebase [options]

Options:
  --repo <path>      Path to git repository (default: current directory)
  --output <path>    Output video path (default: ./walkthrough.mp4)
  --style <style>    Presentation style: beginner, technical, overview (default: technical)
  --theme <theme>    Visual theme: dark, light, github (default: dark)
  --llm <provider>   LLM provider: anthropic, google (default: anthropic)
  --model <name>     Specific model to use (default: provider's latest)
  --help, -h         Show help
```

**Examples:**

```bash
# Analyze a specific commit
git-commit-video abc123

# Analyze staged changes with technical style
git-commit-video --staged --style technical

# Analyze entire codebase with light theme
git-commit-video --codebase --theme light --output ./codebase-overview.mp4
```

**Processing Stages:**

1. **Initialization**
   - Validate inputs
   - Detect git repository
   - Determine what to analyze based on `target.type`
2. **LLM Analysis Stage** (via API)
   - **For commits**: Call LLM API to analyze commit diff
   - **For staged/unstaged**: Call LLM API to analyze `git diff` output
   - **For codebase**: Call LLM API to analyze entire project structure
   - LLM returns structured analysis (JSON)
3. **Narration Generation Stage** (via API)
   - Call LLM API to generate natural narration script
   - Include style guidance (beginner/technical/overview)
   - LLM returns timed script with pauses
4. **Video Generation Stage** (native)
   - Generate HTML frames with syntax-highlighted code
   - Convert HTML to PNG using Puppeteer
   - Generate audio from script using TTS
   - Compile video using FFmpeg
5. **Output**
   - Return video file path
   - Include metadata (duration, frame count, audio length)

### State Management

**Solution**: Simple in-process state management

```typescript
interface WalkthroughState {
  id: string;                  // Unique walkthrough ID
  repoPath: string;
  target: TargetSpec;
  style: string;
  stage: "analysis" | "script" | "video" | "complete";
  analysis?: AnalysisResult;   // From LLM's first response
  script?: ScriptResult;       // From LLM's second response
  frames?: string[];           // Generated frame paths
  audioPath?: string;          // Generated audio file
}
```

State is maintained in the CLI process from start to finish. Each execution creates a new state object that progresses through the stages.

### LLM Prompt Design

**Implementation via Well-Structured Prompts**:

```typescript
const analysisPrompt = `Analyze commit ${commitHash}.

Your analysis should include:
1. High-level summary (what was achieved, how)
2. File-by-file breakdown with explanations
3. Impact assessment

Return JSON format:
{
  "summary": { "achievement": "...", "approach": "..." },
  "files": [
    {"path": "...", "status": "added|modified|deleted", "explanation": "...", "impact": "..."}
  ],
  "totalStats": {"additions": 0, "deletions": 0, "filesChanged": 0}
}`;
```

The LLM analyzes the code and returns structured JSON that the CLI parses and uses for video generation.

### Input Type Support

**1. Git Commit Analysis**

```typescript
target: { type: "commit", commitHash: "abc123" }
```

- Extract commit diff using `simple-git`
- Provide diff to the LLM for analysis
- Generate frames showing before/after code

**2. Unstaged Changes**

```typescript
target: { type: "unstaged" }
```

- Run `git diff` to get working directory changes
- Provide diff to the LLM for analysis
- Generate frames showing current changes

**3. Staged Changes**

```typescript
target: { type: "staged" }
```

- Run `git diff --staged` to get index changes
- Provide diff to the LLM for analysis
- Generate frames showing staged modifications

**4. Whole Codebase**

```typescript
target: { type: "codebase" }
```

- Enumerate project files (respecting .gitignore)
- Provide file tree and key files to the LLM
- Generate frames showing codebase structure
- Focus on architecture, not detailed diffs

## Testing Strategy

### Test Categories

**1. Unit Tests** (`tests/unit/`)

- Test git operations (commit analysis, diff extraction)
- Test HTML frame generation with syntax highlighting
- Test audio generation (mock Edge TTS)
- Test video compilation (mock FFmpeg)

**2. Integration Tests** (`tests/integration/`)

- Test full walkthrough generation end-to-end
- Test all four target types (commit, staged, unstaged, codebase)
- Test all three styles (beginner, technical, overview)
- Test error handling (missing commits, invalid repos)

**3. LLM Mock Tests** (`tests/llm/`)

- Mock LLM API responses
- Verify prompts sent to LLM
- Test conversation flow (analysis → script → video)
- Validate JSON parsing of LLM responses

**4. Visual Regression Tests** (`tests/visual/`)

- Capture sample frames for each theme
- Verify syntax highlighting works correctly
- Test frame layouts (title, file diff, outro)

### Test Fixtures

Create fixture repository in `tests/fixtures/sample-repo/`:

```
tests/fixtures/sample-repo/
  .git/              # Real git repo
  src/
    index.ts         # Sample source file
    utils.ts         # Another file
  package.json       # Project metadata
  README.md          # Documentation
```

With known commits:

- `fixture-commit-1`: Add new feature (adds 2 files)
- `fixture-commit-2`: Fix bug (modifies 1 file)
- `fixture-commit-3`: Refactor (modifies 3 files, deletes 1)

### Test Script

```bash
#!/usr/bin/env bash
# tests/run-integration-tests.sh
set -e

echo "Running integration tests..."

# Test 1: Analyze commit
bun run test:integration -- commit

# Test 2: Analyze unstaged changes
echo "test change" >> tests/fixtures/sample-repo/src/index.ts
bun run test:integration -- unstaged
git -C tests/fixtures/sample-repo checkout -- .

# Test 3: Analyze staged changes
echo "staged change" >> tests/fixtures/sample-repo/src/index.ts
git -C tests/fixtures/sample-repo add .
bun run test:integration -- staged
# Unstage, then discard the edit so the fixture is clean for the next test
git -C tests/fixtures/sample-repo reset HEAD
git -C tests/fixtures/sample-repo checkout -- .

# Test 4: Analyze whole codebase
bun run test:integration -- codebase

echo "✓ All integration tests passed"
```

### Success Criteria

- [ ] All unit tests pass
- [ ] All integration tests produce valid MP4 files
- [ ] Generated videos have audio that matches frame timing
- [ ] Syntax highlighting works for TypeScript, JavaScript, Python, JSON
- [ ] All three presentation styles produce different narration
- [ ] Prompts sent to the LLM include the analysis instructions and style guidance
- [ ] Missing dependencies (FFmpeg, Edge TTS) are handled gracefully

## Implementation Phases

### Phase 1: Core CLI Architecture (Week 1)

**Goal**: Implement CLI with LLM integration

**Tasks**:

- [ ] Create CLI entry point (`src/cli.ts`)
- [ ] Add command-line argument parsing
- [ ] Integrate Anthropic SDK for LLM calls
- [ ] Implement commit analysis stage (call LLM API, parse response)
- [ ] Implement script generation stage (call LLM API, parse response)
- [ ] Add state management for execution flow
- [ ] Write unit tests for LLM API mocks

**Deliverable**: CLI can call LLM API for analysis and receive structured JSON response

### Phase 2: Input Type Support (Week 2)

**Goal**: Support all target types (commit, staged, unstaged, codebase)

**Tasks**:

- [ ] Implement commit diff extraction
- [ ] Implement unstaged changes extraction (`git diff`)
- [ ] Implement staged changes extraction (`git diff --staged`)
- [ ] Implement codebase file enumeration
- [ ] Create target-specific prompts for each type
- [ ] Write integration tests for each target type

**Deliverable**: CLI works with commits, diffs, and whole codebases

### Phase 3: Video Generation (Week 3)

**Goal**: Generate complete videos with frames and audio

**Tasks**:

- [ ] Refactor frame generation to use LLM's analysis
- [ ] Implement syntax highlighting for common languages
- [ ] Integrate TTS for audio narration (say.js or edge-tts)
- [ ] Implement FFmpeg video compilation
- [ ] Add timing synchronization (match audio to frames)
- [ ] Test all three themes (dark, light, github)

**Deliverable**: CLI produces complete MP4 videos

### Phase 4: Style & Polish (Week 4)

**Goal**: Implement presentation styles and improve quality

**Tasks**:

- [ ] Implement beginner style (detailed, slow-paced)
- [ ] Implement technical style (precise, focused)
- [ ] Implement overview style (quick, high-level)
- [ ] Add natural pauses in narration (silence markers)
- [ ] Improve frame layouts and typography
- [ ] Add error handling and validation
- [ ] Write comprehensive documentation

**Deliverable**: Production-ready CLI with clear errors, validated inputs, and three distinct styles

### Phase 5: Testing & Documentation (Week 5)

**Goal**: Complete test coverage and documentation

**Tasks**:

- [ ] Create fixture repository with sample commits
- [ ] Write full integration test suite
- [ ] Add visual regression tests
- [ ] Write user guide (README)
- [ ] Write developer guide (CONTRIBUTING)
- [ ] Create example videos
- [ ] Performance testing and optimization

**Deliverable**: Well-tested, documented CLI ready for release

## Migration Strategy

### Architecture Change

The application is being transformed from an MCP server to a standalone CLI application:

**Old**: MCP server with multiple tools

```typescript
// MCP server approach
const server = new GitCommitVideoServer();
// Client calls MCP tools via protocol
```

**New**: Standalone CLI application

```bash
# Direct CLI usage
git-commit-video abc123 --style technical
```

### No Backward Compatibility Needed

Because this is a fundamental architectural change from MCP server to CLI application, no backward compatibility is required. This is a complete rewrite.
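To make the new entry point more concrete, the following is a minimal sketch of how the flags listed under CLI Interface Design might be parsed without any dependency. The names `CliOptions`, `parseCliArgs`, and `oneOf` are hypothetical, and the `"."` default for `--repo` is an assumed stand-in for "current directory"; the plan itself leaves the choice between `commander` and `yargs` open, so treat this as an illustration of the expected behavior, not the project's implementation.

```typescript
// Illustrative sketch only: CliOptions, parseCliArgs, and oneOf are hypothetical names.
type TargetSpec =
  | { type: "commit"; commitHash: string }
  | { type: "staged" }
  | { type: "unstaged" }
  | { type: "codebase" };

const STYLES = ["beginner", "technical", "overview"] as const;
const THEMES = ["dark", "light", "github"] as const;
const PROVIDERS = ["anthropic", "google"] as const;

interface CliOptions {
  target?: TargetSpec; // left undefined only when --help is requested
  repo: string;
  output: string;
  style: (typeof STYLES)[number];
  theme: (typeof THEMES)[number];
  llm: (typeof PROVIDERS)[number];
  model?: string;
  help: boolean;
}

// Check a flag value against its allowed set and narrow its type.
function oneOf<T extends string>(flag: string, value: string, allowed: readonly T[]): T {
  const match = allowed.find((candidate) => candidate === value);
  if (match === undefined) {
    throw new Error(`Invalid ${flag} "${value}" (expected: ${allowed.join(", ")})`);
  }
  return match;
}

function parseCliArgs(argv: readonly string[]): CliOptions {
  // Defaults mirror the option table in the plan ("." is an assumed spelling of "current directory").
  const options: CliOptions = {
    repo: ".",
    output: "./walkthrough.mp4",
    style: "technical",
    theme: "dark",
    llm: "anthropic",
    help: false,
  };
  const rest = argv.slice();
  const setTarget = (target: TargetSpec): void => {
    if (options.target) throw new Error("Specify exactly one target");
    options.target = target;
  };

  while (rest.length > 0) {
    const arg = rest.shift() as string;
    // Consume the next token as the value of the current flag.
    const value = (): string => {
      const next = rest.shift();
      if (next === undefined) throw new Error(`Missing value for ${arg}`);
      return next;
    };
    switch (arg) {
      case "--staged": setTarget({ type: "staged" }); break;
      case "--unstaged": setTarget({ type: "unstaged" }); break;
      case "--codebase": setTarget({ type: "codebase" }); break;
      case "--repo": options.repo = value(); break;
      case "--output": options.output = value(); break;
      case "--style": options.style = oneOf(arg, value(), STYLES); break;
      case "--theme": options.theme = oneOf(arg, value(), THEMES); break;
      case "--llm": options.llm = oneOf(arg, value(), PROVIDERS); break;
      case "--model": options.model = value(); break;
      case "--help":
      case "-h": options.help = true; break;
      default:
        if (arg.startsWith("-")) throw new Error(`Unknown option: ${arg}`);
        setTarget({ type: "commit", commitHash: arg }); // bare argument = commit hash
    }
  }
  if (!options.target && !options.help) {
    throw new Error("No target given: pass a commit hash, --staged, --unstaged, or --codebase");
  }
  return options;
}
```

In `src/cli.ts` this would presumably be called as `parseCliArgs(process.argv.slice(2))`. Rejecting a second target up front keeps `git-commit-video abc123 --staged` from silently picking one of the two.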
**Recommendation**: Major version bump (v0.1.0 → v1.0.0)

## File Structure Changes

```
src/
  cli.ts                   # Main CLI entry point
  index.ts                 # Core application logic
  stages/
    analysis.ts            # LLM analysis stage (API calls)
    script.ts              # Script generation stage (API calls)
    video.ts               # Video generation stage (native)
  generators/
    frames.ts              # HTML frame generation
    audio.ts               # TTS integration
    compiler.ts            # FFmpeg video compilation
  extractors/
    commit.ts              # Extract commit diffs
    staged.ts              # Extract staged changes
    unstaged.ts            # Extract unstaged changes
    codebase.ts            # Enumerate codebase files
  llm/
    anthropic.ts           # Anthropic API integration
    google.ts              # Google AI integration
    types.ts               # LLM provider interface
  utils/
    syntax-highlight.ts    # Syntax highlighting utilities
    timing.ts              # Audio/frame synchronization
    prompts.ts             # Prompt templates for LLM
    args.ts                # CLI argument parsing
  types/
    state.ts               # State management types
    analysis.ts            # Analysis result types
    script.ts              # Script result types
tests/
  unit/
    generators/            # Test frame/audio/video generation
    extractors/            # Test git operations
  integration/
    walkthrough.test.ts    # End-to-end walkthrough tests
  llm/
    mock-api.ts            # Mock LLM API responses
  fixtures/
    sample-repo/           # Git repository for testing
  visual/
    snapshots/             # Visual regression snapshots
bin/
  git-commit-video         # Compiled binary (optional)
```

## Dependencies

### Keep (Already Used)

- `simple-git`: Git operations
- `puppeteer`: HTML to PNG conversion
- `highlight.js`: Syntax highlighting

### Add (New for CLI)

- `@anthropic-ai/sdk`: Anthropic Claude API client
- `@google/generative-ai`: Google Gemini API client (already present; keep it)
- `commander` or `yargs`: CLI argument parsing
- `chalk`: Colored terminal output
- `ora`: Spinner/progress indicators

### Remove (MCP-specific)

- `@modelcontextprotocol/sdk`: No longer needed (removing MCP server)
- `axios`: Not needed if using native SDK clients

### Keep for TTS

- `say`: Cross-platform TTS (or use `edge-tts` Python package)

### External Tools (Required)

- **FFmpeg**: Video compilation (must be installed on system)
- **Edge TTS** (`edge-tts` Python package): Optional for better TTS quality

## Risk Assessment

### High Risk

1. **API Key Management**: Users need to provide LLM API keys
   - **Mitigation**: Support environment variables, config files, and command-line args. Provide clear setup documentation
2. **LLM Response Parsing**: LLM might return malformed JSON
   - **Mitigation**: Use strict JSON schema validation, retry with clarification prompt
3. **External Dependencies**: FFmpeg and TTS must be installed
   - **Mitigation**: Check for dependencies on startup, provide helpful error messages
4. **API Costs**: LLM API calls incur costs for users
   - **Mitigation**: Show estimated cost before execution, add `--dry-run` flag to preview prompts

### Medium Risk

1. **Long-running Execution**: Video generation can take minutes
   - **Mitigation**: Show progress in the terminal (for example with the `ora` spinner) and print incremental updates per stage
2. **Memory Usage**: Large diffs and codebase analysis can consume significant memory
   - **Mitigation**: Stream processing, limit diff context, paginate file lists

### Low Risk

1. **Theme Compatibility**: Different syntax highlighting themes for different languages
   - **Mitigation**: Use robust highlight.js with fallback rendering

## Success Metrics

### Functional Requirements ✓

- [ ] Supports commit, staged, unstaged, and codebase analysis
- [ ] Uses LLM APIs (Anthropic, Google) for code analysis
- [ ] Generates MP4 videos with audio narration
- [ ] Implements three presentation styles
- [ ] Provides syntax highlighting for 10+ languages
- [ ] Works as standalone CLI (no external dependencies except FFmpeg/TTS)

### Quality Requirements ✓

- [ ] 90%+ test coverage
- [ ] All integration tests pass
- [ ] Videos play correctly on macOS, Linux, Windows
- [ ] Audio syncs with frames (±200ms tolerance)
- [ ] Natural-sounding narration with appropriate pauses

### Performance Requirements ✓

- [ ] Generates 5-minute video in <2 minutes
- [ ] Memory usage <500MB for typical commits
- [ ] Handles commits with 100+ file changes

### User Experience Requirements ✓

- [ ] Clear error messages for missing dependencies
- [ ] Progress updates during generation
- [ ] Comprehensive README with examples
- [ ] Example videos demonstrating each style

## Conclusion

This implementation plan transforms the project into a **standalone CLI application** that uses LLM APIs to analyze code and generate narrated video walkthroughs. By calling LLM APIs directly (Anthropic Claude, Google Gemini), the application remains simple, self-contained, and easy to use.

The phased approach ensures incremental progress with testable milestones, reducing risk and enabling early feedback. By focusing on core CLI architecture and LLM integration first, then expanding to support multiple input types and presentation styles, we build a solid foundation before adding polish.
The end result will be a CLI tool that turns git commits, changes, and codebases into accessible, narrated video walkthroughs, making code review more engaging and understandable for developers of all skill levels.
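As a companion to the "LLM Response Parsing" risk listed under Risk Assessment, here is a minimal sketch of what "strict validation, retry with clarification prompt" could look like for the analysis JSON format shown under LLM Prompt Design. The names `parseAnalysis`, `analyzeWithRetry`, and the injected `callLlm` function are hypothetical, and unwrapping a fenced reply is an assumption about how models sometimes format JSON; a schema library could replace the hand-written checks.

```typescript
// Illustrative sketch only: parseAnalysis, analyzeWithRetry, and callLlm are hypothetical names.
const STATUSES = ["added", "modified", "deleted"] as const;
interface FileAnalysis {
  path: string;
  status: (typeof STATUSES)[number];
  explanation: string;
  impact: string;
}
interface AnalysisResult {
  summary: { achievement: string; approach: string };
  files: FileAnalysis[];
  totalStats: { additions: number; deletions: number; filesChanged: number };
}

// Built at runtime so this listing contains no literal code fence.
const FENCE = "`".repeat(3);
const FENCED_JSON = new RegExp("^" + FENCE + "(?:json)?\\s*([\\s\\S]*?)\\s*" + FENCE + "$");

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}
function requireString(obj: Record<string, unknown>, key: string): string {
  const value = obj[key];
  if (typeof value !== "string") throw new Error(`"${key}" must be a string`);
  return value;
}
function requireNumber(obj: Record<string, unknown>, key: string): number {
  const value = obj[key];
  if (typeof value !== "number") throw new Error(`"${key}" must be a number`);
  return value;
}
function parseFile(value: unknown): FileAnalysis {
  if (!isRecord(value)) throw new Error("each file entry must be an object");
  const rawStatus = requireString(value, "status");
  const status = STATUSES.find((candidate) => candidate === rawStatus);
  if (status === undefined) throw new Error(`unknown file status "${rawStatus}"`);
  return {
    path: requireString(value, "path"),
    status,
    explanation: requireString(value, "explanation"),
    impact: requireString(value, "impact"),
  };
}

// Parse and validate one LLM reply; throws with a message usable in a retry prompt.
function parseAnalysis(raw: string): AnalysisResult {
  const trimmed = raw.trim();
  const jsonText = FENCED_JSON.exec(trimmed)?.[1] ?? trimmed; // unwrap a fenced reply
  const data: unknown = JSON.parse(jsonText); // throws SyntaxError on malformed JSON
  if (!isRecord(data)) throw new Error("analysis must be a JSON object");
  const { summary, files, totalStats } = data;
  if (!isRecord(summary) || !isRecord(totalStats) || !Array.isArray(files)) {
    throw new Error("analysis is missing summary, files, or totalStats");
  }
  return {
    summary: {
      achievement: requireString(summary, "achievement"),
      approach: requireString(summary, "approach"),
    },
    files: files.map((entry) => parseFile(entry)),
    totalStats: {
      additions: requireNumber(totalStats, "additions"),
      deletions: requireNumber(totalStats, "deletions"),
      filesChanged: requireNumber(totalStats, "filesChanged"),
    },
  };
}

// Ask again with the parse error appended when validation fails.
async function analyzeWithRetry(
  callLlm: (prompt: string) => Promise<string>,
  prompt: string,
  maxAttempts = 2,
): Promise<AnalysisResult> {
  let currentPrompt = prompt;
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return parseAnalysis(await callLlm(currentPrompt));
    } catch (err) {
      lastError = err;
      currentPrompt = `${prompt}\n\nYour previous reply could not be used (${String(err)}). Reply with valid JSON only.`;
    }
  }
  throw lastError;
}
```

Passing `callLlm` in as a parameter keeps the validation independent of any one provider SDK, which also makes it straightforward to exercise from the mock-based tests in `tests/llm/`.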
