# Implementation Plan: Git Commit Video Walkthrough CLI Application
## Executive Summary
This document outlines the plan to build a **CLI application** that generates narrated video walkthroughs of git commits, changes, or entire codebases. The application will use LLM APIs (Anthropic Claude, Google Gemini, etc.) to analyze code and generate natural narration scripts, then compile these into professional video presentations.
## Current State Analysis
### Existing Implementation
The current codebase is a partially implemented MCP server. It should be converted to a standalone CLI application.
### Target Architecture
The application should be a **standalone CLI tool** with the following properties:
- **Direct LLM integration**: Calls LLM APIs directly (Anthropic, Google AI, etc.)
- **Self-contained**: No MCP protocol dependency
- **Command-line interface**: Simple `git-commit-video` command
- **Local execution**: Runs entirely on the user's machine
## Architecture Design
### Core Pattern: Direct LLM API Integration
The CLI application directly calls LLM APIs to generate analysis and scripts:
```typescript
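import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// CLI calls LLM for analysis (top-level await requires an ES module context)
const analysisResult = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 3000,
  messages: [{
    role: "user",
    content: "Analyze commit abc123. Read diffs, explain changes..."
  }],
});

// Parse analysis response; content blocks are a union, so check for a text block first
const analysisBlock = analysisResult.content[0];
if (!analysisBlock || analysisBlock.type !== 'text') {
  throw new Error('Analysis response did not start with a text block');
}
const analysis = JSON.parse(analysisBlock.text);

// CLI calls LLM for narration script
const scriptResult = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 2000,
  messages: [{
    role: "user",
    content: `Generate narration script based on: ${JSON.stringify(analysis)}`
  }],
});

// Extract the narration text the same way
const scriptBlock = scriptResult.content[0];
if (!scriptBlock || scriptBlock.type !== 'text') {
  throw new Error('Script response did not start with a text block');
}

// CLI generates video using LLM's outputs (generateVideo is provided by the video stage)
const video = generateVideo(analysis, scriptBlock.text);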
```
### CLI Interface Design
**Command-Line Interface**:
```bash
git-commit-video <commit-hash> [options]
git-commit-video --staged [options]
git-commit-video --unstaged [options]
git-commit-video --codebase [options]
Options:
--repo <path> Path to git repository (default: current directory)
--output <path> Output video path (default: ./walkthrough.mp4)
--style <style> Presentation style: beginner, technical, overview (default: technical)
--theme <theme> Visual theme: dark, light, github (default: dark)
--llm <provider> LLM provider: anthropic, google (default: anthropic)
--model <name> Specific model to use (default: provider's latest)
--help, -h Show help
```
**Examples:**
```bash
# Analyze a specific commit
git-commit-video abc123
# Analyze staged changes with technical style
git-commit-video --staged --style technical
# Analyze entire codebase with light theme
git-commit-video --codebase --theme light --output ./codebase-overview.mp4
```
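The flags above could be parsed along the following lines. This is a minimal, dependency-free sketch rather than the planned implementation (the plan lists `commander` or `yargs` for this job); the `CliOptions` shape and the `parseCliArgs` and `oneOf` helpers are hypothetical names, and `--help` handling is left out.

```typescript
// Hypothetical option shape; field names mirror the flags listed above.
type Target =
  | { type: "commit"; commitHash: string }
  | { type: "staged" }
  | { type: "unstaged" }
  | { type: "codebase" };

const STYLES = ["beginner", "technical", "overview"] as const;
const THEMES = ["dark", "light", "github"] as const;
const PROVIDERS = ["anthropic", "google"] as const;

interface CliOptions {
  target: Target;
  repo: string;
  output: string;
  style: (typeof STYLES)[number];
  theme: (typeof THEMES)[number];
  llm: (typeof PROVIDERS)[number];
  model?: string;
}

// Reject values outside the documented choices for a flag.
function oneOf<T extends string>(flag: string, value: string, allowed: readonly T[]): T {
  const match = allowed.find((a) => a === value);
  if (match === undefined) {
    throw new Error(`Invalid value for ${flag}: ${value} (expected ${allowed.join(", ")})`);
  }
  return match;
}

function parseCliArgs(argv: string[]): CliOptions {
  const rest = [...argv];
  const targets: Target[] = [];
  // Defaults taken from the option list above.
  const opts: Omit<CliOptions, "target"> = {
    repo: ".",
    output: "./walkthrough.mp4",
    style: "technical",
    theme: "dark",
    llm: "anthropic",
  };
  // Consume the value that follows a flag such as `--repo <path>`.
  const valueFor = (flag: string): string => {
    const v = rest.shift();
    if (v === undefined) throw new Error(`Missing value for ${flag}`);
    return v;
  };
  for (let arg = rest.shift(); arg !== undefined; arg = rest.shift()) {
    switch (arg) {
      case "--staged": targets.push({ type: "staged" }); break;
      case "--unstaged": targets.push({ type: "unstaged" }); break;
      case "--codebase": targets.push({ type: "codebase" }); break;
      case "--repo": opts.repo = valueFor(arg); break;
      case "--output": opts.output = valueFor(arg); break;
      case "--model": opts.model = valueFor(arg); break;
      case "--style": opts.style = oneOf(arg, valueFor(arg), STYLES); break;
      case "--theme": opts.theme = oneOf(arg, valueFor(arg), THEMES); break;
      case "--llm": opts.llm = oneOf(arg, valueFor(arg), PROVIDERS); break;
      default:
        if (arg.startsWith("-")) throw new Error(`Unknown option: ${arg}`);
        // A bare argument is treated as the commit hash.
        targets.push({ type: "commit", commitHash: arg });
    }
  }
  const target = targets[0];
  if (target === undefined || targets.length !== 1) {
    throw new Error("Specify exactly one target: <commit-hash>, --staged, --unstaged, or --codebase");
  }
  return { target, ...opts };
}
```

Calling `parseCliArgs(process.argv.slice(2))` from the entry point would yield one validated options object, and combining targets such as `--staged --codebase` fails early.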
**Processing Stages:**
1. **Initialization**
- Validate inputs
- Detect git repository
- Determine what to analyze based on `target.type`
2. **LLM Analysis Stage** (via API)
- **For commits**: Call LLM API to analyze commit diff
- **For staged/unstaged**: Call LLM API to analyze `git diff` output
- **For codebase**: Call LLM API to analyze entire project structure
- LLM returns structured analysis (JSON)
3. **Narration Generation Stage** (via API)
- Call LLM API to generate natural narration script
- Include style guidance (beginner/technical/overview)
- LLM returns timed script with pauses
4. **Video Generation Stage** (native)
- Generate HTML frames with syntax-highlighted code
- Convert HTML to PNG using Puppeteer
- Generate audio from script using TTS
- Compile video using FFmpeg
5. **Output**
- Return video file path
- Include metadata (duration, frame count, audio length)
### State Management
**Solution**: Simple in-process state management
```typescript
interface WalkthroughState {
id: string; // Unique walkthrough ID
repoPath: string;
target: TargetSpec;
style: string;
stage: "analysis" | "script" | "video" | "complete";
analysis?: AnalysisResult; // From LLM's first response
script?: ScriptResult; // From LLM's second response
frames?: string[]; // Generated frame paths
audioPath?: string; // Generated audio file
}
```
State is maintained in the CLI process from start to finish. Each execution creates a new state object that progresses through the stages.
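One way to make the stage progression explicit is a small transition helper that refuses to advance unless the data the next stage needs is present. The sketch below uses a trimmed-down `MiniState` in place of `WalkthroughState`; the `advance` function and the `NEXT_STAGE` table are hypothetical names, not part of the existing code.

```typescript
type Stage = "analysis" | "script" | "video" | "complete";

// Trimmed-down stand-in for WalkthroughState (result types left as unknown).
interface MiniState {
  stage: Stage;
  analysis?: unknown;
  script?: unknown;
  frames?: string[];
  audioPath?: string;
}

// Each stage has exactly one successor; "complete" is terminal.
const NEXT_STAGE: Record<Stage, Stage | null> = {
  analysis: "script",
  script: "video",
  video: "complete",
  complete: null,
};

// Merge the output of the current stage into the state and move to the next stage.
function advance(state: MiniState, output: Partial<Omit<MiniState, "stage">>): MiniState {
  const next = NEXT_STAGE[state.stage];
  if (next === null) throw new Error("Walkthrough is already complete");
  const merged: MiniState = { ...state, ...output, stage: next };
  // Guard: every stage must leave behind the data the following stage reads.
  if (next === "script" && merged.analysis === undefined) {
    throw new Error("Cannot start script stage without an analysis");
  }
  if (next === "video" && merged.script === undefined) {
    throw new Error("Cannot start video stage without a script");
  }
  if (next === "complete" && (merged.frames === undefined || merged.audioPath === undefined)) {
    throw new Error("Cannot complete without frames and audio");
  }
  return merged;
}
```

Returning a new object on each transition keeps earlier states intact, which may help when logging or debugging a failed run.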
### LLM Prompt Design
**Implementation via Well-Structured Prompts**:
```typescript
const analysisPrompt = `Analyze commit ${commitHash}.
Your analysis should include:
1. High-level summary (what was achieved, how)
2. File-by-file breakdown with explanations
3. Impact assessment
Return JSON format:
{
"summary": {
"achievement": "...",
"approach": "..."
},
"files": [
{"path": "...", "status": "added|modified|deleted", "explanation": "...", "impact": "..."}
],
"totalStats": {"additions": 0, "deletions": 0, "filesChanged": 0}
}`;
```
The LLM analyzes the code and returns structured JSON that the CLI parses and uses for video generation.
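Because model replies sometimes wrap JSON in extra prose, the parsing step likely needs to be defensive. The sketch below shows one possible approach: pull out the outermost `{...}` span, then check the fields named in the prompt by hand. The `parseAnalysis`, `extractJson`, and `isRecord` helpers are hypothetical; a schema library could replace the manual checks.

```typescript
interface FileChange {
  path: string;
  status: "added" | "modified" | "deleted";
  explanation: string;
  impact: string;
}

interface AnalysisResult {
  summary: { achievement: string; approach: string };
  files: FileChange[];
  totalStats: { additions: number; deletions: number; filesChanged: number };
}

const STATUSES: readonly string[] = ["added", "modified", "deleted"];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Take the span from the first "{" to the last "}" so surrounding prose is ignored.
function extractJson(text: string): string {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) throw new Error("No JSON object found in LLM response");
  return text.slice(start, end + 1);
}

function parseAnalysis(text: string): AnalysisResult {
  const data: unknown = JSON.parse(extractJson(text));
  if (!isRecord(data)) throw new Error("Analysis must be a JSON object");
  const { summary, files, totalStats } = data;

  if (!isRecord(summary) || typeof summary.achievement !== "string" || typeof summary.approach !== "string") {
    throw new Error("Analysis is missing summary.achievement or summary.approach");
  }
  if (!Array.isArray(files)) throw new Error("Analysis is missing the files array");
  for (const f of files) {
    if (
      !isRecord(f) ||
      typeof f.path !== "string" ||
      typeof f.status !== "string" ||
      !STATUSES.includes(f.status) ||
      typeof f.explanation !== "string" ||
      typeof f.impact !== "string"
    ) {
      throw new Error("Analysis contains a malformed file entry");
    }
  }
  if (!isRecord(totalStats)) throw new Error("Analysis is missing totalStats");
  for (const key of ["additions", "deletions", "filesChanged"]) {
    if (typeof totalStats[key] !== "number") throw new Error(`totalStats.${key} must be a number`);
  }
  // Every field was checked above, so the cast is safe.
  return data as unknown as AnalysisResult;
}
```

Failing with a specific message here gives the caller something concrete to feed back to the model if it decides to retry.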
### Input Type Support
**1. Git Commit Analysis**
```typescript
target: { type: "commit", commitHash: "abc123" }
```
- Extract commit diff using `simple-git`
- Provide diff to the LLM for analysis
- Generate frames showing before/after code
**2. Unstaged Changes**
```typescript
target: { type: "unstaged" }
```
- Run `git diff` to get working directory changes
- Provide diff to the LLM for analysis
- Generate frames showing current changes
**3. Staged Changes**
```typescript
target: { type: "staged" }
```
- Run `git diff --staged` to get index changes
- Provide diff to the LLM for analysis
- Generate frames showing staged modifications
**4. Whole Codebase**
```typescript
target: { type: "codebase" }
```
- Enumerate project files (respecting .gitignore)
- Provide file tree and key files to the LLM
- Generate frames showing codebase structure
- Focus on architecture, not detailed diffs
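The four target types can be modeled as a discriminated union and mapped to git arguments in one place. The sketch below only builds the argument list; actually running it (for example through `simple-git` or a child process) is left to the caller. The `gitArgsFor` name, the use of `git show` for commits, the `git ls-files` flags for the codebase listing, and the hex-only hash check are assumptions for illustration.

```typescript
type TargetSpec =
  | { type: "commit"; commitHash: string }
  | { type: "unstaged" }
  | { type: "staged" }
  | { type: "codebase" };

// Assumption: only plain hex hashes are accepted, which also blocks values that look like options.
const HEX_HASH = /^[0-9a-fA-F]{4,64}$/;

// Map a target to the arguments of the git command that produces its input.
function gitArgsFor(target: TargetSpec): string[] {
  switch (target.type) {
    case "commit":
      if (!HEX_HASH.test(target.commitHash)) {
        throw new Error(`Not a valid commit hash: ${target.commitHash}`);
      }
      return ["show", "--patch", target.commitHash];
    case "unstaged":
      return ["diff"];
    case "staged":
      return ["diff", "--staged"];
    case "codebase":
      // Tracked files plus untracked files that are not ignored.
      return ["ls-files", "--cached", "--others", "--exclude-standard"];
  }
}
```

Keeping this mapping in one function means the extractors and the tests agree on which git command backs each target type.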
## Testing Strategy
### Test Categories
**1. Unit Tests** (`tests/unit/`)
- Test git operations (commit analysis, diff extraction)
- Test HTML frame generation with syntax highlighting
- Test audio generation (mock Edge TTS)
- Test video compilation (mock FFmpeg)
**2. Integration Tests** (`tests/integration/`)
- Test full walkthrough generation end-to-end
- Test all four target types (commit, staged, unstaged, codebase)
- Test all three styles (beginner, technical, overview)
- Test error handling (missing commits, invalid repos)
**3. LLM Mock Tests** (`tests/llm/`)
- Mock LLM API responses
- Verify prompts sent to LLM
- Test conversation flow (analysis → script → video)
- Validate JSON parsing of LLM responses
**4. Visual Regression Tests** (`tests/visual/`)
- Capture sample frames for each theme
- Verify syntax highlighting works correctly
- Test frame layouts (title, file diff, outro)
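For the LLM mock tests, a hand-written fake that records prompts and replays canned replies may be enough. The sketch below assumes a hypothetical `LLMProvider` interface with a single `complete` method and a toy `runFlow` function standing in for the analysis and script stages; neither exists in the current codebase.

```typescript
// Hypothetical provider interface; the real one would live in src/llm/types.ts.
interface LLMProvider {
  complete(prompt: string): Promise<string>;
}

// Fake provider: records every prompt and returns canned replies in order.
class MockLLM implements LLMProvider {
  readonly prompts: string[] = [];
  private readonly replies: string[];

  constructor(replies: string[]) {
    this.replies = [...replies];
  }

  async complete(prompt: string): Promise<string> {
    this.prompts.push(prompt);
    const reply = this.replies.shift();
    if (reply === undefined) throw new Error("MockLLM: no canned reply left");
    return reply;
  }
}

// Toy two-step flow standing in for the analysis and script stages.
async function runFlow(
  llm: LLMProvider,
  commitHash: string,
): Promise<{ analysis: unknown; script: string }> {
  const analysisText = await llm.complete(`Analyze commit ${commitHash}.`);
  const analysis: unknown = JSON.parse(analysisText);
  const script = await llm.complete(
    `Generate narration script based on: ${JSON.stringify(analysis)}`,
  );
  return { analysis, script };
}
```

A test can then assert on `mock.prompts` to verify what was sent, without any network access or API key.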
### Test Fixtures
Create fixture repository in `tests/fixtures/sample-repo/`:
```
tests/fixtures/sample-repo/
.git/ # Real git repo
src/
index.ts # Sample source file
utils.ts # Another file
package.json # Project metadata
README.md # Documentation
```
With known commits:
- `fixture-commit-1`: Add new feature (adds 2 files)
- `fixture-commit-2`: Fix bug (modifies 1 file)
- `fixture-commit-3`: Refactor (modifies 3 files, deletes 1)
### Test Script
```bash
#!/usr/bin/env bash
# tests/run-integration-tests.sh
set -e
echo "Running integration tests..."
# Test 1: Analyze commit
bun run test:integration -- commit
# Test 2: Analyze unstaged changes
echo "test change" >> tests/fixtures/sample-repo/src/index.ts
bun run test:integration -- unstaged
git -C tests/fixtures/sample-repo checkout -- .
# Test 3: Analyze staged changes
echo "staged change" >> tests/fixtures/sample-repo/src/index.ts
git -C tests/fixtures/sample-repo add .
bun run test:integration -- staged
git -C tests/fixtures/sample-repo reset HEAD
git -C tests/fixtures/sample-repo checkout -- .
# Test 4: Analyze whole codebase
bun run test:integration -- codebase
echo "✓ All integration tests passed"
```
### Success Criteria
- [ ] All unit tests pass
- [ ] All integration tests produce valid MP4 files
- [ ] Generated videos have audio that matches frame timing
- [ ] Syntax highlighting works for TypeScript, JavaScript, Python, JSON
- [ ] All three presentation styles produce different narration
- [ ] Style guidance is included in the prompts sent to the LLM API
- [ ] Error handling gracefully handles missing dependencies (FFmpeg, Edge TTS)
## Implementation Phases
### Phase 1: Core CLI Architecture (Week 1)
**Goal**: Implement CLI with LLM integration
**Tasks**:
- [ ] Create CLI entry point (`src/cli.ts`)
- [ ] Add command-line argument parsing
- [ ] Integrate Anthropic SDK for LLM calls
- [ ] Implement commit analysis stage (call LLM API, parse response)
- [ ] Implement script generation stage (call LLM API, parse response)
- [ ] Add state management for execution flow
- [ ] Write unit tests for LLM API mocks
**Deliverable**: CLI can call LLM API for analysis and receive structured JSON response
### Phase 2: Input Type Support (Week 2)
**Goal**: Support all target types (commit, staged, unstaged, codebase)
**Tasks**:
- [ ] Implement commit diff extraction
- [ ] Implement unstaged changes extraction (`git diff`)
- [ ] Implement staged changes extraction (`git diff --staged`)
- [ ] Implement codebase file enumeration
- [ ] Create target-specific prompts for each type
- [ ] Write integration tests for each target type
**Deliverable**: CLI works with commits, diffs, and whole codebases
### Phase 3: Video Generation (Week 3)
**Goal**: Generate complete videos with frames and audio
**Tasks**:
- [ ] Refactor frame generation to use LLM's analysis
- [ ] Implement syntax highlighting for common languages
- [ ] Integrate TTS for audio narration (say.js or edge-tts)
- [ ] Implement FFmpeg video compilation
- [ ] Add timing synchronization (match audio to frames)
- [ ] Test all three themes (dark, light, github)
**Deliverable**: CLI produces complete MP4 videos
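For timing synchronization, one option is to show each frame for exactly as long as its narration segment and hand FFmpeg a concat-demuxer list. The sketch below builds that list; the `Segment` shape and `buildConcatList` name are hypothetical, and it assumes the duration of each audio segment is already known.

```typescript
// Hypothetical pairing of one frame with the length of its narration.
interface Segment {
  framePath: string;
  audioSeconds: number;
}

// Escape single quotes the way the concat list format expects inside quoted paths.
const escapePath = (p: string): string => p.replace(/'/g, "'\\''");

// Build an FFmpeg concat-demuxer list where each frame lasts as long as its audio.
function buildConcatList(segments: Segment[]): string {
  if (segments.length === 0) throw new Error("At least one segment is required");
  const lines: string[] = [];
  let lastFrame = "";
  for (const s of segments) {
    if (!(s.audioSeconds > 0)) {
      throw new Error(`Segment for ${s.framePath} has no audio duration`);
    }
    lines.push(`file '${escapePath(s.framePath)}'`, `duration ${s.audioSeconds.toFixed(3)}`);
    lastFrame = s.framePath;
  }
  // The final frame is listed again without a duration so its duration is honored.
  lines.push(`file '${escapePath(lastFrame)}'`);
  return lines.join("\n") + "\n";
}
```

Written to a file such as `frames.txt`, the list could be passed to something like `ffmpeg -f concat -safe 0 -i frames.txt -i narration.mp3 -c:v libx264 -pix_fmt yuv420p -c:a aac -shortest out.mp4`. Treat that command as a starting point, not a tuned invocation.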
### Phase 4: Style & Polish (Week 4)
**Goal**: Implement presentation styles and improve quality
**Tasks**:
- [ ] Implement beginner style (detailed, slow-paced)
- [ ] Implement technical style (precise, focused)
- [ ] Implement overview style (quick, high-level)
- [ ] Add natural pauses in narration (silence markers)
- [ ] Improve frame layouts and typography
- [ ] Add error handling and validation
- [ ] Write comprehensive documentation
**Deliverable**: Production-ready CLI with excellent UX
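The three styles could be implemented as a lookup table of guidance strings that is spliced into the narration prompt. In the sketch below, the guidance wording, the `[pause]` marker, and the `buildScriptPrompt` function are placeholders chosen for illustration; the plan only fixes the style names and their general character.

```typescript
type Style = "beginner" | "technical" | "overview";

// Placeholder guidance text, loosely based on the task descriptions above.
const STYLE_GUIDANCE: Record<Style, string> = {
  beginner: "Explain in detail at a slow pace and avoid unexplained jargon.",
  technical: "Be precise and focused; assume the viewer knows the codebase.",
  overview: "Keep it quick and high-level; skip line-by-line detail.",
};

// Assemble the narration prompt from the style guidance and the analysis JSON.
function buildScriptPrompt(style: Style, analysis: unknown): string {
  return [
    "Generate a narration script for a video walkthrough.",
    `Style: ${style}. ${STYLE_GUIDANCE[style]}`,
    // Assumption: pauses are marked inline and converted to silence later.
    "Mark natural pauses with [pause].",
    `Analysis: ${JSON.stringify(analysis)}`,
  ].join("\n");
}
```

Because the style is a union type, adding a fourth style without guidance text becomes a compile-time error.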
### Phase 5: Testing & Documentation (Week 5)
**Goal**: Complete test coverage and documentation
**Tasks**:
- [ ] Create fixture repository with sample commits
- [ ] Write full integration test suite
- [ ] Add visual regression tests
- [ ] Write user guide (README)
- [ ] Write developer guide (CONTRIBUTING)
- [ ] Create example videos
- [ ] Performance testing and optimization
**Deliverable**: Well-tested, documented CLI ready for release
## Migration Strategy
### Architecture Change
The application is being transformed from an MCP server to a standalone CLI application:
**Old**: MCP server with multiple tools
```typescript
// MCP server approach
const server = new GitCommitVideoServer();
// Client calls MCP tools via protocol
```
**New**: Standalone CLI application
```bash
# Direct CLI usage
git-commit-video abc123 --style technical
```
### No Backward Compatibility Needed
Since this is a fundamental architectural change from MCP server to CLI application, no backward compatibility is required. This is a complete rewrite.
**Recommendation**: Major version bump (v0.1.0 → v1.0.0)
## File Structure Changes
```
src/
cli.ts # Main CLI entry point
index.ts # Core application logic
stages/
analysis.ts # LLM analysis stage (API calls)
script.ts # Script generation stage (API calls)
video.ts # Video generation stage (native)
generators/
frames.ts # HTML frame generation
audio.ts # TTS integration
compiler.ts # FFmpeg video compilation
extractors/
commit.ts # Extract commit diffs
staged.ts # Extract staged changes
unstaged.ts # Extract unstaged changes
codebase.ts # Enumerate codebase files
llm/
anthropic.ts # Anthropic API integration
google.ts # Google AI integration
types.ts # LLM provider interface
utils/
syntax-highlight.ts # Syntax highlighting utilities
timing.ts # Audio/frame synchronization
prompts.ts # Prompt templates for LLM
args.ts # CLI argument parsing
types/
state.ts # State management types
analysis.ts # Analysis result types
script.ts # Script result types
tests/
unit/
generators/ # Test frame/audio/video generation
extractors/ # Test git operations
integration/
walkthrough.test.ts # End-to-end walkthrough tests
llm/
mock-api.ts # Mock LLM API responses
fixtures/
sample-repo/ # Git repository for testing
visual/
snapshots/ # Visual regression snapshots
bin/
git-commit-video # Compiled binary (optional)
```
## Dependencies
### Keep (Already Used)
- `simple-git`: Git operations
- `puppeteer`: HTML to PNG conversion
- `highlight.js`: Syntax highlighting
### Add (New for CLI)
- `@anthropic-ai/sdk`: Anthropic Claude API client
- `@google/generative-ai`: Google Gemini API client
- `commander` or `yargs`: CLI argument parsing
- `chalk`: Colored terminal output
- `ora`: Spinner/progress indicators
### Remove (MCP-specific)
- `@modelcontextprotocol/sdk`: No longer needed (removing MCP server)
- `axios`: Not needed if using native SDK clients
### Keep for TTS
- `say`: Cross-platform TTS (or use `edge-tts` Python package)
### External Tools (Required)
- **FFmpeg**: Video compilation (must be installed on system)
- **Edge TTS** (`edge-tts` Python package): Optional for better TTS quality
## Risk Assessment
### High Risk
1. **API Key Management**: Users need to provide LLM API keys
- **Mitigation**: Support environment variables, config files, and command-line args. Provide clear setup documentation
2. **LLM Response Parsing**: LLM might return malformed JSON
- **Mitigation**: Use strict JSON schema validation, retry with clarification prompt
3. **External Dependencies**: FFmpeg and TTS must be installed
- **Mitigation**: Check for dependencies on startup, provide helpful error messages
4. **API Costs**: LLM API calls incur costs for users
- **Mitigation**: Show estimated cost before execution, add `--dry-run` flag to preview prompts
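The "retry with clarification prompt" mitigation for malformed JSON might look like the sketch below. The LLM call is injected as a plain function so the loop can be tested without an API key; `requestJson`, `CallLLM`, and the wording of the follow-up prompt are hypothetical.

```typescript
// Any function that sends a prompt to an LLM and resolves to the reply text.
type CallLLM = (prompt: string) => Promise<string>;

// Ask for JSON, and on a parse failure re-ask with the error appended to the prompt.
async function requestJson(
  callLLM: CallLLM,
  prompt: string,
  maxAttempts = 3,
): Promise<unknown> {
  let currentPrompt = prompt;
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const reply = await callLLM(currentPrompt);
    try {
      return JSON.parse(reply) as unknown;
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // Clarification prompt: restate the request and explain what went wrong.
      currentPrompt =
        `${prompt}\n\nYour previous reply was not valid JSON (${lastError}). ` +
        "Reply with a single JSON object and nothing else.";
    }
  }
  throw new Error(`LLM did not return valid JSON after ${maxAttempts} attempts: ${lastError}`);
}
```

Capping the attempts matters here because every retry is a billable API call.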
### Medium Risk
1. **Long-running Execution**: Video generation can take minutes
- **Mitigation**: Show a terminal spinner (`ora`) with per-stage progress updates
2. **Memory Usage**: Large diffs and codebase analysis can consume significant memory
- **Mitigation**: Stream processing, limit diff context, paginate file lists
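One simple way to limit diff context is to keep whole per-file diffs until a size budget is reached and report how many were left out. The sketch below splits on `diff --git` headers and measures the budget in characters; the `limitDiff` name and the character-based budget are assumptions (a token-based budget would track LLM limits more closely).

```typescript
interface LimitedDiff {
  text: string;
  omittedFiles: number;
}

// Keep whole per-file diffs, in order, while they fit within maxChars.
function limitDiff(diff: string, maxChars: number): LimitedDiff {
  // Split before each "diff --git" header so every chunk is one file's diff.
  const chunks = diff.split(/^(?=diff --git )/m).filter((c) => c.length > 0);
  const kept: string[] = [];
  let used = 0;
  let omittedFiles = 0;
  for (const chunk of chunks) {
    if (used + chunk.length <= maxChars) {
      kept.push(chunk);
      used += chunk.length;
    } else {
      omittedFiles++;
    }
  }
  let text = kept.join("");
  if (omittedFiles > 0) {
    // Tell the model that the diff is incomplete so it does not over-claim.
    text += `[${omittedFiles} file diff(s) omitted to stay within the size budget]\n`;
  }
  return { text, omittedFiles };
}
```

Dropping whole files rather than cutting mid-hunk keeps every diff the model sees syntactically intact.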
### Low Risk
1. **Theme Compatibility**: Different syntax highlighting themes for different languages
- **Mitigation**: Use robust highlight.js with fallback rendering
## Success Metrics
### Functional Requirements ✓
- [ ] Supports commit, staged, unstaged, and codebase analysis
- [ ] Uses LLM APIs (Anthropic, Google) for code analysis
- [ ] Generates MP4 videos with audio narration
- [ ] Implements three presentation styles
- [ ] Provides syntax highlighting for 10+ languages
- [ ] Works as standalone CLI (no external dependencies except FFmpeg/TTS)
### Quality Requirements ✓
- [ ] 90%+ test coverage
- [ ] All integration tests pass
- [ ] Videos play correctly on macOS, Linux, Windows
- [ ] Audio syncs with frames (±200ms tolerance)
- [ ] Natural-sounding narration with appropriate pauses
### Performance Requirements ✓
- [ ] Generates 5-minute video in <2 minutes
- [ ] Memory usage <500MB for typical commits
- [ ] Handles commits with 100+ file changes
### User Experience Requirements ✓
- [ ] Clear error messages for missing dependencies
- [ ] Progress updates during generation
- [ ] Comprehensive README with examples
- [ ] Example videos demonstrating each style
## Conclusion
This implementation plan transforms the project into a **standalone CLI application** that leverages LLM APIs to analyze code and generate professional video walkthroughs. By calling LLM APIs directly (Anthropic Claude, Google Gemini), the application remains simple, self-contained, and easy to use.
The phased approach ensures incremental progress with testable milestones, reducing risk and enabling early feedback. By focusing on core CLI architecture and LLM integration first, then expanding to support multiple input types and presentation styles, we build a solid foundation before adding polish.
The end result will be a powerful CLI tool that transforms git commits, changes, and codebases into accessible, narrated video walkthroughs, making code review more engaging and understandable for developers of all skill levels.