# Loom Video Analysis Integration Plan
## Overview
Enhance the Shortcut MCP to extract actionable debugging information from Loom videos attached to tickets. This includes transcription, visual analysis, account detection, and feature usage tracking.
## Goals
1. Extract voice transcriptions from Loom videos
2. Analyze video content for account information and UI interactions
3. Identify features being demonstrated or reported as broken
4. Generate structured summaries for LLM consumption
5. Extract key frames at critical moments
## Architecture Components
### 1. Loom Link Detection
- **Pattern Matching**: Detect Loom URLs in story descriptions and comments
  - Format: `https://www.loom.com/share/{video_id}`
  - Format: `https://loom.com/share/{video_id}`
- **Attachment Detection**: Check linked_files for Loom embeds
- **Extract Video IDs**: Parse IDs from various Loom URL formats
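
The pattern matching above could be sketched as follows. `findLoomLinks` and `LoomLink` are hypothetical names, and the assumption that video IDs contain only letters and digits would need checking against real Loom links.

```typescript
// Hypothetical helper; names and return shape are illustrative only.
interface LoomLink {
  url: string;
  videoId: string;
}

// Covers the two share formats listed above (with and without "www.").
// Assumption: video IDs consist of letters and digits only.
const LOOM_SHARE_SOURCE = "https:\\/\\/(?:www\\.)?loom\\.com\\/share\\/([A-Za-z0-9]+)";

function findLoomLinks(text: string): LoomLink[] {
  const links: LoomLink[] = [];
  const seen: { [id: string]: boolean } = {};
  // A fresh RegExp per call avoids sharing lastIndex state between calls.
  const pattern = new RegExp(LOOM_SHARE_SOURCE, "g");
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const videoId = match[1];
    if (!seen[videoId]) {
      seen[videoId] = true; // report each video once per text
      links.push({ url: match[0], videoId: videoId });
    }
  }
  return links;
}
```

The same function can be applied to a story description and to each comment body, with the caller recording where each link was found.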
### 2. Loom API Integration
#### 2.1 Authentication
- **API Key Management**: Store Loom API key in environment
- **OAuth Support**: Optional OAuth flow for user-specific access
- **Rate Limiting**: Handle Loom API rate limits gracefully
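
One way to handle rate limits gracefully is retry with exponential backoff. The sketch below is generic: `withRetry`, `backoffDelayMs`, the default delays, and the `isRetryable` callback are all assumptions, since how a rate-limit response surfaces depends on the client that ends up being used.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Calls `call` up to `maxAttempts` times, waiting between attempts.
// `isRetryable` decides whether an error (for example an HTTP 429) is worth retrying.
async function withRetry<T>(
  call: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts: number = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on the last attempt or on errors that retrying cannot fix.
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Passing `sleep` as a parameter keeps the helper testable without real delays.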
#### 2.2 Video Metadata Retrieval
```typescript
interface LoomVideoMetadata {
  id: string
  title: string
  description: string
  duration: number
  created_at: string
  owner: {
    id: string
    name: string
    email: string
  }
  thumbnail_url: string
  download_url?: string
  transcript?: LoomTranscript // time-coded transcript (section 3.1); type not yet defined in this plan
}
```
### 3. Transcription Extraction
#### 3.1 Native Loom Transcripts
- **API Endpoint**: GET `/videos/{video_id}/transcript` (to be confirmed during the Loom API research in Next Steps)
- **Fallback**: Use Loom's auto-generated captions if available
- **Format**: Time-coded transcript with speaker identification
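
A time-coded transcript with speaker identification might be modeled as below. The `LoomTranscript` and `TranscriptSegment` shapes are assumptions for illustration; the actual payload may differ and should be verified before relying on these field names.

```typescript
// Assumed shape; not taken from Loom documentation.
interface TranscriptSegment {
  startSeconds: number;
  endSeconds: number;
  speaker?: string; // absent when no speaker identification is available
  text: string;
}

interface LoomTranscript {
  segments: TranscriptSegment[];
}

// Formats a second count as MM:SS (minutes are not wrapped at 60).
function formatTimestamp(totalSeconds: number): string {
  const whole = Math.floor(totalSeconds);
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  return (minutes < 10 ? "0" : "") + minutes + ":" + (seconds < 10 ? "0" : "") + seconds;
}

// Renders one line per segment, suitable for passing to an LLM as plain text.
function renderTranscript(transcript: LoomTranscript): string {
  return transcript.segments
    .map((s) => "[" + formatTimestamp(s.startSeconds) + "] " + (s.speaker || "Unknown") + ": " + s.text)
    .join("\n");
}
```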
#### 3.2 Audio Extraction (if needed)
- **Download Audio**: Extract audio track from video
- **External Transcription**: Use Whisper API or similar for high-quality transcription
- **Speaker Diarization**: Identify different speakers (reporter vs. support agent)
### 4. Visual Analysis
#### 4.1 Key Frame Extraction
- **Scene Detection**: Identify when screen content changes significantly
- **Error Detection**: Look for error messages, red text, modal dialogs
- **UI State Changes**: Detect navigation between screens
- **Mouse/Cursor Tracking**: Follow user interactions
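
Scene detection can start from simple frame differencing. The sketch below assumes frames have already been decoded to equal-sized grayscale buffers by some other tool; `GrayFrame`, `detectSceneChanges`, and the default threshold of 20 are hypothetical and would need tuning on real recordings.

```typescript
// One grayscale frame: a flat buffer of pixel intensities (0-255).
type GrayFrame = Uint8Array;

// Mean absolute per-pixel difference between two frames of equal size.
function meanAbsDiff(a: GrayFrame, b: GrayFrame): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("frames must be non-empty and equal in size");
  }
  let total = 0;
  for (let i = 0; i < a.length; i++) {
    total += Math.abs(a[i] - b[i]);
  }
  return total / a.length;
}

// Returns the indices of frames that differ from the previous frame
// by more than `threshold`; these are candidate key frames.
function detectSceneChanges(frames: GrayFrame[], threshold: number = 20): number[] {
  const changes: number[] = [];
  for (let i = 1; i < frames.length; i++) {
    if (meanAbsDiff(frames[i - 1], frames[i]) > threshold) {
      changes.push(i);
    }
  }
  return changes;
}
```

This only covers the "screen content changes significantly" case; detecting error dialogs or cursor movement needs additional signals.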
#### 4.2 OCR Processing
- **Text Extraction**: Extract all visible text from key frames
- **Account Information Detection**:
  - Email addresses
  - User IDs
  - Organization names
  - Subscription tiers
  - Feature flags
- **Error Message Extraction**: Capture exact error text
- **URL Detection**: Extract visible URLs and endpoints
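
Once OCR has produced plain text for a key frame, email and URL detection can be done with regular expressions. This is a minimal sketch; `extractFromOcrText` is a hypothetical name, and the patterns are deliberately simple (OCR noise and trailing punctuation will need extra handling).

```typescript
// Simplified patterns for illustration; they do not cover every valid address or URL.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const URL_PATTERN = /https?:\/\/[^\s"'<>)]+/g;

// Removes duplicates while keeping first-seen order.
function unique(values: string[]): string[] {
  return values.filter((v, i) => values.indexOf(v) === i);
}

function extractFromOcrText(text: string): { emails: string[]; urls: string[] } {
  return {
    emails: unique(text.match(EMAIL_PATTERN) || []),
    urls: unique(text.match(URL_PATTERN) || []),
  };
}
```

User IDs, organization names, subscription tiers, and feature flags are application-specific and would need their own patterns or lookups.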
#### 4.3 UI Element Recognition
- **Button Clicks**: Identify which buttons/links are clicked
- **Form Inputs**: Detect what data is entered
- **Feature Identification**: Map UI elements to feature names
- **Navigation Flow**: Track user journey through the application
### 5. Content Analysis Engine
#### 5.1 Account Information Extraction
```typescript
interface AccountInfo {
  emails: string[]
  userIds: string[]
  organizationNames: string[]
  subscriptionType?: string
  customDomains: string[]
  apiKeys?: string[] // Redacted
  timestamps: Map<string, number> // When each piece of info appears
}
```
#### 5.2 Feature Usage Detection
```typescript
interface FeatureUsage {
  featureName: string
  actions: Array<{
    timestamp: number
    actionType: 'click' | 'input' | 'navigation' | 'error'
    elementText?: string
    inputValue?: string
    outcome: 'success' | 'error' | 'unknown'
  }>
  errorMessages: string[]
  successIndicators: string[]
}
```
#### 5.3 Issue Classification
- **Bug Report**: Error messages, unexpected behavior
- **Feature Request**: Attempted actions that don't exist
- **Confusion**: Repeated attempts, backtracking
- **Performance Issue**: Loading indicators, timeouts
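
The four categories above could be assigned by a simple rule-based pass before any model is involved. Everything in this sketch is an assumption: the `ObservedSignals` fields, the thresholds, and the order in which rules are checked are placeholders to be replaced once real videos have been analyzed.

```typescript
type IssueType = "bug_report" | "feature_request" | "confusion" | "performance_issue" | "unknown";

// Hypothetical counters produced by the transcript and visual analysis stages.
interface ObservedSignals {
  errorMessageCount: number;      // error messages seen on screen
  missingFeatureMentions: number; // attempts or requests for something that does not exist
  repeatedActionCount: number;    // retries of the same action or backtracking
  loadingSeconds: number;         // time spent on loading indicators or timeouts
}

// First matching rule wins; thresholds are illustrative, not tuned.
function classifyIssue(s: ObservedSignals): IssueType {
  if (s.errorMessageCount > 0) return "bug_report";
  if (s.loadingSeconds >= 10) return "performance_issue";
  if (s.missingFeatureMentions > 0) return "feature_request";
  if (s.repeatedActionCount >= 3) return "confusion";
  return "unknown";
}
```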
### 6. Structured Output Generation
#### 6.1 Video Summary Format
```typescript
interface LoomVideoSummary {
  video_id: string
  duration: number

  // Transcription
  transcript: {
    full_text: string
    key_statements: string[]
    questions_asked: string[]
    error_descriptions: string[]
  }

  // Visual Analysis
  account_info: AccountInfo
  features_used: FeatureUsage[]

  // Key Moments
  key_moments: Array<{
    timestamp: number
    type: 'error' | 'confusion' | 'success' | 'question'
    description: string
    screenshot_url?: string
    transcript_excerpt: string
  }>

  // Reproduction Steps
  reproduction_steps: string[]

  // Environment
  environment: {
    browser?: string
    os?: string
    screen_resolution?: string
    visible_urls: string[]
  }

  // AI Analysis
  summary: string
  suggested_actions: string[]
  severity_assessment: 'critical' | 'high' | 'medium' | 'low'
}
```
### 7. Implementation Phases
#### Phase 1: Basic Loom Integration (Week 1)
- [ ] Loom URL detection in stories
- [ ] Loom API client implementation
- [ ] Basic metadata retrieval
- [ ] Native transcript extraction
- [ ] Simple text-based summary
#### Phase 2: Enhanced Transcription (Week 2)
- [ ] Audio extraction capability
- [ ] Whisper API integration
- [ ] Speaker diarization
- [ ] Key quote extraction
- [ ] Question detection
#### Phase 3: Visual Analysis - Basic (Week 3)
- [ ] Key frame extraction at regular intervals
- [ ] Basic OCR using Tesseract or cloud service
- [ ] Email and ID detection
- [ ] Error message extraction
#### Phase 4: Visual Analysis - Advanced (Week 4)
- [ ] Scene change detection
- [ ] Mouse tracking and click detection
- [ ] UI element recognition
- [ ] Feature usage tracking
- [ ] Navigation flow analysis
#### Phase 5: Intelligence Layer (Week 5)
- [ ] Account information aggregation
- [ ] Automatic reproduction step generation
- [ ] Issue classification
- [ ] Severity assessment
- [ ] Related ticket suggestions
#### Phase 6: Optimization & Caching (Week 6)
- [ ] Video analysis caching
- [ ] Incremental processing
- [ ] Batch processing for multiple videos
- [ ] Performance optimization
### 8. MCP Tool Interfaces
#### 8.1 Get Loom Videos from Story
```typescript
// Tool signature: input parameters => result
type shortcut_get_story_loom_videos = (input: {
  story_id: number
}) => {
  videos: Array<{
    url: string
    video_id: string
    location: 'description' | 'comment' | 'linked_file'
    location_id?: number
  }>
}
```
#### 8.2 Analyze Loom Video
```typescript
// Tool signature: input parameters => result
type shortcut_analyze_loom_video = (input: {
  video_url: string
  analysis_depth?: 'basic' | 'full' | 'visual_only' | 'transcript_only'
  extract_frames?: boolean
}) => LoomVideoSummary
```
#### 8.3 Extract Account Info from Videos
```typescript
// Tool signature: input parameters => result
type shortcut_extract_video_accounts = (input: {
  story_id: number
}) => {
  accounts: AccountInfo[]
  confidence_scores: Map<string, number>
}
```
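
One detail to plan for: MCP tool results travel as JSON, and `JSON.stringify` serializes a `Map` as `{}`. Fields typed as `Map<string, number>` (such as `confidence_scores` and `AccountInfo.timestamps`) would therefore need converting before they are returned. A minimal sketch, with `mapToRecord` as a hypothetical helper name:

```typescript
// Converts a Map to a plain object so it survives JSON serialization.
function mapToRecord(map: Map<string, number>): Record<string, number> {
  const out: Record<string, number> = {};
  map.forEach((value, key) => {
    out[key] = value;
  });
  return out;
}

const scores = new Map<string, number>([["emails", 0.9]]);
const lossy = JSON.stringify(scores);             // Map entries are dropped
const safe = JSON.stringify(mapToRecord(scores)); // entries are preserved
```

An alternative is to declare these fields as `Record<string, number>` in the interfaces from the start.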
### 9. Technical Considerations
#### 9.1 Performance
- **Async Processing**: Long-running video analysis in background
- **Webhooks**: Optional webhook for completion notification
- **Streaming**: Stream results as they become available
- **Caching**: Cache analysis results with TTL
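
Caching with a TTL can be prototyped in memory before choosing a real store. `TtlCache` below is a hypothetical sketch; the injectable `now` function is an assumption made so that expiry can be tested without waiting.

```typescript
// In-memory cache whose entries expire ttlMs after they are written.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // drop expired entries lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value: value, expiresAt: this.now() + this.ttlMs });
  }
}
```

Keying the cache by video ID plus analysis depth would let a basic and a full analysis of the same video coexist.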
#### 9.2 Security
- **PII Handling**: Redact sensitive information
- **API Key Detection**: Scan for and redact exposed credentials
- **Access Control**: Respect Loom video permissions
- **Data Retention**: Clear cached video data after analysis
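
Credential redaction could begin with a labeled-secret pattern applied to OCR text and transcripts. The pattern and the `redactSecrets` name are illustrative assumptions; real credential formats vary widely, so a vetted rule set would be needed in practice.

```typescript
// Assumption: a run of 16+ key-like characters following an "api key",
// "token", or "secret" label is treated as a credential.
const LABELLED_SECRET = /\b(api[_-]?key|token|secret)(\s*[:=]\s*)([A-Za-z0-9_\-]{16,})/gi;

// Replaces the secret value but keeps the label, so the summary can still
// note that a credential was visible in the video.
function redactSecrets(text: string): string {
  return text.replace(
    LABELLED_SECRET,
    (_match: string, label: string, separator: string) => label + separator + "[REDACTED]"
  );
}
```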
#### 9.3 Error Handling
- **Graceful Degradation**: Partial results on failure
- **Retry Logic**: Handle transient failures
- **Fallback Options**: Multiple transcription services
- **User Feedback**: Clear error messages
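
Graceful degradation can be sketched as a stage runner that records failures instead of aborting, so a failed OCR step still leaves the transcript available. `runStages` and `StageResult` are hypothetical names for illustration.

```typescript
interface StageResult<T> {
  name: string;
  ok: boolean;
  value?: T;      // present when ok is true
  error?: string; // present when ok is false
}

// Runs each stage in order; a failing stage is recorded and the rest still run.
async function runStages<T>(
  stages: Array<{ name: string; run: () => Promise<T> }>
): Promise<StageResult<T>[]> {
  const results: StageResult<T>[] = [];
  for (const stage of stages) {
    try {
      results.push({ name: stage.name, ok: true, value: await stage.run() });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.push({ name: stage.name, ok: false, error: message });
    }
  }
  return results;
}
```

The recorded `error` strings can feed the user-facing messages, and a failed stage is a natural place to try a fallback service.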
### 10. Third-Party Services
#### 10.1 Required APIs
- **Loom API**: Video metadata and transcripts
- **OpenAI Whisper**: High-quality transcription
- **Google Cloud Vision**: OCR and image analysis
- **Amazon Rekognition**: Alternative to Google Cloud Vision for visual analysis
#### 10.2 Optional Enhancements
- **AssemblyAI**: Advanced transcription with topic detection
- **Rev AI**: Human-quality transcription
- **Twelve Labs**: Video understanding API
- **Roboflow**: Custom UI element detection
### 11. Success Metrics
#### 11.1 Accuracy Metrics
- Transcription accuracy: >95%
- Account detection rate: >90%
- Feature identification: >80%
- Error message extraction: >95%
#### 11.2 Performance Metrics
- Basic analysis: <30 seconds
- Full analysis: <2 minutes
- Key frame extraction: <10 seconds
- Cache hit rate: >60%
#### 11.3 Value Metrics
- Time saved per ticket: 10-15 minutes
- Additional context provided: 3-5 key insights
- Reproduction success rate: >80%
### 12. Future Enhancements
#### 12.1 Advanced Features
- **Multi-video Analysis**: Compare multiple videos for patterns
- **Automatic Bug Reproduction**: Generate Playwright/Selenium scripts
- **Sentiment Analysis**: Detect frustration levels
- **Performance Analysis**: Measure UI response times
#### 12.2 Integrations
- **Slack Notifications**: Alert on high-severity issues
- **Jira Sync**: Create detailed tickets automatically
- **Sentry Integration**: Link to error tracking
- **Datadog**: Correlate with performance metrics
## Implementation Priority
1. **High Priority**: Transcript extraction and account detection
2. **Medium Priority**: Visual analysis and error extraction
3. **Low Priority**: Advanced AI features and integrations
## Next Steps
1. Research Loom API capabilities and limitations
2. Prototype basic transcript extraction
3. Evaluate OCR services for accuracy
4. Design caching strategy
5. Create MVP with core features