# Media Server Documentation
The SCS-MCP Media Server provides screenshot capture, screen recording, and media management capabilities integrated with the voice assistant. This allows you to capture, annotate, and organize visual documentation of your coding sessions.
## Overview
The media server is a Node.js/Express application that:
- Captures screenshots and screen recordings
- Stores media with metadata and annotations
- Provides a web UI for viewing and managing media history
- Integrates with voice commands for hands-free capture
- Supports browser-based MCP tools for automation
## Architecture
```
┌────────────────────────────────────────┐
│             Browser/Client             │
│  ┌────────────┐  ┌─────────────────┐   │
│  │  Media UI  │  │ Voice Assistant │   │
│  └─────┬──────┘  └────────┬────────┘   │
└────────┼──────────────────┼────────────┘
         │ HTTP/WS          │ WebSocket
┌────────▼──────────────────▼────────────┐
│          Media History Server          │
│  ┌──────────────────────────────────┐  │
│  │  Express Server (Port 3000)      │  │
│  │  WebSocket Server (Port 3001)    │  │
│  └──────────────────────────────────┘  │
│  ┌──────────────────────────────────┐  │
│  │  Media Processing                │  │
│  │  - Sharp (image processing)      │  │
│  │  - FFmpeg (video processing)     │  │
│  └──────────────────────────────────┘  │
└───────────────────┬────────────────────┘
                    │
          ┌─────────▼──────────┐
          │   Storage Layer    │
          │  ┌──────────────┐  │
          │  │  SQLite DB   │  │
          │  │  (metadata)  │  │
          │  └──────────────┘  │
          │  ┌──────────────┐  │
          │  │ File System  │  │
          │  │   (media)    │  │
          │  └──────────────┘  │
          └────────────────────┘
```
## Features
### 1. Screenshot Capture
#### Manual Capture
- Click the camera button in the UI
- Use the keyboard shortcut (Ctrl+Shift+S / Cmd+Shift+S)
- Voice command: "Take a screenshot"
#### Automatic Capture
- Triggered by certain voice commands
- Captures context during code reviews
- Records visual state during debugging
#### Screenshot Features
- Full screen or region selection
- Automatic thumbnail generation
- Metadata tagging (project, context, timestamp)
- Instant preview with annotation tools
### 2. Screen Recording
#### Recording Controls
- Start/Stop recording button
- Voice commands: "Start recording" / "Stop recording"
- Automatic segmentation for long sessions
#### Recording Features
- WebM format with VP9 codec
- Configurable quality settings
- Real-time duration tracking
- Audio capture support (optional)
- Automatic compression and optimization
### 3. Media Management
#### Storage Organization
```
media/
├── screenshots/
│   ├── 1704816000000.png
│   ├── 1704816000000_thumb.png
│   └── ...
├── recordings/
│   ├── 1704816000000.webm
│   ├── 1704816000000_thumb.png
│   └── ...
└── exports/
    └── session_1704816000000.zip
```
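The naming convention in this layout (a millisecond timestamp as the base name, with a `_thumb.png` sibling) can be sketched as a small path helper. The function name `mediaPaths` and the `root` default are assumptions made for illustration; the server's own code may build paths differently.

```javascript
const path = require('path');

// Directory and extension per media type, taken from the layout above.
const LAYOUT = new Map([
  ['screenshot', { dir: 'screenshots', ext: 'png' }],
  ['recording', { dir: 'recordings', ext: 'webm' }],
]);

// Hypothetical helper: derives the file locations for one capture.
function mediaPaths(type, timestamp = Date.now(), root = 'media') {
  const entry = LAYOUT.get(type);
  if (!entry) {
    throw new Error(`Unknown media type: ${type}`);
  }
  const dir = path.posix.join(root, entry.dir);
  return {
    file: path.posix.join(dir, `${timestamp}.${entry.ext}`),
    // Thumbnails are PNG for both screenshots and recordings.
    thumbnail: path.posix.join(dir, `${timestamp}_thumb.png`),
  };
}
```

Calling `mediaPaths('recording', 1704816000000)` yields the `recordings/1704816000000.webm` entry shown in the tree, together with its thumbnail path.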
#### Database Schema
- **media_history**: Main media entries
  - id, type, filename, thumbnail
  - url, duration, size, dimensions
  - project, context, annotations, tags
  - timestamps
- **sessions**: Recording sessions
  - start_time, end_time
  - project, media_count
  - total_duration
- **annotations**: Media annotations
  - media_id, type (text, drawing, marker)
  - content, position, style
  - author, timestamp
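For orientation, the three tables could be created with DDL along the following lines. Only the table and column names come from the list above; the column types, the `id` keys on `sessions` and `annotations`, the constraints, and the `created_at`/`updated_at` pair standing in for "timestamps" are all assumptions, so treat this as a sketch and not the schema that `setup-media.js` actually creates.

```javascript
// Hypothetical DDL for the tables listed above (types and keys are assumed).
const SCHEMA_SQL = `
CREATE TABLE IF NOT EXISTS media_history (
  id          INTEGER PRIMARY KEY AUTOINCREMENT,
  type        TEXT NOT NULL CHECK (type IN ('screenshot', 'recording')),
  filename    TEXT NOT NULL,
  thumbnail   TEXT,
  url         TEXT,
  duration    REAL,     -- seconds; NULL for screenshots
  size        INTEGER,  -- bytes
  dimensions  TEXT,     -- e.g. "1920x1080"
  project     TEXT,
  context     TEXT,
  annotations TEXT,     -- JSON-encoded
  tags        TEXT,     -- JSON-encoded array
  created_at  TEXT DEFAULT CURRENT_TIMESTAMP,
  updated_at  TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS sessions (
  id             INTEGER PRIMARY KEY AUTOINCREMENT,
  start_time     TEXT NOT NULL,
  end_time       TEXT,
  project        TEXT,
  media_count    INTEGER DEFAULT 0,
  total_duration REAL DEFAULT 0
);

CREATE TABLE IF NOT EXISTS annotations (
  id        INTEGER PRIMARY KEY AUTOINCREMENT,
  media_id  INTEGER NOT NULL REFERENCES media_history(id) ON DELETE CASCADE,
  type      TEXT NOT NULL CHECK (type IN ('text', 'drawing', 'marker')),
  content   TEXT,
  position  TEXT,       -- JSON-encoded coordinates
  style     TEXT,       -- JSON-encoded
  author    TEXT,
  timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
`;

// Lists the table names the script would create, in order.
function tableNames(sql) {
  return [...sql.matchAll(/CREATE TABLE IF NOT EXISTS (\w+)/g)].map((m) => m[1]);
}
```

Note that SQLite enforces the `REFERENCES` clause only when `PRAGMA foreign_keys = ON` has been set for the connection.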
### 4. Web UI Features
#### Gallery View
- Grid or list layout
- Thumbnail previews
- Quick actions (view, annotate, delete)
- Filtering and search
- Bulk operations
#### Detail View
- Full-size media display
- Annotation tools
- Metadata editing
- Export options
- Sharing capabilities
#### Analytics Dashboard
- Storage usage graphs
- Media type distribution
- Session timeline
- Activity heatmap
### 5. Annotation System
#### Drawing Tools
- Pen (freehand drawing)
- Highlighter (transparent overlay)
- Shapes (rectangle, circle, arrow)
- Text annotations
- Color picker
#### Annotation Features
- Layer management
- Undo/redo support
- Export with annotations
- Collaborative annotations (future)
## API Endpoints
### REST API
#### GET /api/media
Get media history with filtering
```javascript
// Query parameters
{
  type: 'screenshot|recording',
  project: 'project-name',
  from: '2024-01-01',
  to: '2024-01-31',
  limit: 50,
  offset: 0
}
```
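A client might turn these filters into a request URL as sketched below. The helper names `buildMediaUrl` and `fetchMedia` are hypothetical, the base URL assumes the default port 3000, and the JSON response body is an assumption, since the response format is not specified here.

```javascript
// Builds the request URL for GET /api/media from a filter object.
// Unset filters are skipped so the server can apply its own defaults.
function buildMediaUrl(filters = {}, baseUrl = 'http://localhost:3000') {
  const url = new URL('/api/media', baseUrl);
  for (const [key, value] of Object.entries(filters)) {
    if (value !== undefined && value !== null && value !== '') {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

// Fetches one page of media history. Needs a runtime with a global fetch
// (a browser or Node.js 18+). Assumes the server answers with JSON.
async function fetchMedia(filters) {
  const response = await fetch(buildMediaUrl(filters));
  if (!response.ok) {
    throw new Error(`GET /api/media failed with status ${response.status}`);
  }
  return response.json();
}
```

Paging through results would then amount to repeated `fetchMedia` calls with an increasing `offset`.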
#### POST /api/media/upload
Upload new media file
```javascript
// Multipart form data
{
  file: File,
  type: 'screenshot|recording',
  project: 'project-name',
  context: 'debugging session',
  tags: ['bug', 'ui-issue']
}
```
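The multipart body can be assembled with the standard `FormData` API, as in the sketch below. The helper names are hypothetical, and encoding `tags` as a single JSON string is an assumption; the server may instead expect repeated `tags` fields.

```javascript
// Builds the multipart body for POST /api/media/upload.
// Assumption: array values such as `tags` are sent as one JSON-encoded field.
function buildUploadForm(file, filename, fields) {
  const form = new FormData();
  form.append('file', file, filename);
  for (const [key, value] of Object.entries(fields)) {
    if (value === undefined || value === null) continue;
    form.append(key, Array.isArray(value) ? JSON.stringify(value) : String(value));
  }
  return form;
}

// Sends the form. The Content-Type header is left unset so the runtime
// can add the multipart boundary itself. Assumes a JSON response.
async function uploadMedia(form, baseUrl = 'http://localhost:3000') {
  const response = await fetch(new URL('/api/media/upload', baseUrl), {
    method: 'POST',
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  return response.json();
}

// Example form built from placeholder bytes (not a real image).
const demoForm = buildUploadForm(
  new Blob(['placeholder bytes'], { type: 'image/png' }),
  'error-state.png',
  { type: 'screenshot', project: 'project-name', context: 'debugging session', tags: ['bug', 'ui-issue'] }
);
```

`FormData`, `Blob`, and `fetch` are available as globals in browsers and in Node.js 18 or later.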
#### GET /api/media/:id
Get specific media item with full metadata
#### PUT /api/media/:id
Update media metadata
```javascript
{
  annotations: {...},
  tags: [...],
  context: '...'
}
```
#### DELETE /api/media/:id
Delete media item
#### GET /api/sessions
Get recording sessions
#### GET /api/analytics
Get usage analytics
### WebSocket Events
#### Client → Server
```javascript
// Request media history
{
  type: 'request_media_history',
  filters: {...}
}

// Start recording
{
  type: 'start_recording',
  project: 'current-project'
}

// Stop recording
{
  type: 'stop_recording'
}

// Take screenshot
{
  type: 'capture_screenshot',
  region: 'full|selection'
}
```
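On the sending side, a thin wrapper can reject unknown message types before they reach the socket. This is a minimal sketch assuming messages travel as JSON text frames; `encodeClientMessage` is a hypothetical name, and the WebSocket URL in the usage comment assumes the default port 3001.

```javascript
// Message types from the Client → Server list above.
const CLIENT_MESSAGE_TYPES = new Set([
  'request_media_history',
  'start_recording',
  'stop_recording',
  'capture_screenshot',
]);

// Serializes a client message; throws on a type the server does not list.
function encodeClientMessage(type, payload = {}) {
  if (!CLIENT_MESSAGE_TYPES.has(type)) {
    throw new Error(`Unsupported message type: ${type}`);
  }
  // `type` is written last so a stray payload.type cannot override it.
  return JSON.stringify({ ...payload, type });
}

// Browser usage (not executed here):
//   const socket = new WebSocket('ws://localhost:3001');
//   socket.addEventListener('open', () => {
//     socket.send(encodeClientMessage('capture_screenshot', { region: 'full' }));
//   });
```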
#### Server → Client
```javascript
// Media history response
{
  type: 'media_history',
  data: [...]
}

// New media captured
{
  type: 'media_captured',
  data: {...}
}

// Recording status
{
  type: 'recording_status',
  isRecording: true,
  duration: 120
}
```
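A client could fold these events into its UI state with a small reducer-style function. The sketch below assumes JSON text frames and a state shape (`items`, `isRecording`, `duration`) invented for illustration; placing new captures first in the list is likewise an assumption.

```javascript
// Hypothetical starting state for a gallery view.
const initialState = { items: [], isRecording: false, duration: 0 };

// Applies one server message to the state and returns the new state.
function applyServerMessage(state, raw) {
  const message = JSON.parse(raw);
  switch (message.type) {
    case 'media_history':
      return { ...state, items: message.data };
    case 'media_captured':
      return { ...state, items: [message.data, ...state.items] };
    case 'recording_status':
      return { ...state, isRecording: message.isRecording, duration: message.duration };
    default:
      // Unknown types are ignored so newer servers do not break older clients.
      return state;
  }
}

// Browser usage (not executed here):
//   let state = initialState;
//   socket.addEventListener('message', (event) => {
//     state = applyServerMessage(state, event.data);
//   });
```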
## Integration with Voice Assistant
### Voice Commands
#### Screenshot Commands
- "Take a screenshot" - Captures current screen
- "Screenshot this error" - Captures with context
- "Capture the UI" - Captures browser/application
- "Save this for later" - Screenshot with reminder tag
#### Recording Commands
- "Start recording" - Begins screen recording
- "Stop recording" - Ends and saves recording
- "Pause recording" - Temporarily pauses
- "Record this debugging session" - Tagged recording
#### Management Commands
- "Show my screenshots" - Opens media gallery
- "Delete last screenshot" - Removes recent capture
- "Export today's captures" - Creates download bundle
- "Find screenshots from yesterday" - Time-based search
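One simple way to route the phrases above to actions is an ordered pattern table. This is only a sketch: the action names and regular expressions are hypothetical, and the voice assistant may well use a different matching strategy.

```javascript
// Hypothetical phrase table. Patterns are checked in order, so the more
// specific phrases come before the general screenshot pattern.
const VOICE_COMMANDS = [
  { pattern: /\bdelete last screenshot\b/, action: 'delete_last_screenshot' },
  { pattern: /\b(show|find) (my )?screenshots\b/, action: 'open_gallery' },
  { pattern: /\bexport\b/, action: 'export_media' },
  { pattern: /\bstop recording\b/, action: 'stop_recording' },
  { pattern: /\bpause recording\b/, action: 'pause_recording' },
  { pattern: /\b(start recording|record this)\b/, action: 'start_recording' },
  { pattern: /\b(screenshot|capture the ui|save this for later)\b/, action: 'capture_screenshot' },
];

// Returns the action name for an utterance, or null when nothing matches.
function matchVoiceCommand(utterance) {
  const text = utterance.toLowerCase().trim();
  const match = VOICE_COMMANDS.find(({ pattern }) => pattern.test(text));
  return match ? match.action : null;
}
```

Ordering matters here: "Delete last screenshot" contains the word "screenshot", so the delete rule has to be tested before the capture rule.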
### Automatic Triggers
The system automatically captures media during:
- Error encounters (screenshot of error state)
- Successful test runs (celebration screenshot)
- Code review completion (summary screenshot)
- Important discoveries (context capture)
## Browser MCP Tools Integration
The media server includes MCP browser tools for automation:
### Available Tools
#### browser_take_screenshot
Captures screenshots programmatically
```javascript
{
  tool: "browser_take_screenshot",
  parameters: {
    fullPage: true,
    filename: "analysis-result.png"
  }
}
```
#### browser_snapshot
Captures accessibility tree and DOM structure
```javascript
{
  tool: "browser_snapshot",
  parameters: {}
}
```
## Configuration
### Environment Variables
```env
# Media Server Configuration
MEDIA_STORAGE_PATH=./media
MAX_STORAGE_GB=5
THUMBNAIL_SIZE=300x200
# Server Ports
PORT=3000
WS_PORT=3001
# Video Settings
VIDEO_CODEC=vp9
VIDEO_QUALITY=80
VIDEO_FPS=30
MAX_RECORDING_MINUTES=30
# Cleanup Policy
AUTO_CLEANUP_DAYS=30
CLEANUP_WHEN_FULL=true
```
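The variables above could be read into a typed configuration object roughly as follows. This is a sketch, not the server's actual loader: `loadMediaConfig` and the key names of the returned object are hypothetical, the defaults mirror the values shown above, and `MAX_STORAGE_GB` is interpreted as binary gigabytes (an assumption).

```javascript
// Reads the documented variables from an env-like object, falling back to
// the values shown above when a variable is missing or not a number.
function loadMediaConfig(env = process.env) {
  const readInt = (name, fallback) => {
    const value = Number.parseInt(env[name] ?? '', 10);
    return Number.isNaN(value) ? fallback : value;
  };

  // THUMBNAIL_SIZE uses the form "<width>x<height>".
  const [thumbWidth, thumbHeight] = (env.THUMBNAIL_SIZE ?? '300x200')
    .split('x')
    .map((part) => Number.parseInt(part, 10));
  if (!(thumbWidth > 0 && thumbHeight > 0)) {
    throw new Error('THUMBNAIL_SIZE must look like "300x200"');
  }

  return {
    storagePath: env.MEDIA_STORAGE_PATH ?? './media',
    maxStorageBytes: readInt('MAX_STORAGE_GB', 5) * 1024 ** 3,
    thumbnail: { width: thumbWidth, height: thumbHeight },
    port: readInt('PORT', 3000),
    wsPort: readInt('WS_PORT', 3001),
    video: {
      codec: env.VIDEO_CODEC ?? 'vp9',
      quality: readInt('VIDEO_QUALITY', 80),
      fps: readInt('VIDEO_FPS', 30),
      maxMinutes: readInt('MAX_RECORDING_MINUTES', 30),
    },
    cleanup: {
      afterDays: readInt('AUTO_CLEANUP_DAYS', 30),
      whenFull: (env.CLEANUP_WHEN_FULL ?? 'true').toLowerCase() === 'true',
    },
  };
}
```

Passing the environment as a parameter keeps the function easy to test with plain objects.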
### Storage Management
#### Automatic Cleanup
- Configurable retention period
- Size-based cleanup when approaching limit
- Oldest files removed first
- Important items can be marked for keeping
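The cleanup rules above can be expressed as a pure selection function. The sketch below is one possible reading of them: the file record fields (`name`, `size`, `createdAt`, `keep`), the function name, and the two-pass order (retention first, then size) are assumptions, and `maxBytes` stands for whatever threshold counts as "approaching the limit".

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the files to delete, oldest first. Files flagged `keep` are never
// selected. `createdAt` and `now` are millisecond timestamps.
function selectForCleanup(files, { now, retentionDays, maxBytes }) {
  const cutoff = now - retentionDays * DAY_MS;
  const removable = files
    .filter((file) => !file.keep)
    .sort((a, b) => a.createdAt - b.createdAt); // oldest first

  // Pass 1: everything older than the retention period.
  const toDelete = removable.filter((file) => file.createdAt < cutoff);
  const selected = new Set(toDelete);

  // Pass 2: while still over the size threshold, remove the oldest remaining files.
  let total = files.reduce((sum, file) => sum + file.size, 0)
    - toDelete.reduce((sum, file) => sum + file.size, 0);
  for (const file of removable) {
    if (total <= maxBytes) break;
    if (selected.has(file)) continue;
    toDelete.push(file);
    selected.add(file);
    total -= file.size;
  }
  return toDelete;
}
```

Because the function only returns a list, the caller can log or preview the selection before removing anything from disk or from the database.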
#### Export Options
- Individual media export (PNG, WebM)
- Session bundle export (ZIP)
- Annotated vs clean versions
- Metadata included in exports
## Setup Instructions
### 1. Install Dependencies
```bash
cd voice-assistant
npm install
# Additional dependencies for media processing
npm install sharp fluent-ffmpeg @ffmpeg-installer/ffmpeg
```
### 2. Initialize Media Storage
```bash
# Create directory structure and database
npm run setup-media
# Or manually:
node setup-media.js
```
### 3. Start the Server
```bash
# Start media server with voice assistant
npm start
# Or standalone:
node media-history-server.js
```
### 4. Access the UI
Open http://localhost:3000/media-ui.html in your browser.
## Usage Examples
### Example 1: Debugging Session Capture
```
// Voice command flow
User: "Start recording my debugging session"
Assistant: "Recording started for debugging session"
// ... debugging work ...
User: "Take a screenshot of this error"
Assistant: "Screenshot captured and tagged with 'error'"
User: "Stop recording"
Assistant: "Recording saved: 5 minutes 32 seconds"
```
### Example 2: Code Review Documentation
```
// Automated capture during review
User: "Review this function"
Assistant: [performs review]
System: [automatically captures screenshot of code]
System: [saves review summary with screenshot]
```
### Example 3: Bulk Export
```
// Export for documentation
User: "Export all screenshots from this week"
Assistant: "Preparing export of 47 screenshots..."
System: [creates ZIP file with organized folders]
Assistant: "Export ready: weekly-screenshots-2024-01-08.zip"
```
## Troubleshooting
### Common Issues
#### Screenshots not capturing
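- Check that the browser has been granted permission for screen capture
- Serve the page over HTTPS or from localhost; browsers expose screen capture only in secure contexts
- Verify that the browser supports the Screen Capture API (`navigator.mediaDevices.getDisplayMedia`)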
#### Recording fails to start
- Check FFmpeg installation: `ffmpeg -version`
- Verify sufficient disk space
- Check WebRTC support in browser
#### Media not loading
- Verify file permissions in media directory
- Check SQLite database integrity
- Ensure correct file paths in database
#### WebSocket connection fails
- Check firewall settings for port 3001
- Verify WebSocket support in proxy/reverse proxy
- Check for conflicting applications on ports
### Debug Mode
Enable debug logging:
```javascript
// In media-history-server.js
process.env.DEBUG = 'media:*';
```
For client-side debugging, check the browser console.
## Security Considerations
### Access Control
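- Media files are served through Express static middleware, which performs no access checks by default, so anyone who can reach the server can fetch a file by URL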
- Consider adding authentication for production use
- Implement user-based access control for multi-user setups
### Privacy
- Screenshots may contain sensitive information
- Recordings should be encrypted at rest
- Implement automatic redaction for sensitive data
- Regular cleanup of old media files
### Network Security
- Use HTTPS in production
- Implement CORS properly
- Validate file uploads
- Sanitize metadata inputs
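As a starting point for the last two items, upload fields can be checked and cleaned before anything is written to disk or to the database. Everything in this sketch is illustrative: the allowed MIME types, the length limits, and the function names are assumptions. The MIME type declared by a client is also easy to forge, so inspecting the file contents would be a stronger check.

```javascript
// Hypothetical allow-list and limits; tune these for the deployment.
const ALLOWED_MIME = new Map([
  ['screenshot', ['image/png']],
  ['recording', ['video/webm']],
]);
const MAX_TAGS = 20;

// Strips ASCII control characters, trims, and caps the length of a string.
function sanitizeText(value, maxLength = 200) {
  return String(value ?? '')
    .replace(/[\u0000-\u001f\u007f]/g, '')
    .trim()
    .slice(0, maxLength);
}

// Validates upload fields and returns cleaned metadata alongside any errors.
function validateUpload({ type, mimeType, project, context, tags = [] }) {
  const errors = [];
  const allowed = ALLOWED_MIME.get(type);
  if (!allowed) {
    errors.push(`Unknown media type: ${type}`);
  } else if (!allowed.includes(mimeType)) {
    errors.push(`MIME type ${mimeType} is not allowed for ${type}`);
  }
  const tagList = Array.isArray(tags) ? tags : [];
  if (!Array.isArray(tags) || tags.length > MAX_TAGS) {
    errors.push(`tags must be an array of at most ${MAX_TAGS} items`);
  }
  return {
    ok: errors.length === 0,
    errors,
    metadata: {
      project: sanitizeText(project, 100),
      context: sanitizeText(context),
      tags: tagList.map((tag) => sanitizeText(tag, 50)).filter(Boolean),
    },
  };
}
```

Values that end up in SQL statements should still be passed as bound parameters; sanitizing text does not replace that.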
## Future Enhancements
### Planned Features
- [ ] Cloud storage integration (S3, Google Cloud Storage)
- [ ] Real-time collaboration on annotations
- [ ] AI-powered content analysis and tagging
- [ ] Video editing capabilities
- [ ] GIF generation from recordings
- [ ] OCR text extraction from screenshots
- [ ] Integration with issue tracking systems
- [ ] Mobile app for remote capture
### API Extensions
- Batch operations API
- Advanced search with AI
- Webhook notifications
- Third-party integrations
## Related Documentation
- [Voice Assistant Setup](voice-assistant/README.md)
- [MCP Browser Tools](https://github.com/modelcontextprotocol/servers/tree/main/src/playwright)
- [Installation Guide](INSTALLATION.md)
- [API Reference](API.md)