# Video Toolkit MCP Server

A Model Context Protocol (MCP) server that provides comprehensive video tools: transcript retrieval, video downloading, and automatic subtitle generation using AI speech-to-text. Works with YouTube, Bilibili, Vimeo, and any other platform supported by yt-dlp.
## Features

- **Multi-Platform Support**: Works with YouTube, Bilibili, Vimeo, and any platform supported by yt-dlp
- **Video Transcripts**: Extract existing transcripts/captions from videos
- **Video Downloads**: Download videos to local storage in various formats and qualities
- **Auto Subtitle Generation**: Generate subtitles using the OpenAI Whisper API or a local Whisper install
- **Multiple URL Formats**: Support for various URL formats from different platforms
- **Timestamp Support**: Include or exclude timestamps in transcript output
- **Language Selection**: Request transcripts or generate subtitles in specific languages
## Tools

| Tool | Description |
| --- | --- |
| `get-transcript` | Retrieve existing transcripts from video platforms |
| `list-transcript-languages` | List available transcript languages for a video |
| `download-video` | Download videos to local storage |
| `list-downloads` | List downloaded video files |
| `generate-subtitles` | Generate subtitles using AI speech-to-text |
## Prerequisites

- Node.js >= 16.0.0
- **yt-dlp** - Required for transcript fetching and video downloads
- **ffmpeg** - Required for subtitle generation (audio extraction)
### Installing Dependencies

**yt-dlp** (required):

```bash
# Using Homebrew (macOS)
brew install yt-dlp

# Using pip
pip install yt-dlp
```

**ffmpeg** (required for subtitle generation):

```bash
# Using Homebrew (macOS)
brew install ffmpeg

# Using apt (Ubuntu/Debian)
sudo apt install ffmpeg
```

**Local Whisper** (optional, for local subtitle generation):

```bash
pip install openai-whisper
```

## Installation
From Source
git clone <repository-url>
cd video-toolkit-mcp
npm install
npm run buildGlobal Installation (after publishing)
npm install -g video-toolkit-mcpConfiguration
### For Claude Desktop / Cursor

Add the MCP server to your configuration file.

**Claude Desktop** (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):

```json
{
  "mcpServers": {
    "video-toolkit-mcp": {
      "command": "node",
      "args": ["/path/to/video-toolkit-mcp/dist/index.js"],
      "env": {
        "VIDEO_TOOLKIT_STORAGE_DIR": "/path/to/downloads",
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}
```

**Cursor** (`~/.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "video-toolkit-mcp": {
      "command": "node",
      "args": ["/path/to/video-toolkit-mcp/dist/index.js"],
      "env": {
        "VIDEO_TOOLKIT_STORAGE_DIR": "/path/to/downloads",
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}
```

### Environment Variables
| Variable | Description | Default |
| --- | --- | --- |
| `VIDEO_TOOLKIT_STORAGE_DIR` | Default directory for downloaded videos | |
| `OPENAI_API_KEY` | OpenAI API key for Whisper-based subtitle generation | None |
| | Preferred Whisper engine: `openai` or `local` | |
| | Path to the local whisper binary | |
| | Path to the Whisper model (for local whisper) | Auto-download |
| | Path to the yt-dlp binary | |
| | Path to the ffmpeg binary | |
| | Enable debug logging | |
## Usage

### 1. get-transcript

Retrieve existing transcripts from video platforms.

**Parameters:**

- `url` (required): Video URL
- `lang` (optional): Language code (e.g., `en`, `es`, `zh`)
- `include_timestamps` (optional): Include timestamps (default: `true`)
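To make the effect of `include_timestamps` concrete, here is a hypothetical sketch of rendering parsed transcript segments with and without timestamps. The `Segment` shape and both helper functions are illustrative assumptions, not the server's actual code:

```typescript
// Hypothetical transcript segment shape; the real parser may differ.
interface Segment {
  start: number; // start time in seconds
  text: string;
}

// Format seconds as a [mm:ss] prefix for timestamped output.
function formatStamp(seconds: number): string {
  const m = Math.floor(seconds / 60);
  const s = Math.floor(seconds % 60);
  return `[${String(m).padStart(2, "0")}:${String(s).padStart(2, "0")}]`;
}

// Render segments to plain text, optionally prefixing each line with its timestamp.
function renderTranscript(segments: Segment[], includeTimestamps = true): string {
  return segments
    .map((seg) => (includeTimestamps ? `${formatStamp(seg.start)} ${seg.text}` : seg.text))
    .join("\n");
}
```

With `include_timestamps` disabled, the output is bare text suitable for summarization; enabled, each line carries a `[mm:ss]` marker.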
**Example:**

```
Get the transcript from https://www.youtube.com/watch?v=VIDEO_ID
```

### 2. list-transcript-languages
List available transcript languages for a video.

**Parameters:**

- `url` (required): Video URL

**Example:**

```
What transcript languages are available for https://www.youtube.com/watch?v=VIDEO_ID?
```

### 3. download-video
Download a video to local storage.

**Parameters:**

- `url` (required): Video URL to download
- `output_dir` (optional): Custom output directory
- `filename` (optional): Custom filename
- `format` (optional): Video format - `mp4`, `webm`, `mkv` (default: `mp4`)
- `quality` (optional): Quality - `best`, `1080p`, `720p`, `480p`, `360p`, `audio` (default: `best`)
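As a rough illustration of how the `quality` and `format` options could translate into yt-dlp arguments, here is a sketch using yt-dlp's standard `-f` format-selection syntax and `--merge-output-format` flag. The exact mapping the server uses may differ:

```typescript
// Map the tool's quality option to a yt-dlp -f format selector (illustrative).
function qualityToSelector(quality: string): string {
  switch (quality) {
    case "audio":
      return "bestaudio";
    case "best":
      return "bestvideo+bestaudio/best";
    default: {
      // e.g. "720p" -> cap video height at 720 pixels
      const height = parseInt(quality, 10);
      return `bestvideo[height<=${height}]+bestaudio/best[height<=${height}]`;
    }
  }
}

// Build a yt-dlp argument list from the tool's parameters (hypothetical helper).
function buildArgs(url: string, quality: string, format: string): string[] {
  return ["-f", qualityToSelector(quality), "--merge-output-format", format, url];
}
```

For example, `quality: "720p"` becomes a selector that caps video height at 720, and `format: "mp4"` asks ffmpeg to merge the streams into an MP4 container.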
**Example:**

```
Download this video: https://www.youtube.com/watch?v=VIDEO_ID
```

### 4. list-downloads
List all downloaded video files.

**Parameters:**

- `directory` (optional): Directory to list (default: the storage directory)

**Example:**

```
List my downloaded videos
```

### 5. generate-subtitles
Generate subtitles for a local video file using AI speech-to-text.

**Parameters:**

- `video_path` (required): Absolute path to the video file
- `engine` (optional): `openai` or `local` (default: auto-detect)
- `language` (optional): Language code for transcription
- `output_format` (optional): `srt` or `vtt` (default: `srt`)
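The main practical difference between the two `output_format` values is cue-timestamp syntax: SRT separates milliseconds with a comma (`00:01:01,500`), WebVTT with a period (`00:01:01.500`). A minimal sketch of such a formatter (an assumed helper, not the server's actual code):

```typescript
// Format a time in seconds as an SRT ("HH:MM:SS,mmm") or VTT ("HH:MM:SS.mmm") cue timestamp.
function cueTimestamp(seconds: number, format: "srt" | "vtt"): string {
  const ms = Math.round((seconds % 1) * 1000);
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  const sep = format === "srt" ? "," : "."; // the only syntactic difference
  return `${pad(h)}:${pad(m)}:${pad(s)}${sep}${pad(ms, 3)}`;
}
```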
**Example:**

```
Generate subtitles for /path/to/video.mp4
```

## Subtitle Generation Engines
### OpenAI Whisper API

- **Pros**: High accuracy, no local setup needed, supports 50+ languages
- **Cons**: Requires an API key, billed per minute of audio
- **Setup**: Set the `OPENAI_API_KEY` environment variable

### Local Whisper

- **Pros**: Free, runs locally, no API limits
- **Cons**: Requires setup, uses local CPU/GPU
- **Setup**: `pip install openai-whisper`
The tool auto-detects which engine to use:

- If `OPENAI_API_KEY` is set, the OpenAI Whisper API is used
- Otherwise, if local whisper is installed, local whisper is used
- If neither is available, an error is returned
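The selection order above can be sketched as a small pure function. This is an illustration of the documented behavior, not the project's actual source:

```typescript
type Engine = "openai" | "local";

// Pick a Whisper engine following the documented auto-detection order:
// OPENAI_API_KEY wins, then a local whisper install, otherwise an error.
function pickEngine(env: { OPENAI_API_KEY?: string }, localWhisperInstalled: boolean): Engine {
  if (env.OPENAI_API_KEY) return "openai";
  if (localWhisperInstalled) return "local";
  throw new Error("No Whisper engine available: set OPENAI_API_KEY or install openai-whisper");
}
```

Note that an explicit `engine` parameter on `generate-subtitles` overrides this auto-detection.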
## Example Workflows

### Download and Generate Subtitles

```
1. Download this video: https://www.youtube.com/watch?v=VIDEO_ID
2. Generate subtitles for the downloaded file
```

### Summarize a Video

```
Get the transcript from https://www.youtube.com/watch?v=VIDEO_ID and summarize the key points
```

### Create Captions for Videos Without Subtitles

```
1. Download the video: https://vimeo.com/123456789
2. Generate English subtitles for it
```

## Supported Platforms
Any platform supported by yt-dlp, including:

- YouTube
- Bilibili
- Vimeo
- Twitter/X
- TikTok
- Twitch
- And many more...

Full list: https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md
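A `url-detector.ts`-style platform check could be a simple hostname match, with everything unrecognized still handed to yt-dlp as a generic URL. The domain table and function below are illustrative assumptions, not the project's actual detector:

```typescript
// Known platforms and the hostnames that identify them (illustrative).
const PLATFORMS: Record<string, string[]> = {
  youtube: ["youtube.com", "youtu.be"],
  bilibili: ["bilibili.com"],
  vimeo: ["vimeo.com"],
  twitter: ["twitter.com", "x.com"],
  tiktok: ["tiktok.com"],
  twitch: ["twitch.tv"],
};

// Detect a platform from a video URL's hostname; unknown sites fall back
// to "generic" so they can still be passed straight to yt-dlp.
function detectPlatform(url: string): string {
  const host = new URL(url).hostname;
  for (const [platform, domains] of Object.entries(PLATFORMS)) {
    if (domains.some((d) => host === d || host.endsWith("." + d))) return platform;
  }
  return "generic";
}
```

Matching on the full hostname (rather than a substring) avoids false positives such as `notyoutube.com`.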
## Project Structure

```
video-toolkit-mcp/
├── src/
│   ├── index.ts              # Main MCP server entry point
│   ├── transcript-fetcher.ts # Transcript fetching using yt-dlp
│   ├── video-downloader.ts   # Video download functionality
│   ├── subtitle-generator.ts # AI-powered subtitle generation
│   ├── config.ts             # Configuration management
│   ├── url-detector.ts       # Platform detection from URLs
│   ├── parser.ts             # Transcript parsing (SRT, VTT, JSON)
│   └── errors.ts             # Custom error classes
├── test/
│   └── transcript.test.ts    # Unit tests
├── dist/                     # Compiled JavaScript (after build)
└── package.json
```

## Development
```bash
# Build
npm run build

# Test
npm test

# Development mode
npm run dev
```

## Troubleshooting
### "yt-dlp is not installed"

```bash
brew install yt-dlp
# or
pip install yt-dlp
```

### "ffmpeg is not installed"

```bash
brew install ffmpeg
```

### "No Whisper engine available"

Either:

- Set the `OPENAI_API_KEY` environment variable, or
- Install local whisper: `pip install openai-whisper`
### Download issues

- Check whether the video is publicly accessible
- Some platforms may enforce rate limits
- Private/restricted videos cannot be downloaded
### Subtitle generation is slow

- The OpenAI Whisper API is typically faster than local transcription
- Local whisper performance depends on your hardware
- Consider using a smaller model for local whisper
## License

MIT

## Acknowledgments

- yt-dlp for video platform support
- OpenAI Whisper for speech-to-text
- Model Context Protocol for the MCP framework