Skip to main content
Glama

YouTube Transcript DL MCP Server

by jedarden
MIT License
0
2
README.md8.79 kB
# 🎬 YouTube Transcript DL MCP Server A comprehensive MCP (Model Context Protocol) server for extracting YouTube video transcripts with support for multiple transports (stdio, SSE, HTTP), Docker deployment, and npm package distribution. ## ✨ Features - 🎯 **Multiple Transport Support**: stdio, Server-Sent Events (SSE), and HTTP - 📹 **Comprehensive Transcript Extraction**: Single videos, bulk processing, and playlists - 🌍 **Multi-language Support**: Extract transcripts in different languages - 📝 **Multiple Output Formats**: Text, JSON, and SRT subtitle formats - 🚀 **High Performance**: Built-in caching and rate limiting - 🐳 **Docker Ready**: Full containerization support - 📦 **npm Package**: Easy installation and distribution - 🧪 **Test-Driven Development**: Comprehensive test suite with 90%+ coverage - 🔧 **TypeScript**: Full type safety and modern JavaScript features ## 📦 Installation ### 🔧 As an npm package ```bash npm install -g yt-transcript-dl-mcp ``` ### 🛠️ From source ```bash git clone <repository-url> cd yt-transcript-dl-repo npm install npm run build ``` ### 🐳 Docker ```bash # From GitHub Container Registry (recommended) docker pull ghcr.io/jedarden/yt-transcript-dl-mcp:latest docker run -p 3001:3001 -p 3002:3002 ghcr.io/jedarden/yt-transcript-dl-mcp:latest --multi-transport # Build from source docker build -t yt-transcript-dl-mcp . docker run -p 3001:3001 -p 3002:3002 yt-transcript-dl-mcp --multi-transport ``` ## 🚀 Usage ### 🖥️ MCP Server Start the MCP server in different modes: ```bash # Stdio mode (default) yt-transcript-dl-mcp start # SSE mode yt-transcript-dl-mcp start --transport sse --port 3000 # HTTP mode yt-transcript-dl-mcp start --transport http --port 3000 # With verbose logging yt-transcript-dl-mcp start --verbose ``` ### 💻 CLI Tool Test the server with a sample video: ```bash # Test with a YouTube video yt-transcript-dl-mcp test dQw4w9WgXcQ # Test with different language yt-transcript-dl-mcp test dQw4w9WgXcQ --language es # Test with different format yt-transcript-dl-mcp test dQw4w9WgXcQ --format srt ``` ### 🔧 Programmatic Usage ```typescript import { YouTubeTranscriptService } from 'yt-transcript-dl-mcp'; const service = new YouTubeTranscriptService(); // Extract single video transcript const result = await service.getTranscript('dQw4w9WgXcQ', 'en', 'json'); console.log(result); // Bulk processing const bulkResult = await service.getBulkTranscripts({ videoIds: ['dQw4w9WgXcQ', 'jNQXAC9IVRw'], outputFormat: 'json', language: 'en' }); console.log(bulkResult); ``` ## 🛠️ MCP Tools The server provides the following MCP tools: ### `get_transcript` Extract transcript from a single YouTube video. **Parameters:** - `videoId` (required): YouTube video ID or URL - `language` (optional): Language code (default: 'en') - `format` (optional): Output format - 'text', 'json', or 'srt' (default: 'json') ### `get_bulk_transcripts` Extract transcripts from multiple YouTube videos. **Parameters:** - `videoIds` (required): Array of YouTube video IDs or URLs - `language` (optional): Language code (default: 'en') - `outputFormat` (optional): Output format - 'text', 'json', or 'srt' (default: 'json') - `includeMetadata` (optional): Include metadata in response (default: true) ### `get_playlist_transcripts` Extract transcripts from all videos in a YouTube playlist. **Parameters:** - `playlistId` (required): YouTube playlist ID or URL - `language` (optional): Language code (default: 'en') - `outputFormat` (optional): Output format - 'text', 'json', or 'srt' (default: 'json') - `includeMetadata` (optional): Include metadata in response (default: true) ### `format_transcript` Format existing transcript data into different formats. **Parameters:** - `transcript` (required): Transcript data array - `format` (required): Output format - 'text', 'json', or 'srt' ### `get_cache_stats` Get cache statistics and performance metrics. ### `clear_cache` Clear the transcript cache. ## ⚙️ Configuration ### 🌍 Environment Variables ```bash # Server configuration PORT=3000 HOST=0.0.0.0 MCP_TRANSPORT=stdio # CORS settings CORS_ENABLED=true CORS_ORIGINS=* # Rate limiting RATE_LIMIT_WINDOW=900000 # 15 minutes in ms RATE_LIMIT_MAX=100 # Caching CACHE_ENABLED=true CACHE_TTL=3600 # 1 hour in seconds CACHE_MAX_SIZE=1000 # Logging LOG_LEVEL=info LOG_FORMAT=simple ``` ### 📝 Configuration File Create a `config.json` file: ```json { "port": 3000, "host": "0.0.0.0", "cors": { "enabled": true, "origins": ["*"] }, "rateLimit": { "windowMs": 900000, "max": 100 }, "cache": { "enabled": true, "ttl": 3600, "maxSize": 1000 }, "logging": { "level": "info", "format": "simple" } } ``` ## 🐳 Docker Deployment ### 🐙 Docker Compose ```yaml version: '3.8' services: yt-transcript-mcp: build: . ports: - "3000:3000" environment: - NODE_ENV=production - PORT=3000 - LOG_LEVEL=info restart: unless-stopped healthcheck: test: ["CMD", "node", "dist/health-check.js"] interval: 30s timeout: 10s retries: 3 ``` ### Health Checks The Docker container includes built-in health checks: ```bash # Check container health docker ps docker exec <container-id> node dist/health-check.js ``` ## Development ### Setup ```bash git clone <repository-url> cd yt-transcript-dl-repo npm install ``` ### Running Tests ```bash # Run all tests npm test # Run tests with coverage npm run test:coverage # Run specific test suites npm run test:unit npm run test:integration npm run test:e2e # Watch mode npm run test:watch ``` ### Building ```bash # Build TypeScript npm run build # Development mode with watch npm run dev # Linting npm run lint npm run lint:fix ``` ### Testing the MCP Server ```bash # Test stdio transport ./scripts/test-stdio.sh # Test with sample video npm run test:sample ``` ## API Documentation ### Response Format All transcript responses follow this structure: ```typescript interface TranscriptResponse { videoId: string; title?: string; language: string; transcript: TranscriptItem[]; metadata?: { extractedAt: string; source: string; duration?: number; error?: string; }; } interface TranscriptItem { text: string; start: number; duration: number; } ``` ### Error Handling The server handles various error scenarios: - **Video not found**: Returns empty transcript with error in metadata - **Private videos**: Graceful error handling with descriptive messages - **Rate limiting**: Built-in delays and retry logic - **Network errors**: Automatic retries with exponential backoff ## Performance ### Benchmarks - **Single video extraction**: < 5 seconds - **Bulk processing**: < 2 seconds per video - **Concurrent requests**: 90%+ success rate for 10 concurrent requests - **Memory usage**: < 512MB under normal load - **Cache hit ratio**: 70%+ for repeated requests ### Optimization - **LRU Cache**: Configurable TTL and size limits - **Rate Limiting**: Prevents API abuse - **Concurrent Processing**: Optimized for bulk operations - **Memory Management**: Efficient garbage collection ## Troubleshooting ### Common Issues 1. **Video not found**: Check if video is public and has captions 2. **Rate limiting**: Reduce concurrent requests or increase delays 3. **Memory issues**: Reduce cache size or clear cache regularly 4. **Network errors**: Check internet connection and firewall settings ### Debug Mode Enable debug logging: ```bash export LOG_LEVEL=debug yt-transcript-dl-mcp start --verbose ``` ### Logs Check logs in the `logs/` directory: ```bash tail -f logs/combined.log tail -f logs/error.log ``` ## Contributing 1. Fork the repository 2. Create a feature branch 3. Write tests for new functionality 4. Ensure all tests pass 5. Submit a pull request ### Code Style - Use TypeScript for all code - Follow ESLint configuration - Write comprehensive tests - Add JSDoc comments for public APIs - Use conventional commit messages ## License MIT License - see LICENSE file for details. ## Support - GitHub Issues: [Report bugs and feature requests](https://github.com/your-username/yt-transcript-dl-mcp/issues) - Documentation: [Full API documentation](https://github.com/your-username/yt-transcript-dl-mcp/wiki) - Examples: [Usage examples](https://github.com/your-username/yt-transcript-dl-mcp/tree/main/examples) ## Changelog ### v1.0.0 - Initial release - MCP server with stdio, SSE, and HTTP transports - Single video and bulk transcript extraction - Docker containerization - Comprehensive test suite - TypeScript support - Caching and rate limiting - Multiple output formats (text, JSON, SRT)

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jedarden/yt-transcript-dl-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server