The DeepSRT MCP Server enables YouTube video summarization through MCP integration.
- Generate summaries: Create narrative or bullet-point summaries for YouTube videos
- Multi-language support: Summarize videos in various languages (default: zh-tw)
- Caching mechanism: Fast retrieval through DeepSRT's CDN
- Integration: Works seamlessly with MCP-enabled environments like Claude Desktop and Cline
- Customizable parameters: Specify videoId, lang, and mode for tailored summaries
- MCP interface: Access via the get_summary tool in MCP clients
Generates summaries for YouTube videos, supporting both narrative and bullet-point formats with multi-language capabilities
DeepSRT MCP Server
A Model Context Protocol (MCP) server that provides YouTube video summarization and transcript extraction functionality through integration with DeepSRT's API and direct YouTube caption access.
TL;DR
Architecture
Sequence Flow
Technical Architecture
Core Components
1. MCP Server Layer
- Runtime Support: Both Node.js and Bun execution
- Protocol Handling: Model Context Protocol (MCP) request/response management
- Tool Registration: get_summary and get_transcript tools
- Error Handling: Comprehensive error management with user-friendly messages
2. Video Processing Pipeline
- URL Parser: Supports multiple YouTube URL formats and direct video IDs
- InnerTube Integration: Direct YouTube API access without API keys
- Caption Discovery: Automatic detection of available caption tracks
- Quality Selection: Prioritizes manual captions over auto-generated ones
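The URL Parser step can be illustrated with a small sketch. It assumes the accepted forms are watch URLs, youtu.be short links, /embed/ and /shorts/ paths, and bare 11-character IDs; the function name parseVideoId and that list of forms are hypothetical, not taken from the server's source.

```typescript
// Hypothetical helper: a bare YouTube video ID is 11 characters of [A-Za-z0-9_-].
const BARE_ID = /^[A-Za-z0-9_-]{11}$/;

// Each pattern captures the 11-character ID; the lookahead rejects longer strings.
const URL_PATTERNS: RegExp[] = [
  /youtube\.com\/watch\?(?:[^#]*&)?v=([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/, // watch?v=<id>
  /youtu\.be\/([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/, // youtu.be/<id>
  /youtube\.com\/(?:embed|shorts)\/([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/, // /embed/<id>, /shorts/<id>
];

// Returns the video ID, or null when the input matches none of the assumed forms.
function parseVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (BARE_ID.test(trimmed)) return trimmed;
  for (const pattern of URL_PATTERNS) {
    const match = trimmed.match(pattern);
    if (match) return match[1];
  }
  return null;
}
```

Query parameters after the ID (such as &t=42 or ?si=...) are ignored because the lookahead only rejects further ID characters.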
3. Transcript Processing
- XML Parser: Handles YouTube's <timedtext> format
- Entity Decoder: Converts HTML entities to readable text
- Timestamp Formatter: Converts milliseconds to [MM:SS] format
- Content Filter: Removes music notation and empty segments
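The four transcript-processing steps can be sketched as follows. This assumes YouTube's format-3 timedtext layout, where each <p> element carries a start time t and a duration d in milliseconds. It uses a regular expression rather than the streaming XML parser mentioned under Performance Characteristics, and every function name here is hypothetical.

```typescript
interface Segment {
  startMs: number; // segment start time in milliseconds
  text: string;
}

const NAMED_ENTITIES: Record<string, string> = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// Entity decoder: a single pass, so "&amp;lt;" becomes "&lt;" rather than "<".
function decodeEntities(s: string): string {
  return s.replace(/&(#x[0-9a-fA-F]+|#\d+|[a-zA-Z]+);/g, (whole: string, body: string) => {
    if (body.startsWith("#x")) return String.fromCodePoint(parseInt(body.slice(2), 16));
    if (body.startsWith("#")) return String.fromCodePoint(Number(body.slice(1)));
    return NAMED_ENTITIES[body] ?? whole; // leave unknown entities as-is
  });
}

// Timestamp formatter: milliseconds to [MM:SS] (minutes may exceed 59).
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const mm = String(Math.floor(totalSeconds / 60)).padStart(2, "0");
  const ss = String(totalSeconds % 60).padStart(2, "0");
  return `[${mm}:${ss}]`;
}

// XML parser (regex-based sketch) plus content filter.
function parseTimedText(xml: string): Segment[] {
  const segments: Segment[] = [];
  // Assumed format-3 layout: <p t="startMs" d="durationMs">text</p>
  const paragraph = /<p\b[^>]*\bt="(\d+)"[^>]*>([\s\S]*?)<\/p>/g;
  let m: RegExpExecArray | null;
  while ((m = paragraph.exec(xml)) !== null) {
    const text = decodeEntities(m[2].replace(/<[^>]+>/g, "")) // strip nested tags such as <s>
      .replace(/\s+/g, " ")
      .trim();
    // Content filter: drop empty segments and pure music notation
    if (text === "" || /^(\[music\]|[♪♫\s]+)$/i.test(text)) continue;
    segments.push({ startMs: Number(m[1]), text });
  }
  return segments;
}

function renderTranscript(segments: Segment[]): string {
  return segments.map((s) => `${formatTimestamp(s.startMs)} ${s.text}`).join("\n");
}
```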
4. Summary Generation
- DeepSRT Integration: Direct API calls to worker.deepsrt.com
- Multi-language Support: Supports zh-tw, en, ja, and other languages
- Mode Selection: Narrative and bullet-point summary formats
- Title Translation: Automatic title translation to target language
Key Features
No Pre-caching Required
- Works immediately for any YouTube video with captions
- Real-time transcript extraction and processing
- No dependency on external caching systems
Intelligent Caption Selection
- Priority Order: Manual > Auto-generated > Any available
- Language Preference: Respects user's preferred language
- Fallback Strategy: Graceful degradation to available options
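A sketch of the Manual > Auto-generated > Any available priority combined with the language preference. The CaptionTrack fields (languageCode, kind, baseUrl, with kind "asr" marking auto-generated tracks) are modelled on InnerTube caption track entries but should be treated as assumptions, as should the function name selectCaptionTrack.

```typescript
// Assumed shape of a caption track entry (field names are an assumption).
interface CaptionTrack {
  languageCode: string;
  kind?: string; // "asr" marks auto-generated (speech recognition) tracks
  baseUrl: string;
}

// Hypothetical selector implementing the priority order described above.
function selectCaptionTrack(tracks: CaptionTrack[], preferredLang: string): CaptionTrack | null {
  if (tracks.length === 0) return null; // video has no captions
  const isManual = (t: CaptionTrack) => t.kind !== "asr";
  const matchesLang = (t: CaptionTrack) =>
    t.languageCode.toLowerCase() === preferredLang.toLowerCase();

  return (
    tracks.find((t) => isManual(t) && matchesLang(t)) ?? // 1. manual, preferred language
    tracks.find((t) => !isManual(t) && matchesLang(t)) ?? // 2. auto-generated, preferred language
    tracks.find(isManual) ?? // 3. any manual track
    tracks[0] // 4. whatever is available
  );
}
```

This sketch ranks an auto-generated track in the preferred language above a manual track in another language; the real server may weigh these two criteria differently.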
Robust Error Handling
- Network timeout management (30-second timeout)
- API error translation to user-friendly messages
- Graceful handling of videos without captions
- Comprehensive validation of input parameters
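A hedged sketch of translating failures into user-friendly messages. The wording and the function names describeHttpError and describeFailure are illustrative. A 30-second timeout could be imposed with AbortSignal.timeout(30_000), which aborts with an error named TimeoutError, but whether the server uses that mechanism is an assumption.

```typescript
type Service = "YouTube" | "DeepSRT";

// Hypothetical mapping from HTTP status codes to user-facing messages.
function describeHttpError(service: Service, status: number): string {
  if (status === 404) return `${service} returned 404: the video or summary was not found.`;
  if (status === 429) return `${service} is rate limiting requests (HTTP 429); wait a moment and retry.`;
  if (status >= 500) return `${service} is temporarily unavailable (HTTP ${status}).`;
  return `${service} request failed with HTTP ${status}.`;
}

// Covers failures that never produced an HTTP status, such as timeouts.
function describeFailure(err: unknown): string {
  const name =
    typeof err === "object" && err !== null ? (err as { name?: unknown }).name : undefined;
  // AbortSignal.timeout() aborts with an error named "TimeoutError".
  if (name === "TimeoutError") return "The request timed out after 30 seconds.";
  if (err instanceof Error) return err.message;
  return String(err);
}
```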
Multi-format Output
- Markdown Formatting: Rich text with headers and metadata
- Structured Data: Video information, duration, author details
- Timestamped Transcripts: Precise timing information
- Localized Summaries: Content in user's preferred language
Performance Characteristics
- Fast Startup: < 1 second server initialization
- Efficient Processing: Parallel API calls for summary + title translation
- Memory Efficient: Streaming XML parsing, no large data buffering
- Network Optimized: Single request per video for metadata + captions
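The parallel summary and title-translation requests can be sketched with Promise.all. The DeepSRT endpoints are not documented in this README, so the two calls are injected through a hypothetical SummaryDeps interface instead of being written as real HTTP requests.

```typescript
type SummaryMode = "narrative" | "bullet";

// Hypothetical dependencies standing in for the two DeepSRT API calls.
interface SummaryDeps {
  summarize(transcript: string, lang: string, mode: SummaryMode): Promise<string>;
  translateTitle(title: string, lang: string): Promise<string>;
}

async function summarizeVideo(
  deps: SummaryDeps,
  video: { title: string; transcript: string },
  lang: string,
  mode: SummaryMode,
): Promise<{ title: string; summary: string }> {
  // The requests are independent, so both are started before either is awaited.
  const [summary, title] = await Promise.all([
    deps.summarize(video.transcript, lang, mode),
    deps.translateTitle(video.title, lang),
  ]);
  return { title, summary };
}
```

Total latency is then roughly that of the slower call rather than the sum of both.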
Recent Updates
v0.1.9 (Latest)
- ✅ Fixed critical test logic flaws: Tests now correctly validate API responses instead of checking non-existent success properties
- ✅ Enhanced error handling: Improved graceful handling of YouTube API rate limiting (HTTP 429)
- ✅ Robust test suite: All 56 tests now pass consistently with proper error resilience
- ✅ Verified API integration: Confirmed both DeepSRT and YouTube APIs work correctly when not rate limited
- ✅ Multi-language support: Validated zh-tw, en, ja summary generation
- ✅ Production-ready: Test suite handles real-world API limitations professionally
v0.1.3
- ✅ Fixed CLI argument parsing: Now supports both --key=value and --key value formats
- ✅ Fixed bullet mode: --mode bullet now works correctly and generates bullet-point summaries
- ✅ Improved bunx compatibility: Direct execution with bunx @deepsrt/deepsrt-mcp works without installation
v0.1.2
- ✅ Fixed CLI execution: Made CLI the default binary for direct npx/bunx execution
- ✅ Updated package configuration: Proper binary resolution for different execution methods
v0.1.1
- ✅ Added comprehensive testing: Unit tests, integration tests, and end-to-end tests
- ✅ Enhanced CLI tool: Full-featured command-line interface with help and examples
- ✅ Direct transcript extraction: No pre-caching required, works with any YouTube video
Features
- Generate summaries for YouTube videos
- Extract full transcripts with timestamps from YouTube videos
- Support for both narrative and bullet-point summary modes
- Multi-language support (default: zh-tw)
- Direct YouTube caption access (no API key required)
- Seamless integration with MCP-enabled environments
How it Works
Summary Generation
- Direct YouTube Integration
  - Extracts video information and captions directly from YouTube using the InnerTube API
  - Fetches transcript content from YouTube's caption system
  - Sends transcript data to the DeepSRT API for summarization
- Real-time Processing
  - No pre-caching required: works immediately for any video with captions
  - Automatically selects the best available captions (manual preferred over auto-generated)
  - Supports multiple languages and summary modes
Transcript Extraction
- Direct YouTube Access
  - Transcripts are extracted directly from YouTube's caption system using the InnerTube API
  - No pre-caching required: works immediately for any video with captions
- Caption Selection
  - Automatically selects the best available captions (manual captions preferred over auto-generated)
  - Supports language preference selection
  - Falls back gracefully to available alternatives
- Timestamp Formatting
  - Provides clean, formatted transcripts with timestamps in [MM:SS] format
  - Handles both manual and auto-generated captions
  - Includes video metadata and caption information
CLI Usage
The DeepSRT MCP server provides a unified interface that handles both MCP server mode and CLI commands.
Unified Interface
Direct CLI Commands (No Installation Required)
Global Installation
For easier access, install globally:
CLI Options
get-transcript
get-summary
Supported URL Formats
The CLI accepts multiple YouTube URL formats: full URLs, short URLs, or direct video IDs.
CLI Features
- Direct execution: No installation required with bunx
- Multiple URL formats: Full URLs, short URLs, or direct video IDs
- Flexible argument formats: Both --key=value and --key value formats supported
- Language support: Specify target language for summaries and transcript preferences
- Summary modes: Choose between narrative or bullet-point formats
- Rich output: Colored console output with progress indicators
- Error handling: Clear error messages with suggestions
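A minimal parser that accepts both --key=value and --key value could look like this. parseArgs is a hypothetical helper, not the package's implementation; treating a flag with no following value as the string "true" is a design choice of this sketch. The get-summary command and --mode appear in this README, while --lang in the test is an assumed flag name.

```typescript
// Hypothetical argument parser supporting "--key=value" and "--key value".
function parseArgs(argv: string[]): { positional: string[]; options: Record<string, string> } {
  const positional: string[] = [];
  const options: Record<string, string> = {};
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (!arg.startsWith("--")) {
      positional.push(arg); // command name or video ID/URL
      continue;
    }
    const eq = arg.indexOf("=");
    if (eq !== -1) {
      // --key=value
      options[arg.slice(2, eq)] = arg.slice(eq + 1);
    } else if (i + 1 < argv.length && !argv[i + 1].startsWith("--")) {
      // --key value: consume the next token as the value
      options[arg.slice(2)] = argv[++i];
    } else {
      // bare flag such as --help
      options[arg.slice(2)] = "true";
    }
  }
  return { positional, options };
}
```

In a Node.js or Bun entry point this would be called as parseArgs(process.argv.slice(2)).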
Example Output
Transcript Output
Summary Output (Narrative Mode)
Summary Output (Bullet Mode)
Installation
Option 1: Direct Usage with bunx (Recommended - No Installation Required)
Use the unified interface directly without any installation:
Option 2: Global Installation (For Frequent Use)
Option 3: Installing for Claude Desktop (Recommended)
Add this configuration to your Claude Desktop config file:
- On macOS:
~/Library/Application Support/Claude/claude_desktop_config.json
- On Windows:
%APPDATA%/Claude/claude_desktop_config.json
This approach:
- ✅ No local installation required
- ✅ Always uses latest version
- ✅ Automatic updates when you restart Claude
- ✅ Cross-platform compatibility
- ✅ Simple and clean configuration
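Claude Desktop reads MCP servers from an mcpServers object whose entries give a command and its args. As a hedged sketch, the TypeScript below builds such an entry and prints the JSON. The server name deepsrt and the args list (running @deepsrt/deepsrt-mcp@latest through bunx with no further arguments) are assumptions; check the package documentation for whether an extra argument is needed to start server mode.

```typescript
// Shape of one entry in Claude Desktop's "mcpServers" map.
interface McpServerEntry {
  command: string;
  args: string[];
}

// Assumption: "deepsrt" is an arbitrary key, and bunx with the @latest tag
// starts the server without further arguments.
const claudeDesktopConfig: { mcpServers: Record<string, McpServerEntry> } = {
  mcpServers: {
    deepsrt: {
      command: "bunx",
      args: ["@deepsrt/deepsrt-mcp@latest"],
    },
  },
};

// Print the JSON to merge into claude_desktop_config.json.
console.log(JSON.stringify(claudeDesktopConfig, null, 2));
```

Using the @latest tag is what lets a restart of Claude pick up new releases without a local install.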
Option 4: Installing for Cline
Add this configuration to your cline_mcp_settings.json:
Or just ask Cline to install in the chat:
"Hey, install this MCP server for me from https://github.com/DeepSRT/deepsrt-mcp"
Option 5: Using bunx (Direct Execution)
You can run the server directly with bunx without installation:
Usage
MCP Integration
The server provides the following tools for MCP clients:
get_summary
Gets a summary for a YouTube video.
Parameters:
- videoId (required): YouTube video ID
- lang (optional): Language code (e.g., zh-tw) - defaults to zh-tw
- mode (optional): Summary mode ("narrative" or "bullet") - defaults to narrative
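A sketch of how these parameters and defaults might be validated before a summary request is made. normalizeGetSummaryArgs and isSummaryMode are hypothetical names; only the defaults (zh-tw and narrative) and the two allowed modes come from the parameter list above.

```typescript
type SummaryMode = "narrative" | "bullet";

interface GetSummaryArgs {
  videoId: string;
  lang: string;
  mode: SummaryMode;
}

function isSummaryMode(x: unknown): x is SummaryMode {
  return x === "narrative" || x === "bullet";
}

// Hypothetical validator: rejects a missing videoId and applies the documented defaults.
function normalizeGetSummaryArgs(raw: Record<string, unknown>): GetSummaryArgs {
  const { videoId, lang = "zh-tw", mode = "narrative" } = raw;
  if (typeof videoId !== "string" || videoId.trim() === "") {
    throw new Error("videoId is required");
  }
  if (typeof lang !== "string") throw new Error("lang must be a string");
  if (!isSummaryMode(mode)) throw new Error('mode must be "narrative" or "bullet"');
  return { videoId: videoId.trim(), lang, mode };
}
```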
get_transcript
Gets a transcript for a YouTube video with timestamps.
Parameters:
- videoId (required): YouTube video ID or full YouTube URL
- lang (optional): Preferred language code for captions (e.g., en, zh-tw) - defaults to en
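MCP servers advertise tools with a name, a description, and a JSON Schema inputSchema. The definitions below restate the two parameter lists above in that shape; the description strings and the use of the default and enum keywords are illustrative, not copied from the server's source.

```typescript
// Minimal JSON Schema subset used for tool inputs in this sketch.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: "string"; description: string; enum?: string[]; default?: string }>;
    required: string[];
  };
}

const tools: ToolDefinition[] = [
  {
    name: "get_summary",
    description: "Gets a summary for a YouTube video.",
    inputSchema: {
      type: "object",
      properties: {
        videoId: { type: "string", description: "YouTube video ID" },
        lang: { type: "string", description: "Language code (e.g., zh-tw)", default: "zh-tw" },
        mode: { type: "string", description: "Summary mode", enum: ["narrative", "bullet"], default: "narrative" },
      },
      required: ["videoId"],
    },
  },
  {
    name: "get_transcript",
    description: "Gets a transcript for a YouTube video with timestamps.",
    inputSchema: {
      type: "object",
      properties: {
        videoId: { type: "string", description: "YouTube video ID or full YouTube URL" },
        lang: { type: "string", description: "Preferred caption language code", default: "en" },
      },
      required: ["videoId"],
    },
  },
];
```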
Example Usage
Using Claude Desktop:
Using Cline:
Development
Install dependencies:
Running Tests
Test Types:
- Unit Tests (src/index.test.ts, src/integration.test.ts) - Fast tests with mocked data
- Network Tests (src/transcript.test.ts, src/e2e.test.ts) - Real YouTube API integration tests
Examples
See the examples/ directory for reference implementations:
- examples/standalone-summarizer.ts - Standalone script showing direct API usage patterns
Running the Server
With Bun (Recommended for development - faster startup):
With Node.js (Production):
Testing the Server
You can test the server using the MCP inspector:
Or test manually with JSON-RPC:
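Over the stdio transport, MCP messages are JSON-RPC 2.0 objects sent one per line, and a client must complete the initialize handshake before calling tools. The sketch below prints such a session. The protocol version string 2024-11-05 is one published MCP revision, the client name is a placeholder, and dQw4w9WgXcQ is only an example video ID.

```typescript
// Builds one newline-free JSON-RPC message; requests carry an id, notifications omit it.
function message(method: string, params?: Record<string, unknown>, id?: number): string {
  const msg: Record<string, unknown> = { jsonrpc: "2.0", method };
  if (id !== undefined) msg.id = id;
  if (params !== undefined) msg.params = params;
  return JSON.stringify(msg);
}

const session =
  [
    message(
      "initialize",
      {
        protocolVersion: "2024-11-05",
        capabilities: {},
        clientInfo: { name: "manual-test", version: "0.0.0" },
      },
      1,
    ),
    message("notifications/initialized"),
    message(
      "tools/call",
      { name: "get_summary", arguments: { videoId: "dQw4w9WgXcQ", lang: "en", mode: "bullet" } },
      2,
    ),
  ].join("\n") + "\n";

console.log(session);
```

Piping this output into the server's standard input should yield one response for each request (ids 1 and 2); the notification gets no response.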
Build for Production
Build for production:
Watch mode for development:
Demo
FAQ
Q: I am getting a 404 error. Why?
A: The video summary is not cached in the CDN edge location. Open the video with the DeepSRT Chrome extension so that it is cached in the CDN network; after that, you can get the summary through MCP.
You can verify the cache status using cURL like this:
If you see cache-status: HIT, the content is cached in the CDN edge location and your MCP server should not get a 404.
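The same check can be done programmatically. This sketch takes any object with a get(name) method, which the Fetch API's Headers object satisfies (its lookup is case-insensitive). The function name isCdnCacheHit is hypothetical, and accepting values with extra parameters such as "HIT; ttl=120" is an assumption about the header format.

```typescript
// Structural type satisfied by the Fetch API's Headers object and, for testing,
// by a Map with lowercase keys.
interface HeaderLookup {
  get(name: string): string | null | undefined;
}

// Returns true when the cache-status header reports a HIT.
function isCdnCacheHit(headers: HeaderLookup): boolean {
  const status = headers.get("cache-status");
  if (status == null) return false; // header absent
  // Split on separators so a value with extra parameters still counts as a hit.
  return status.split(/[;,\s]+/).some((token) => token.toUpperCase() === "HIT");
}
```

With fetch in Node.js 18+ or Bun, isCdnCacheHit(response.headers) applies to the response for the same URL you would pass to cURL.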