youtube-context-mcp
Server Configuration
Environment variables that configure the server. All of them are optional.
| Name | Required | Description | Default |
|---|---|---|---|
| YT_TRANSCRIPT_TIMEOUT | No | Per-request timeout in seconds. | 20 |
| WEBSHARE_PROXY_PASSWORD | No | Password for Webshare rotating residential proxies. | |
| WEBSHARE_PROXY_USERNAME | No | Username for Webshare rotating residential proxies. | |
| WEBSHARE_PROXY_LOCATIONS | No | Optional CSV of country codes for Webshare proxy, e.g. 'us,de'. | |
| YT_TRANSCRIPT_HTTP_PROXY | No | Generic HTTP proxy for transcript and metadata requests. | |
| YT_TRANSCRIPT_HTTPS_PROXY | No | Generic HTTPS proxy for transcript and metadata requests. | |
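The table above can be read as a small settings object. The following is a minimal sketch of how such variables might be loaded, not the server's actual code: the `TranscriptSettings` class and `load_settings` function are hypothetical names, and only the variable names and the default timeout of 20 seconds come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class TranscriptSettings:
    """Hypothetical container for the variables listed in the table."""
    timeout_seconds: float
    webshare_username: Optional[str]
    webshare_password: Optional[str]
    webshare_locations: Tuple[str, ...]
    http_proxy: Optional[str]
    https_proxy: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> TranscriptSettings:
    """Read the documented variables, treating unset or blank values as absent."""

    def opt(name: str) -> Optional[str]:
        value = env.get(name, "").strip()
        return value or None

    # WEBSHARE_PROXY_LOCATIONS is a CSV of country codes, e.g. 'us,de'.
    # Lowercasing the codes is an assumption made for this sketch.
    raw_locations = opt("WEBSHARE_PROXY_LOCATIONS") or ""
    locations = tuple(
        code.strip().lower() for code in raw_locations.split(",") if code.strip()
    )

    return TranscriptSettings(
        # Documented default: 20 seconds per request.
        timeout_seconds=float(opt("YT_TRANSCRIPT_TIMEOUT") or "20"),
        webshare_username=opt("WEBSHARE_PROXY_USERNAME"),
        webshare_password=opt("WEBSHARE_PROXY_PASSWORD"),
        webshare_locations=locations,
        http_proxy=opt("YT_TRANSCRIPT_HTTP_PROXY"),
        https_proxy=opt("YT_TRANSCRIPT_HTTPS_PROXY"),
    )
```

Calling `load_settings({})` yields the defaults, which is a convenient way to check a deployment's configuration without touching the real environment.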
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | {"listChanged": false} |
| prompts | {"listChanged": false} |
| resources | {"subscribe": false, "listChanged": false} |
| experimental | {} |
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| get_transcript | Fetch a YouTube video's existing captions as text so you can answer questions about it. Returns existing captions/subtitles only; it does not transcribe audio. Videos without captions have nothing to return. Args: video: A YouTube URL (watch, youtu.be, shorts, embed, live) or an 11-character video ID. languages: Preferred language codes in priority order. Defaults to ["en"]. include_timestamps: If true, group the transcript into ~15s blocks, each prefixed with [mm:ss] (or [h:mm:ss] past an hour). Use this to find where a topic is discussed and pass that [mm:ss] to build_video_link. translate_to: Optional ISO language code to translate the transcript into. Returns: The transcript as plain text. |
| build_video_link | Build a YouTube link that opens a video at a specific moment. Returns a watch URL like https://www.youtube.com/watch?v=&t= so a user can click straight to the moment something is discussed. Pair it with get_transcript(include_timestamps=True) -- read off the [mm:ss] of the relevant block and pass it as start -- to turn "where is X mentioned?" into a clickable link. Args: video: A YouTube URL (watch, youtu.be, shorts, embed, live) or an 11-character video ID. start: The moment to jump to -- seconds (e.g. 90) or a "mm:ss" / "h:mm:ss" string. Returns: The watch URL as a string. |
| list_transcripts | List the transcripts available for a YouTube video. Use this when get_transcript can't find your requested language. It reports each available transcript (language, code, whether it's auto-generated, whether it's translatable) plus the set of languages you can pass to get_transcript's translate_to. Args: video: A YouTube URL or an 11-character video ID. |
| get_video_metadata | Get a YouTube video's metadata: title, channel, upload date, duration, view/like counts, chapters and tags. Use this to answer questions about a video (its name, who made it, how long it is, when it came out) without fetching its transcript. Args: video: A YouTube URL (watch, youtu.be, shorts, embed, live) or an 11-character video ID. include_description: If true, also return the (often long) description; otherwise it's omitted to keep the response small. |
| get_most_replayed | Get a YouTube video's "most replayed" moments -- the peaks of its viewer-interest heatmap (the curve shown above the timeline marking where people rewatch most). Use this for "what are the best / most-rewatched parts?", "jump me to the good part", or to weight a summary toward what viewers actually care about. Each peak is a high-interest region (region_start_seconds..region_end_seconds) with the hottest instant at peak_start_seconds, a ready-to-share url that opens the video at the start of the stretch, and the chapter it falls in. relative_intensity is 0..1 within this video (1.0 = its single most-rewatched moment) -- it is NOT a view count and is not comparable across videos. A peak with is_opening=True sits at the very start (t~=0): that spot is almost always inflated by playback starting there, not a genuine rewatch, so discount it as a "best part". It is returned in addition to (not counted against) top_n, so the opening can't crowd out content. To say what is actually happening at a peak, read its peak_label (mm:ss) and look it up with get_transcript(include_timestamps=True); profile is a coarse 0..1 curve for the overall shape (front-loaded vs steady vs spikes near the end). has_data may be False -- then peaks is empty and note explains why (many newer, low-traffic, or Shorts videos have no heatmap). Args: video: A YouTube URL (watch, youtu.be, shorts, embed, live) or an 11-character video ID. top_n: Max number of content peak regions (clamped to 1..20; default 8). The flagged opening (t~=0) peak, when present, is returned in addition to these. |
| get_video_frame | Capture a single still frame (screenshot) from a YouTube video at a moment and return it as an image, so a multimodal model can answer "what's on screen here?". Use this to see the video at a specific time -- e.g. read a slide, a caption burned into the video, or a UI being demoed. Pair it with get_most_replayed or get_transcript(include_timestamps=True) to pick an interesting moment, then grab the frame there. Requires ffmpeg on the server. The captured frame is the nearest keyframe at or just before the requested moment (it can be off by a second or two) and is downscaled to max_width to keep the response small. Args: video: A YouTube URL (watch, youtu.be, shorts, embed, live) or an 11-character video ID. at: The moment to capture -- seconds (e.g. 90) or a "mm:ss" / "h:mm:ss" string. max_width: Max width in pixels of the returned image (clamped 64..1280; default 640). Smaller is cheaper on a vision model's image-token budget. |
| get_video_preview | Get a visual overview of a YouTube video as ONE tiled contact-sheet image. Use this to see what's going on across a video (talking head vs slides vs demo footage, scene changes, "is there a chart anywhere?") and to pick moments worth a closer look. To inspect one part in more detail, call it again with start/end around that part -- but pick the window from the transcript, chapters, or get_most_replayed first and zoom once; don't binary-search the video with repeated sheets, since every returned image stays in context. Read each tile's timestamp from the legend -- do not count grid cells yourself. Tiles are small and not readable: to read a slide, caption, or UI, follow up with get_video_frame(video, at=<that tile's timestamp>). Windows under ~1 minute may return near-duplicate tiles (frames land on keyframes, a few seconds apart). Requires ffmpeg on the server (the system binary, or the one bundled by the [media] extra). Args: video: A YouTube URL (watch, youtu.be, shorts, embed, live) or an 11-character video ID. tiles: How many frames to sample (clamped 4..24; default 12). tile_width: Width in pixels of each tile (clamped 160..480; default 320). The whole sheet stays around 1000-1300 px wide at the defaults -- cheap on a vision model's image budget while keeping tiles recognizable. start: Optional window start -- seconds (e.g. 90) or a "mm:ss" / "h:mm:ss" string. Defaults to the beginning of the video. end: Optional window end, same forms. Defaults to (and is clamped to) the video's end. |
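Several tools accept the same two inputs: a video given as a URL or an 11-character ID, and a moment given as seconds or a "mm:ss" / "h:mm:ss" string. The sketch below illustrates how those inputs could be normalized and turned into a timestamped watch link. It is an illustration only, not the server's implementation: `extract_video_id`, `parse_moment`, `format_block_label`, and `sketch_video_link` are hypothetical helpers, and the `t=<seconds>s` form of the link parameter is an assumption.

```python
import re
from typing import Union
from urllib.parse import parse_qs, urlparse

# An 11-character YouTube video ID: letters, digits, '-' and '_'.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(video: str) -> str:
    """Accept a bare ID or a watch / youtu.be / shorts / embed / live URL."""
    video = video.strip()
    if _VIDEO_ID.fullmatch(video):
        return video
    parsed = urlparse(video if "://" in video else "https://" + video)
    host = parsed.hostname or ""
    parts = [p for p in parsed.path.split("/") if p]
    candidate = None
    if host == "youtu.be" and parts:
        candidate = parts[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parts and parts[0] == "watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
            candidate = parts[1]
    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    raise ValueError(f"could not find a video ID in {video!r}")


def parse_moment(moment: Union[int, float, str]) -> int:
    """Convert seconds, 'mm:ss', or 'h:mm:ss' into whole seconds."""
    if isinstance(moment, (int, float)):
        seconds = int(moment)
    else:
        fields = str(moment).strip().split(":")
        if not 1 <= len(fields) <= 3 or not all(f.isdigit() for f in fields):
            raise ValueError(f"unrecognized moment: {moment!r}")
        seconds = 0
        for field in fields:
            seconds = seconds * 60 + int(field)
    if seconds < 0:
        raise ValueError("moment must not be negative")
    return seconds


def format_block_label(seconds: int) -> str:
    """Render [mm:ss], or [h:mm:ss] past an hour, as the transcript blocks do."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"[{hours}:{minutes:02d}:{secs:02d}]"
    return f"[{minutes:02d}:{secs:02d}]"


def sketch_video_link(video: str, start: Union[int, float, str]) -> str:
    """Build a watch URL that opens at `start` (assumed 't=<seconds>s' form)."""
    video_id = extract_video_id(video)
    return f"https://www.youtube.com/watch?v={video_id}&t={parse_moment(start)}s"
```

For example, `sketch_video_link("https://youtu.be/dQw4w9WgXcQ", "1:30")` resolves the short URL to its ID and converts "1:30" to 90 seconds before assembling the link.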
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/realiti4/youtube-context-mcp'
If you have feedback or need assistance with the MCP directory API, please join our Discord server