Server Configuration

Describes the environment variables used to configure the server. All of them are optional.

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| YT_TRANSCRIPT_TIMEOUT | No | Per-request timeout in seconds. | 20 |
| WEBSHARE_PROXY_PASSWORD | No | Password for Webshare rotating residential proxies. | |
| WEBSHARE_PROXY_USERNAME | No | Username for Webshare rotating residential proxies. | |
| WEBSHARE_PROXY_LOCATIONS | No | Optional CSV of country codes for Webshare proxy, e.g. 'us,de'. | |
| YT_TRANSCRIPT_HTTP_PROXY | No | Generic HTTP proxy for transcript and metadata requests. | |
| YT_TRANSCRIPT_HTTPS_PROXY | No | Generic HTTPS proxy for transcript and metadata requests. | |
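
As an illustration of how these variables might be consumed, here is a minimal sketch that reads them into a typed settings object. The `ServerConfig` class and `load_config` function are hypothetical names, not part of the server; lowercasing the country codes is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical container for the documented environment variables."""
    timeout_seconds: float
    webshare_username: Optional[str]
    webshare_password: Optional[str]
    webshare_locations: List[str]
    http_proxy: Optional[str]
    https_proxy: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read the documented variables; every one of them is optional."""
    raw_locations = env.get("WEBSHARE_PROXY_LOCATIONS", "")
    return ServerConfig(
        # Documented default is 20 seconds.
        timeout_seconds=float(env.get("YT_TRANSCRIPT_TIMEOUT", "20")),
        webshare_username=env.get("WEBSHARE_PROXY_USERNAME") or None,
        webshare_password=env.get("WEBSHARE_PROXY_PASSWORD") or None,
        # CSV of country codes, e.g. "us,de"; normalizing case is an assumption.
        webshare_locations=[
            code.strip().lower() for code in raw_locations.split(",") if code.strip()
        ],
        http_proxy=env.get("YT_TRANSCRIPT_HTTP_PROXY") or None,
        https_proxy=env.get("YT_TRANSCRIPT_HTTPS_PROXY") or None,
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check without touching the real environment.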

Capabilities

Features and capabilities supported by this server

| Capability | Details |
| --- | --- |
| tools | {"listChanged": false} |
| prompts | {"listChanged": false} |
| resources | {"subscribe": false, "listChanged": false} |
| experimental | {} |

Tools

Functions exposed to the LLM to take actions

get_transcript

Fetch a YouTube video's existing captions as text so you can answer questions about it.

Returns existing captions/subtitles only; it does not transcribe audio. Videos without captions have nothing to return.

Args:
    video: A YouTube URL (watch, youtu.be, shorts, embed, live) or an 11-character video ID.
    languages: Preferred language codes in priority order. Defaults to ["en"].
    include_timestamps: If true, group the transcript into ~15s blocks, each prefixed with [mm:ss] (or [h:mm:ss] past an hour). Use this to find where a topic is discussed and pass that [mm:ss] to build_video_link.
    translate_to: Optional ISO language code to translate the transcript into.

Returns: The transcript as plain text.
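
To make the include_timestamps behaviour concrete, the following is a rough sketch of grouping caption segments into 15-second blocks with [mm:ss] prefixes. It is not the server's code: the `(start_seconds, text)` input shape and the function names are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def format_timestamp(seconds: float) -> str:
    """Render seconds as mm:ss, or h:mm:ss once past an hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def group_into_blocks(
    segments: Iterable[Tuple[float, str]], block_seconds: int = 15
) -> List[str]:
    """Bucket (start_seconds, text) segments into fixed-size time blocks."""
    blocks: Dict[int, List[str]] = {}
    for start, text in segments:
        # Snap each segment to the start of the block it falls in.
        block_start = int(start // block_seconds) * block_seconds
        blocks.setdefault(block_start, []).append(text.strip())
    lines = []
    for block_start in sorted(blocks):
        joined = " ".join(blocks[block_start])
        lines.append(f"[{format_timestamp(block_start)}] {joined}")
    return lines
```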

build_video_link

Build a YouTube link that opens a video at a specific moment.

Returns a watch URL like https://www.youtube.com/watch?v=&t= so a user can click straight to the moment something is discussed. Pair it with get_transcript(include_timestamps=True) -- read off the [mm:ss] of the relevant block and pass it as start -- to turn "where is X mentioned?" into a clickable link.

Args:
    video: A YouTube URL (watch, youtu.be, shorts, embed, live) or an 11-character video ID.
    start: The moment to jump to -- seconds (e.g. 90) or a "mm:ss" / "h:mm:ss" string.

Returns: The watch URL as a string.
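
The kind of work this tool does can be sketched as follows, under stated assumptions: the ID-extraction rules, the helper names, and the `t=<seconds>s` form of the time parameter are illustrative guesses rather than the server's confirmed behaviour.

```python
import re
from typing import Union
from urllib.parse import parse_qs, urlparse

_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(video: str) -> str:
    """Accept a bare 11-character ID or a watch/youtu.be/shorts/embed/live URL."""
    if _VIDEO_ID.fullmatch(video):
        return video
    parsed = urlparse(video)
    host = (parsed.hostname or "").lower()
    parts = [p for p in parsed.path.split("/") if p]
    candidate = ""
    if host == "youtu.be" and parts:
        candidate = parts[0]
    elif host.endswith("youtube.com"):
        if len(parts) > 1 and parts[0] in ("shorts", "embed", "live"):
            candidate = parts[1]
        else:
            candidate = parse_qs(parsed.query).get("v", [""])[0]
    if not _VIDEO_ID.fullmatch(candidate):
        raise ValueError(f"not a recognizable YouTube video: {video!r}")
    return candidate


def parse_start(start: Union[int, float, str]) -> int:
    """Convert seconds, "mm:ss", or "h:mm:ss" into whole seconds."""
    if isinstance(start, (int, float)):
        return int(start)
    seconds = 0
    for part in start.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def build_video_link(video: str, start: Union[int, float, str]) -> str:
    # The "&t=<seconds>s" suffix is an assumption about the URL format.
    video_id = extract_video_id(video)
    return f"https://www.youtube.com/watch?v={video_id}&t={parse_start(start)}s"
```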

list_transcripts

List the transcripts available for a YouTube video.

Use this when get_transcript can't find your requested language. It reports each available transcript (language, code, whether it's auto-generated, whether it's translatable) plus the set of languages you can pass to get_transcript's translate_to.

Args: video: A YouTube URL or an 11-character video ID.
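
A client might use this listing to decide how to retry get_transcript. The sketch below is one possible approach; the dictionary keys `code` and `translatable` and the function name are hypothetical, since the exact response shape is not documented here.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def plan_transcript_request(
    transcripts: Sequence[Dict[str, object]],
    translation_targets: Iterable[str],
    preferred: Sequence[str],
) -> Optional[Dict[str, object]]:
    """Return keyword arguments for get_transcript, or None if nothing fits."""
    available = {t["code"] for t in transcripts}
    # First choice: a transcript that already exists in a preferred language.
    for code in preferred:
        if code in available:
            return {"languages": [code]}
    # Fallback: translate a translatable transcript into a preferred language.
    translatable: List[object] = [t["code"] for t in transcripts if t.get("translatable")]
    targets = set(translation_targets)
    for code in preferred:
        if translatable and code in targets:
            return {"languages": [translatable[0]], "translate_to": code}
    return None
```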

get_video_metadata

Get a YouTube video's metadata: title, channel, upload date, duration, view/like counts, chapters and tags.

Use this to answer questions about a video (its name, who made it, how long it is, when it came out) without fetching its transcript.

Args:
    video: A YouTube URL (watch, youtu.be, shorts, embed, live) or an 11-character video ID.
    include_description: If true, also return the (often long) description; otherwise it's omitted to keep the response small.

get_most_replayed

Get a YouTube video's "most replayed" moments -- the peaks of its viewer-interest heatmap (the curve shown above the timeline marking where people rewatch most).

Use this for "what are the best / most-rewatched parts?", "jump me to the good part", or to weight a summary toward what viewers actually care about. Each peak is a high-interest region (region_start_seconds..region_end_seconds) with the hottest instant at peak_start_seconds, a ready-to-share url that opens the video at the start of the stretch, and the chapter it falls in. relative_intensity is 0..1 within this video (1.0 = its single most-rewatched moment) -- it is NOT a view count and is not comparable across videos.

A peak with is_opening=True sits at the very start (t~=0): that spot is almost always inflated by playback starting there, not a genuine rewatch, so discount it as a "best part". It is returned in addition to (not counted against) top_n, so the opening can't crowd out content.

To say what is actually happening at a peak, read its peak_label (mm:ss) and look it up with get_transcript(include_timestamps=True); profile is a coarse 0..1 curve for the overall shape (front-loaded vs steady vs spikes near the end).

has_data may be False -- then peaks is empty and note explains why (many newer, low-traffic, or Shorts videos have no heatmap).

Args:
    video: A YouTube URL (watch, youtu.be, shorts, embed, live) or an 11-character video ID.
    top_n: Max number of content peak regions (clamped to 1..20; default 8). The flagged opening (t~=0) peak, when present, is returned in addition to these.
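
As a sketch of how a caller might act on the guidance above, the function below picks the strongest content peak while discounting the opening peak. The field names are taken from the description; treating the result as a plain dictionary with a `peaks` list is an assumption, and `best_content_peak` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def best_content_peak(result: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the most-rewatched peak that is not the inflated opening."""
    if not result.get("has_data"):
        # No heatmap; result["note"] is described as explaining why.
        return None
    content_peaks = [p for p in result.get("peaks", []) if not p.get("is_opening")]
    if not content_peaks:
        return None
    # relative_intensity is 0..1 within this video only.
    return max(content_peaks, key=lambda p: p["relative_intensity"])
```

The returned peak's peak_label can then be looked up in a timestamped transcript to see what is being said at that moment.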

get_video_frame

Capture a single still frame (screenshot) from a YouTube video at a moment and return it as an image, so a multimodal model can answer "what's on screen here?".

Use this to see the video at a specific time -- e.g. read a slide, a caption burned into the video, or a UI being demoed. Pair it with get_most_replayed or get_transcript(include_timestamps=True) to pick an interesting moment, then grab the frame there.

Requires ffmpeg on the server. The captured frame is the nearest keyframe at or just before the requested moment (it can be off by a second or two) and is downscaled to max_width to keep the response small.

Args:
    video: A YouTube URL (watch, youtu.be, shorts, embed, live) or an 11-character video ID.
    at: The moment to capture -- seconds (e.g. 90) or a "mm:ss" / "h:mm:ss" string.
    max_width: Max width in pixels of the returned image (clamped 64..1280; default 640). Smaller is cheaper on a vision model's image-token budget.
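
The max_width clamping and downscaling can be illustrated with a small sketch. Preserving the aspect ratio and never upscaling a frame narrower than max_width are assumptions; `output_size` is a hypothetical helper, not the server's function.

```python
from typing import Tuple


def clamp(value: int, low: int, high: int) -> int:
    """Restrict value to the inclusive range low..high."""
    return max(low, min(high, value))


def output_size(src_width: int, src_height: int, max_width: int = 640) -> Tuple[int, int]:
    """Compute the downscaled frame size for a requested max_width."""
    limit = clamp(max_width, 64, 1280)  # documented clamp range
    width = min(src_width, limit)  # assumption: never upscale
    height = max(1, round(src_height * width / src_width))
    return width, height
```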

get_video_preview

Get a visual overview of a YouTube video as ONE tiled contact-sheet image: tiles frames sampled evenly across the video (or across a start..end window of it), plus a text legend mapping each tile to its mm:ss timestamp.

Use this to see what's going on across a video (talking head vs slides vs demo footage, scene changes, "is there a chart anywhere?") and to pick moments worth a closer look. To inspect one part in more detail, call it again with start/end around that part -- but pick the window from the transcript, chapters, or get_most_replayed first and zoom once; don't binary-search the video with repeated sheets, since every returned image stays in context. Read each tile's timestamp from the legend -- do not count grid cells yourself. Tiles are small and not readable: to read a slide, caption, or UI, follow up with get_video_frame(video, at=<that tile's timestamp>). Windows under ~1 minute may return near-duplicate tiles (frames land on keyframes, a few seconds apart).

Requires ffmpeg on the server (the system binary, or the one bundled by the [media] extra).

Args:
    video: A YouTube URL (watch, youtu.be, shorts, embed, live) or an 11-character video ID.
    tiles: How many frames to sample (clamped 4..24; default 12).
    tile_width: Width in pixels of each tile (clamped 160..480; default 320). The whole sheet stays around 1000-1300 px wide at the defaults -- cheap on a vision model's image budget while keeping tiles recognizable.
    start: Optional window start -- seconds (e.g. 90) or a "mm:ss" / "h:mm:ss" string. Defaults to the beginning of the video.
    end: Optional window end, same forms. Defaults to (and is clamped to) the video's end.
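
To illustrate what "sampled evenly" could mean in practice, here is a sketch that computes tile timestamps and a text legend. Sampling the midpoint of each equal slice is an assumption, as are the function names and the legend wording; the server may choose its sample points differently.

```python
from typing import List, Optional


def format_mmss(seconds: float) -> str:
    """Render seconds as mm:ss, or h:mm:ss once past an hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def sample_times(
    duration: float, tiles: int = 12, start: float = 0.0, end: Optional[float] = None
) -> List[float]:
    """Pick evenly spaced timestamps inside the start..end window."""
    tiles = max(4, min(24, tiles))  # documented clamp range
    end = duration if end is None else min(end, duration)  # clamp to video end
    start = max(0.0, min(start, end))
    step = (end - start) / tiles
    # Assumption: take the midpoint of each of the equal slices.
    return [start + step * (i + 0.5) for i in range(tiles)]


def legend(times: List[float]) -> List[str]:
    """Map each tile number to its timestamp."""
    return [f"tile {i + 1}: {format_mmss(t)}" for i, t in enumerate(times)]
```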

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/realiti4/youtube-context-mcp'
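
The same request can be made from Python with only the standard library. This is a minimal sketch: it assumes the endpoint returns JSON, and `build_request` and `fetch_server_info` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any

API_URL = "https://glama.ai/api/mcp/v1/servers/realiti4/youtube-context-mcp"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Build the GET request that mirrors the curl command."""
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server_info(url: str = API_URL, timeout: float = 20.0) -> Any:
    """Send the request and decode the body (assumed to be JSON)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server_info()` performs the network request; `build_request()` alone can be inspected offline.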

If you have feedback or need assistance with the MCP directory API, please join our Discord server.