Weftly
Server Details
Find & cut horizontal and vertical video clips (Shorts/Reels), transcribe & summarize. Pay per job.
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.5/5 across 11 of 11 tools scored. Lowest: 3.9/5.
Tools have distinct purposes, with clear descriptions guiding when to use each. Some overlap exists between transcribe, summarize, and find_clips (all produce transcripts), but explicit warnings prevent confusion.
Most tools follow a verb_noun pattern (e.g., extract_clip, find_clips), but some are standalone verbs (summarize, transcribe) and one is a noun phrase (mpp_smoke_test), so the naming style is inconsistent across the set.
11 tools is well-scoped for a media-processing MCP, covering upload, transcription, summarization, clip extraction, YouTube publishing, and status polling. No obvious bloat or deficiency.
Covers core workflows: upload, transcribe, summarize, clip extraction, YouTube publishing, and payment testing. Minor gaps like deletion or batch operations are acceptable for the domain.
Available Tools
11 tools
complete_upload (A)
Confirm that the file has been uploaded (via HTTP PUT to the upload_url from transcribe or summarize) and start processing. Verifies that the file is present in storage and that the job has been paid. Returns status "processing". Poll get_job_status to track progress and retrieve download URLs when done.
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | Yes | The job_id returned from a previous transcribe or summarize call. | |
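The upload-then-confirm step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the server's own client: `call_tool` is a hypothetical callable standing in for whatever MCP client invokes tools, and `send` is injectable only so the HTTP step can be swapped out.

```python
import urllib.request
from typing import Any, Callable, Dict

# Hypothetical stand-in for an MCP client: (tool_name, arguments) -> tool result.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def build_upload_request(upload_url: str, data: bytes) -> urllib.request.Request:
    """Build the HTTP PUT request that sends the file bytes to the upload_url."""
    return urllib.request.Request(upload_url, data=data, method="PUT")


def upload_and_confirm(
    call_tool: CallTool,
    job_id: str,
    upload_url: str,
    data: bytes,
    send: Callable[..., Any] = urllib.request.urlopen,
) -> Dict[str, Any]:
    """PUT the bytes, then call complete_upload so processing starts."""
    with send(build_upload_request(upload_url, data)) as response:
        # Accepting 200/201/204 is an assumption; the listing does not say
        # which success code the storage backend returns.
        if response.status not in (200, 201, 204):
            raise RuntimeError(f"upload failed with HTTP {response.status}")
    # Per the description, the expected result carries status "processing".
    return call_tool("complete_upload", {"job_id": job_id})
```

After this returns, the description says to poll get_job_status for progress and download URLs.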
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description must disclose behavior. It describes checks (file present, job paid) and return status, but does not mention failure modes, side effects on failure, or authorization requirements. Adequate but not comprehensive.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences, each informative. Front-loaded with the main action, with no superfluous words. Efficient and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers prerequisites, checks, return value, and next steps. Lacks details on error handling or timeouts, but for a simple confirmation step with no output schema, it is largely complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with job_id already described. Description reiterates the source ('from a previous transcribe or summarize call') but adds no new semantic value beyond the schema, so baseline score of 3 applies.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the tool confirms upload and starts processing, with specific verb 'Confirm' and resource 'file upload'. Distinguishes from siblings like transcribe/summarize by referencing their upload_url.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context: use after HTTP PUT to upload_url from transcribe/summarize, and directs to poll get_job_status. Does not explicitly state when not to use or list alternatives, but context is sufficient.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
extract_clip (A)
Cut and assemble a clip from any prior video job (find_clips, summarize, or video transcribe). Operates on a parent job — possessing the parent source_job_id is the capability, no upload step. Pass one segment for a simple cut, or multiple non-contiguous segments to compose a single mp4 highlight reel — same flat $0.50 either way. Two-call flow: (1) call with source_job_id + segments (ordered array of {start, end, label?} in source seconds, total duration capped at 30 minutes) to receive {job_id, payment_challenge}; (2) pay via MPP and call with job_id + payment_credential to start processing. No upload step. Poll get_job_status(job_id) for completion; outputs are role clip-video (the assembled .mp4, frame-accurate boundaries with 15ms audio fades at segment joins; audio loudness-normalized to -14 LUFS / -1.5 dBTP for clean, consistent playback) and — when include_transcript: true (default) — roles clip-srt + clip-words (transcripts stitched and time-shifted to match the assembled video). Set include_transcript: false to skip transcript outputs. Payment: pay by credit card via the Stripe Checkout link (open the returned payment_url in any browser) or Tempo USDC via mppx; the challenge's WWW-Authenticate header and /.well-known/mpp.json are authoritative for which methods are offered. Source must still be in storage (72h TTL for find_clips parents, 24h elsewhere — check expires_at from get_job_status on the parent). Multiple extract_clip calls against one parent are independent paid jobs. Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| title | No | Optional title for the assembled clip. Surfaces in get_job_status and download filenames; doesn't affect the cut itself. | |
| job_id | No | Job ID returned from a previous extract_clip call. Include along with payment_credential to confirm payment and trigger processing. Also include alone to recover the current state. | |
| segments | No | Ordered array of source-relative segments to cut and concatenate into the output. Single segment for a simple cut; multiple segments compose a single mp4 from non-contiguous moments — same flat $0.50 either way. Total summed duration capped at 30 minutes per call. Required on the first call. | |
| source_job_id | No | Job ID of any prior video job (find_clips, summarize, or video transcribe). Possessing this id is the capability — extract_clip is not session-bound, so a user can come back from a different session within the parent's TTL and still extract. Required on the first call. | |
| include_transcript | No | Default true. When true, the pipeline writes clip-srt + clip-words outputs stitched to match the assembled video. Set false to skip and just receive the .mp4. | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ..."). extract_clip accepts Tempo USDC and Stripe SPT; see the challenge's WWW-Authenticate header or /.well-known/mpp.json for the supported methods. Include with job_id after paying the challenge to start processing. | |
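Because each extract_clip call is a separate paid job, it can help to check the segment list locally before the first call. The sketch below is illustrative only: the helper name `build_extract_clip_args` is hypothetical, the non-negative start check is an assumption, and the server remains the authority on what it accepts.

```python
from typing import Any, Dict, List

MAX_TOTAL_SECONDS = 30 * 60  # total duration cap stated in the tool description


def build_extract_clip_args(
    source_job_id: str,
    segments: List[Dict[str, Any]],
    include_transcript: bool = True,
) -> Dict[str, Any]:
    """Validate segments locally and return arguments for the first call."""
    if not segments:
        raise ValueError("at least one segment is required on the first call")
    total = 0.0
    for seg in segments:
        start, end = float(seg["start"]), float(seg["end"])
        # Assumption: starts are non-negative and each end is after its start.
        if start < 0 or end <= start:
            raise ValueError(f"invalid segment: start={start}, end={end}")
        total += end - start
    if total > MAX_TOTAL_SECONDS:
        raise ValueError(f"total duration {total:.1f}s exceeds the 30-minute cap")
    return {
        "source_job_id": source_job_id,
        "segments": segments,
        "include_transcript": include_transcript,
    }
```

The returned dictionary matches the first-call parameters in the table above; the second call would carry only job_id and payment_credential.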
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, the description fully bears the transparency burden. It discloses the two-call flow, payment requirement, output roles, audio processing (fades, loudness normalization), transcript option, TTL, independent jobs, and auto-refund on failure.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is long but well-structured and front-loaded with the core purpose. It uses bullet-like dashes for clarity, but some redundancy could be trimmed. Overall efficient for the amount of detail.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite no output schema, the description covers payment flow, output roles, TTL, limits, independent calls, and error handling. It thoroughly addresses the complexity of the tool, leaving minimal gaps for an AI agent.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so baseline is 3. The description adds significant context: segments assembly, payment credential use, default behavior for include_transcript, and title purpose, enhancing comprehension beyond the schema.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Cut and assemble a clip from any prior video job (find_clips, summarize, or video transcribe).' It differentiates from sibling tools like 'extract_vertical_clip' by specifying it works on any source video job and produces an MP4.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides extensive usage context: it operates on a parent job, requires no upload, details a two-call flow, payment methods, and TTL limits. It implicitly differentiates from siblings but does not explicitly state when not to use.
extract_vertical_clip (A)
Cut a 9:16 vertical clip from any prior video job (find_clips, summarize, or video transcribe), suitable for direct upload to TikTok, Instagram Reels, or YouTube Shorts. Default output is 1080×1920 H.264 / AAC .mp4 with center-cropped framing; audio loudness-normalized to -14 LUFS / -1.5 dBTP for short-form social. Single-segment only; clip duration must be between 1 and 90 seconds (Instagram Reels max). Operates on a parent job — possessing the parent source_job_id is the capability, no upload step. Two-call flow: (1) call with source_job_id + start + end (in source seconds) to receive {job_id, payment_challenge}; (2) pay via MPP and call with job_id + payment_credential to start processing. Poll get_job_status(job_id) for completion; output is role clip-vertical-video (the .mp4). Flat price: $0.50 per clip. Payment: pay by credit card via the Stripe Checkout link (open the returned payment_url in any browser) or Tempo USDC via mppx. Optional profile parameter selects the encoding profile (default tiktok-primary). Allowed values: tiktok-primary (1080×1920, fast preset, CRF 22), tiktok-primary-720p (720×1280, CBR 3 Mbps — half-resolution mobile-optimized, ~40% faster wall time), instagram-reels (1080×1920, slow preset, CBR 4 Mbps), instagram-stories (same encode shape as instagram-reels). All four profiles loudness-normalize identically. Optional subject parameter controls reframing (default center, preserves today's behavior): auto locks onto the longest-tracked face from the parent's subjects-sidecar (or runs inline detection if the parent has none); subject_id (with subject_id param naming a face_N from the sidecar) locks onto a specific subject; follow switches crop between active speakers across the clip using the sidecar's active_speaker_timeline; manual accepts caller-supplied framing via subject_box: {x, y, w, h} (source pixels) or subject_x_offset (direct crop x). Sidecar shape at /.well-known/weftly-subjects-v1.schema.json. 
auto/subject_id/follow fall back to center if detection or sidecar resolution fails — the paid job always delivers a clip. Source must be a horizontal video (wider than 9:16) — already-vertical or square sources are rejected. Source must still be in storage (72h TTL for find_clips parents, 24h elsewhere — check expires_at from get_job_status on the parent). Pair with find_clips ($2.00/video) to pick a moment first, then call this to get a download-ready vertical mp4 in under 5 minutes. Multiple extract_vertical_clip calls against one parent are independent paid jobs. Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| end | No | Source-relative end time in seconds (must be > start, and end - start ∈ [1, 90]). Required on the first call. | |
| start | No | Source-relative start time in seconds. Required on the first call. | |
| t_ref | No | For subject="manual" with subject_box — source-seconds timestamp the box applies to. Informational in v1. | |
| title | No | Optional title for the assembled clip. Surfaces in get_job_status and download filenames; doesn't affect the cut itself. | |
| job_id | No | Job ID returned from a previous extract_vertical_clip call. Include along with payment_credential to confirm payment and trigger processing. Also include alone to recover the current state. | |
| profile | No | Optional encoding profile. Default: tiktok-primary (1080×1920 H.264 fast preset, CRF 22, 6 Mbps cap). tiktok-primary-720p: 720×1280, CBR 3 Mbps — half-resolution mobile-optimized, ~40% faster wall time. instagram-reels: 1080×1920 H.264 slow preset, CBR 4 Mbps. instagram-stories: same encode shape as instagram-reels. All four apply loudness normalization to -14 LUFS / -1.5 dBTP. | |
| subject | No | Optional reframing strategy. Default: "center" (hardcoded center crop, today's behavior). "auto": lock onto the longest-tracked face from the parent find_clips job's subjects-sidecar (or run inline detection if no sidecar). "subject_id": lock onto a specific face named in the sidecar (pass subject_id). "follow": switch crop between active speakers across the clip using the sidecar's active_speaker_timeline (per-segment encode + concat). "manual": caller specifies the subject (pass subject_box or subject_x_offset). See /.well-known/weftly-subjects-v1.schema.json. auto/subject_id/follow fall back to center if detection fails — the paid job always delivers a clip. | |
| subject_id | No | Required when subject="subject_id". Subject id from the parent's subjects-sidecar (e.g. "face_0"). | |
| subject_box | No | For subject="manual" — bounding box of the subject in source pixels. Crop centers on the box center. | |
| source_job_id | No | Job ID of any prior video job (find_clips, summarize, or video transcribe). Possessing this id is the capability — extract_vertical_clip is not session-bound, so a user can come back from a different session within the parent's TTL and still extract. Required on the first call. | |
| subject_x_offset | No | For subject="manual" — direct crop x-offset in source pixels (alternative to subject_box). | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ..."). extract_vertical_clip accepts Tempo USDC and Stripe SPT; see the challenge's WWW-Authenticate header or /.well-known/mpp.json for the supported methods. Include with job_id after paying the challenge to start processing. | |
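The duration limit and the default center framing can be illustrated with a little arithmetic. This is a sketch of one plausible way to compute a full-height 9:16 window, not the service's actual crop math; both function names are hypothetical, and treating "horizontal" as width greater than height is an interpretation of the description.

```python
from typing import Dict


def validate_window(start: float, end: float) -> float:
    """Return the clip duration if end - start lies in [1, 90] seconds."""
    duration = end - start
    if not 1 <= duration <= 90:
        raise ValueError(f"duration {duration}s is outside the 1 to 90 second range")
    return duration


def center_crop_9_16(src_width: int, src_height: int) -> Dict[str, int]:
    """Return a full-height 9:16 crop window centered horizontally (source pixels)."""
    # Interpretation: vertical and square sources are rejected, so require
    # width > height.
    if src_width <= src_height:
        raise ValueError("source must be a horizontal video")
    crop_w = (src_height * 9) // 16      # width of a 9:16 window at full height
    x = (src_width - crop_w) // 2        # equal margins left and right
    return {"x": x, "y": 0, "w": crop_w, "h": src_height}
```

For a 1920×1080 source this gives a 607-pixel-wide window starting at x = 656. The resulting dictionary has the same {x, y, w, h} shape that subject_box uses for manual framing.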
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, the description fully discloses behavioral traits: default output format (1080×1920 H.264/AAC), audio normalization, duration constraints (1-90 seconds), source requirements (horizontal, TTL), fallback to center crop, auto-refund on failure, and the two-call payment process.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured and front-loaded with the core purpose, but it is quite lengthy. While every sentence adds value, brevity could be improved slightly for faster scanning.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (12 parameters, no output schema, no annotations), the description is remarkably complete. It covers output format, constraints, workflow, payment, error handling (auto-refund), fallbacks, and all parameter behaviors without requiring additional context.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Despite 100% schema coverage, the description adds significant meaning beyond the schema: it explains the two-call flow for source_job_id and job_id, details each encoding profile's resolution and bitrate, describes each subject strategy with fallbacks, and clarifies the manual framing options (subject_box, subject_x_offset).
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Cut a 9:16 vertical clip from any prior video job' and specifies it's for social media platforms. It differentiates from siblings like find_clips and extract_clip by detailing the workflow and use case.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides extensive usage guidance: when to use (after a prior job), prerequisites (horizontal source, not expired), two-call flow, payment instructions, fallback behavior, and alternatives like pairing with find_clips. It explicitly states what is not allowed (vertical sources) and the independence of multiple calls.
find_clips (A)
START HERE for any clip workflow on a video: find_clips is the canonical entry point and includes a full transcription as a free byproduct. Do not call transcribe first: doing so doubles the upload, doubles the spend, and produces the same transcript. Identify ranked candidate clips in a video: what to cut for highlights, social, or testimonials. Three-call flow: (1) call with filename (and optional query) to receive {job_id, payment_challenge}; (2) pay via MPP, then call with job_id + payment_credential to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns three outputs: role clip-candidates (JSON matching /.well-known/weftly-clips-v1.schema.json; includes source_job_id and source_expires_at), role transcript (SRT, free byproduct), role transcript-words (JSON matching /.well-known/weftly-transcript-v2.schema.json, free byproduct). Each candidate carries transcript_text (the full text of what's in the clip) so callers can preview content before paying for extract_clip. Optional query parameter switches to query mode (e.g., "they discuss pricing", "the part about hiring") with the same output shape; the mode field in clip-candidates.json indicates which mode produced the result. Flat price: $2.00 per video; see /.well-known/mpp.json. Source-reuse contract: the source video stays in storage for 72h after find_clips completes. Hand the find_clips job_id (also returned as source_job_id in the candidates JSON) to extract_clip or extract_vertical_clip as their source_job_id; within those 72h they cut directly from the stored source: no re-upload, no re-transcribe, just $0.50 per cut. Pass the same source_job_id to as many extract calls as you need. Use for interviews, podcasts, sales calls, all-hands recordings. Retrying with job_id alone returns current state. Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| query | No | Optional. Switches the analyzer from "best clips" discovery mode to query mode — finds segments matching this content (e.g., "they discuss pricing", "the part about hiring"). Same output shape either way; the `mode` field in clip-candidates.json tells consumers how to interpret per-candidate scoring. | |
| job_id | No | Job ID returned from a previous call. Include along with payment_credential to confirm payment and receive the presigned upload URL. Also include alone to recover the current challenge/state if the original response was lost. | |
| filename | No | Filename with extension (e.g. "podcast.mp3"). Required on the first call — used to infer media type (audio vs video) and label outputs. Supported extensions: mp3, wav, m4a, ogg, flac, mp4, mov, webm, mkv. | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ...") obtained by paying the challenge returned from the first call. Include with job_id to verify payment and receive the upload URL. | |
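The first two calls of the three-call flow, plus the pricing arithmetic from the description, might look like the sketch below. It is illustrative only: `call_tool` and `pay` are hypothetical callables (an MCP client and an MPP payment step, respectively), and only the field names job_id, payment_challenge, and upload_url come from the description.

```python
from decimal import Decimal
from typing import Any, Callable, Dict, Optional

# Hypothetical stand-ins: an MCP tool invoker and an MPP payment step that
# settles a challenge and returns the credential string.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]
Pay = Callable[[Any], str]


def start_find_clips(
    call_tool: CallTool, pay: Pay, filename: str, query: Optional[str] = None
) -> Dict[str, str]:
    """Run calls (1) and (2) and return the job_id plus the presigned upload_url."""
    first_args: Dict[str, Any] = {"filename": filename}
    if query is not None:
        first_args["query"] = query  # switches the analyzer to query mode
    first = call_tool("find_clips", first_args)          # call 1: get the challenge
    credential = pay(first["payment_challenge"])         # pay via MPP (not shown)
    second = call_tool(
        "find_clips",
        {"job_id": first["job_id"], "payment_credential": credential},
    )                                                     # call 2: get the upload URL
    return {"job_id": first["job_id"], "upload_url": second["upload_url"]}


def workflow_cost(extract_calls: int) -> Decimal:
    """Flat $2.00 for find_clips plus $0.50 per extract call, per the description."""
    return Decimal("2.00") + Decimal("0.50") * extract_calls
```

Step (3), the PUT followed by complete_upload and polling, happens after this returns. As a pricing example, one find_clips run followed by three cuts comes to $3.50.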
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, the description fully discloses behavior: three-call flow with payment, upload, and polling; outputs via get_job_status with three roles; free transcript as byproduct; source-reuse contract (72h storage); auto-refund on failure; retry semantics. No contradictions.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is lengthy but well-structured with clear sections and front-loaded purpose. Every sentence adds value. Could be slightly more concise, but the complexity of the workflow justifies the length.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite no output schema and no annotations, the description fully compensates by detailing the output structure (roles and references to well-known schemas), payment process, and all steps. For a tool with 4 parameters and complex workflow, this is complete.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with detailed descriptions for all 4 parameters. The description adds significant context beyond schema by explaining how parameters fit into the multi-step workflow (e.g., filename on first call, job_id for state recovery, payment_credential with job_id). This goes beyond baseline 3.
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with 'START HERE for any clip workflow on a video' and clearly states 'Identify ranked candidate clips in a video'. It uses a specific verb-resource combination and distinguishes from siblings like transcribe by explicitly warning not to call transcribe first.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides a complete three-call flow, specifies when to use (entry point for clip workflows) and when not (do not call transcribe first). It mentions alternatives like extract_clip and extract_vertical_clip, explaining the source-reuse contract and 72h window. Every usage aspect is covered.
get_job_status (A)
Check the status of a transcribe or summarize job. Returns the current state and, when completed, an outputs array. Each output has either content (returned inline) or a presigned, time-limited (1 hour) download_url. Small text outputs (e.g. transcript SRT, clip-candidates, summary) come inline as content; larger outputs — transcript-words JSON for any non-trivial recording, plus video outputs like clip-video / clip-vertical-video — come as a download_url to fetch when needed. Optionally pass format (srt, txt, vtt, json, words) to get the transcript content inline in the top-level transcript field — txt and vtt are derived from the stored SRT; json is v1 (segments only); words is v2 (segments + per-word timestamps matching /.well-known/weftly-transcript-v2.schema.json). Poll this periodically after calling complete_upload — wait at least 60 seconds between checks. For files under 10 minutes, jobs usually complete within 1-2 minutes. For long files (1hr+), expect 10-30 minutes.
Also use this to recover from lost state: if the original challenge was lost, call get_job_status(job_id) to retrieve a fresh challenge (status "awaiting_payment") or the upload URL (status "awaiting_upload").
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | When the job is completed, return the transcript inline in this format instead of only a download URL. Options: "srt" (SubRip with timestamps), "txt" (plain text — no timestamps), "vtt" (WebVTT), "json" (v1, segments only), "words" (v2, segments + per-word timestamps matching /.well-known/weftly-transcript-v2.schema.json). Omit for download URLs only. | |
| job_id | Yes | The job_id returned from a previous transcribe or summarize call. | |
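The polling advice above (at least 60 seconds between checks) and the inline-versus-download split can be sketched as follows. This is a sketch under assumptions: `call_tool` is a hypothetical MCP client callable, the terminal status strings "completed" and "failed" and the `role` key name are assumed rather than confirmed by the listing, and the one-hour default timeout is arbitrary.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]  # hypothetical MCP client


def wait_for_job(
    call_tool: CallTool,
    job_id: str,
    interval_s: float = 60.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_job_status until the job reaches a terminal state or times out."""
    if interval_s < 60:
        raise ValueError("the description asks for at least 60 seconds between checks")
    waited = 0.0
    while True:
        status = call_tool("get_job_status", {"job_id": job_id})
        # Assumed terminal status strings; adjust to what the server returns.
        if status.get("status") in ("completed", "failed"):
            return status
        if waited + interval_s > timeout_s:
            raise TimeoutError(f"job {job_id} not finished after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


def split_outputs(outputs: List[Dict[str, Any]]) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Separate outputs returned inline from those behind a download_url."""
    inline: Dict[str, Any] = {}
    downloads: Dict[str, str] = {}
    for out in outputs:
        role = out.get("role", "unknown")  # assumed key name for the output role
        if "content" in out:
            inline[role] = out["content"]
        elif "download_url" in out:
            downloads[role] = out["download_url"]
    return inline, downloads
```

Since download URLs are presigned for one hour, fetching them soon after the job completes avoids a second status call to refresh them.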
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses output structure (outputs array with content/download_url), time-limited URLs (1 hour), which outputs are inline vs download, format parameter effects, and possible job states. No annotations were provided, so description carries full burden and meets it.
Is the description appropriately sized, front-loaded, and free of redundancy?
Long but every sentence adds value. Front-loaded with main purpose, then logically flows through output behavior, format, polling advice, and recovery. No wasted words.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no annotations and no output schema, the description covers input, output behavior, timing, error recovery, and parameter effects comprehensively. An agent can use this tool correctly without external documentation.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with parameter descriptions. The tool description adds meaning by explaining when to use the format parameter (to get transcript inline) and detailing what each format returns. This goes beyond the schema's enum listing.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool checks the status of transcribe/summarize jobs and explains its behavior for completed jobs (inline content vs download_url). It also mentions recovery from lost state, distinguishing it from sibling tools like complete_upload.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says to poll after complete_upload, wait at least 60 seconds, and gives timing estimates. Also describes recovery use case. Provides clear when-to-use and what to expect.
get_youtube_publish_status (A)
Check the status of a YouTube publish job. Poll periodically after trigger_youtube_publish — the upload takes 1-10 minutes depending on video size. Returns status (pending, publishing, completed, failed) and the youtube_video_url once complete.
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | Yes | Job ID returned by publish_to_youtube | |
| session_token | Yes | Session token | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided; description discloses return values (status, URL) and polling suggestion. Does not cover error handling or rate limits.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three concise sentences, front-loaded with purpose. Efficient for the complexity level.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers main behavior and return format despite no output schema. Lacks details on error states or idempotency, but adequate for a simple poller.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with descriptions for both parameters. Description does not add additional parameter meaning beyond schema.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it checks status of a YouTube publish job, using verb-resource structure. It distinguishes from siblings like trigger_youtube_publish by specifying polling usage.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says to poll after trigger_youtube_publish and gives typical duration (1-10 min). Lacks explicit when-not-to-use but context is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
mpp_smoke_test (grade A)
Smoke-test the MPP payment plumbing end-to-end via this MCP server, for $0.01 USDC. Two-call flow: (1) call with no arguments to receive an MPP payment_challenge; (2) pay via MPP and call again with payment_credential set to the resulting Authorization header value (e.g. "Payment eyJ...") to receive {paid: true, timestamp, receipt_ref, payment_method}. Uses the exact same createPayToAddress + createMppHandler verification path as paid product tools (transcribe, summarize), so a green run here means real paid calls will work too. Stateless — no job is created, no database row written. Use this whenever you want to confirm a wallet, the MCP transport, the worker, and the production payment middleware are all healthy without paying a transcribe price. Cost: $0.01 USDC per attempt.
| Name | Required | Description | Default |
|---|---|---|---|
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ...") obtained by paying the challenge returned from the first call. Include to verify payment and receive {paid: true}. Omit on the first call. | |
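The two-call flow can be sketched as follows. This is a sketch under stated assumptions: `call_tool` (invokes an MCP tool and returns a dict) and `pay_challenge` (pays the challenge through an MPP-capable wallet and returns the Authorization header value) are hypothetical caller-supplied helpers, and the response is assumed to expose the challenge under a `payment_challenge` key, as the description's wording suggests.

```python
from typing import Any, Callable, Dict


def run_mpp_smoke_test(
    call_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],  # hypothetical MCP caller
    pay_challenge: Callable[[Any], str],                          # hypothetical wallet helper
) -> Dict[str, Any]:
    """Run the two-call smoke test and return the paid receipt dict."""
    # Call 1: no arguments, the server answers with a payment challenge.
    first = call_tool("mpp_smoke_test", {})
    challenge = first["payment_challenge"]  # key name assumed from the description

    # Pay out of band; the result is the full Authorization header value.
    credential = pay_challenge(challenge)

    # Call 2: present the credential to have the payment verified.
    second = call_tool("mpp_smoke_test", {"payment_credential": credential})
    if second.get("paid") is not True:
        raise RuntimeError(f"smoke test did not confirm payment: {second!r}")
    return second
```

Because the tool is described as stateless, repeating the whole flow is safe apart from the $0.01 USDC charged per attempt.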
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It discloses statelessness, no job created, no database row written, and that it uses the exact same verification path as paid tools. Also mentions cost and two-call requirement.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single dense paragraph but well-organized: purpose, flow, example, stateless note, usage guidance, cost. Every sentence adds value; slight lack of bulleted structure prevents a 5.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the two-call payment flow complexity, description thoroughly explains the entire process, return values, and relationship to other tools. No output schema but description covers return of second call.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with one parameter. Description adds meaning beyond schema by explaining the parameter is the Authorization header value from paying the challenge, and should be omitted on first call. Very helpful.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description specifies the tool is for smoke-testing MPP payment plumbing via a two-call flow costing $0.01 USDC. It clearly distinguishes this tool from paid siblings like transcribe and summarize by noting it's a cheaper health check.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use: 'whenever you want to confirm a wallet, the MCP transport, the worker, and the production payment middleware are all healthy without paying a transcribe price.' Provides the two-call flow steps. Does not mention when not to use or alternatives, but context is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
publish_to_youtube (grade A)
Publish an existing video from a transcribe or summarize job to YouTube. Creates a paid publish job (flat $1.75 price) and stores the OAuth token. Captions are auto-generated from the session transcript if available. Workflow: create_job → pay → trigger_youtube_publish → poll get_youtube_publish_status. Requires a YouTube OAuth2 access token obtained independently via Google OAuth (scope: youtube.upload).
| Name | Required | Description | Default |
|---|---|---|---|
| title | Yes | YouTube video title (max 100 characters) | |
| visibility | Yes | YouTube video visibility: "private" (default), "unlisted", or "public" | private |
| description | No | YouTube video description (max 5000 characters) | |
| access_token | Yes | YouTube OAuth2 access token — the caller is responsible for obtaining this via Google OAuth | |
| refresh_token | No | YouTube OAuth2 refresh token — if provided, the Workflow will refresh the access token automatically before uploading | |
| session_token | Yes | Session token from create_session, create_transcript, or create_summary | |
| source_job_id | Yes | Job ID of an existing transcribe or summarize job in this session whose video should be published. The Workflow will auto-generate captions if no transcript is found. | |
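The parameter constraints in the table can be checked on the client before the paid job is created. The helper below is a hypothetical sketch: `build_publish_args` is not part of the server, and it assumes that "characters" means Python string length, which may differ from how YouTube counts them.

```python
from typing import Any, Dict, Optional

VISIBILITIES = ("private", "unlisted", "public")  # values listed in the table
MAX_TITLE_CHARS = 100
MAX_DESCRIPTION_CHARS = 5000


def build_publish_args(
    title: str,
    access_token: str,
    session_token: str,
    source_job_id: str,
    visibility: str = "private",
    description: Optional[str] = None,
    refresh_token: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate inputs and build the argument dict for publish_to_youtube."""
    if not title or len(title) > MAX_TITLE_CHARS:
        raise ValueError(f"title must be 1-{MAX_TITLE_CHARS} characters")
    if visibility not in VISIBILITIES:
        raise ValueError(f"visibility must be one of {VISIBILITIES}")
    if description is not None and len(description) > MAX_DESCRIPTION_CHARS:
        raise ValueError(f"description must be at most {MAX_DESCRIPTION_CHARS} characters")

    args: Dict[str, Any] = {
        "title": title,
        "visibility": visibility,
        "access_token": access_token,
        "session_token": session_token,
        "source_job_id": source_job_id,
    }
    # Optional parameters are sent only when the caller provides them.
    if description is not None:
        args["description"] = description
    if refresh_token is not None:
        args["refresh_token"] = refresh_token
    return args
```

Rejecting bad input locally avoids creating a $1.75 publish job that would fail at upload time.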
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries full burden. It discloses key behaviors: creates a paid job ($1.75), stores OAuth token, auto-generates captions from session transcript, and requires YouTube OAuth2 scope. Additional details like error handling or rate limits are absent, but overall transparency is good.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise (five sentences) and front-loaded with the purpose. Every sentence adds essential information (purpose, pricing, workflow, auth requirements) without fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 7 parameters, 5 required, no output schema, and no annotations, the description covers purpose, workflow, auth, and parameter semantics. It could mention return values or error scenarios, but within this complexity it is fairly complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, baseline 3. The description adds meaningful context: explains that source_job_id must be from an existing transcribe/summarize job, access_token is obtained independently, and refresh_token enables auto-refresh. This adds value beyond the schema definitions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states that the tool publishes an existing video from a transcribe or summarize job to YouTube. It specifies the verb 'publish' and the resource, and distinguishes itself from sibling tools like extract_clip and trigger_youtube_publish.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit workflow steps (create_job → pay → trigger_youtube_publish → poll) and prerequisites (OAuth token, source job ID). It lacks explicit when-not-to-use guidance but offers clear context on the intended usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
summarize (grade A)
Summarize an audio or video file — returns both a text summary AND the full transcript (with per-word timestamps). Do not also call transcribe on the same file. Three-call flow: (1) call with filename to receive {job_id, payment_challenge}; (2) pay via MPP, then call with job_id + payment_credential to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns three outputs: role summary (plain text), role transcript (SRT), and role transcript-words (JSON matching /.well-known/weftly-transcript-v2.schema.json, with segment-level and per-word timestamps). For other formats, pass format=srt|txt|vtt|json|words to get_job_status to receive transcript content inline — txt and vtt are derived from SRT, json is v1 (segments only), words is v2 (segments + words). Flat price: audio $0.75, video $1.25 — see /.well-known/mpp.json for the authoritative table. Use for meetings, long-form interviews, lectures, and podcast episodes; the words output additionally supports creating clips, multicamera edits, or edit-video-from-transcript. Retrying any call with job_id alone returns current state (idempotent). Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | No | Job ID returned from a previous call. Include along with payment_credential to confirm payment and receive the presigned upload URL. Also include alone to recover the current challenge/state if the original response was lost. | |
| filename | No | Filename with extension (e.g. "podcast.mp3"). Required on the first call — used to infer media type (audio vs video) and label outputs. Supported extensions: mp3, wav, m4a, ogg, flac, mp4, mov, webm, mkv. | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ...") obtained by paying the challenge returned from the first call. Include with job_id to verify payment and receive the upload URL. | |
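The three-call flow described above can be sketched up to the point where processing starts. All three helpers are hypothetical and supplied by the caller: `call_tool` invokes an MCP tool and returns a dict, `pay_challenge` pays via MPP and returns the Authorization header value, and `put_bytes` performs the HTTP PUT to the presigned URL. The sketch passes only `job_id` to complete_upload, as the description writes it; the tool's full parameter list is not shown in this section.

```python
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def start_paid_media_job(
    call_tool: ToolCaller,                      # hypothetical MCP caller
    pay_challenge: Callable[[Any], str],        # hypothetical wallet helper
    put_bytes: Callable[[str, bytes], None],    # hypothetical HTTP PUT helper
    filename: str,
    data: bytes,
    tool: str = "summarize",
) -> Dict[str, Any]:
    """Create, pay for, upload, and start a job; return its id and initial status."""
    # Call 1: the filename alone yields a job id and a payment challenge.
    first = call_tool(tool, {"filename": filename})
    job_id = first["job_id"]

    # Call 2: pay, then exchange job_id + credential for a presigned upload URL.
    credential = pay_challenge(first["payment_challenge"])
    second = call_tool(tool, {"job_id": job_id, "payment_credential": credential})

    # Step 3: PUT the bytes (the URL expires after 1 hour), then confirm the upload.
    put_bytes(second["upload_url"], data)
    started = call_tool("complete_upload", {"job_id": job_id})
    return {"job_id": job_id, "status": started.get("status")}
```

After this returns, the caller polls get_job_status with the same `job_id`. If any response is lost along the way, calling the tool again with `job_id` alone recovers the current state.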
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so the description carries full burden. It thoroughly discloses the multi-step workflow, payment requirements, upload URL expiry (1h), idempotency on retry, and auto-refund on failure. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is longer but well-structured with clear steps and bullet-like flow. Every sentence adds necessary information; however, it could be slightly more concise without losing clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given complexity (multi-step, payment, multiple output formats), the description covers nearly everything: outputs, format options, pricing, external schema reference. Minor gaps like error handling details are missing, but overall very complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, baseline 3. The description adds value by explaining the role of each parameter in the workflow (e.g., filename for first call, job_id for subsequent, payment_credential format). It also clarifies that job_id alone recovers state.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states that the tool summarizes an audio/video file and returns both a text summary and the full transcript with per-word timestamps. It distinguishes itself from the sibling 'transcribe' tool by explicitly warning not to call both on the same file.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance: it explains the three-call flow, warns against also using transcribe, and lists use cases. It lacks explicit 'when not to use' scenarios beyond the transcribe warning, but still offers solid usage context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
transcribe (grade A)
Transcribe audio or video to text, including per-word timestamps for precise editing. Three-call flow: (1) call with filename to receive {job_id, payment_challenge}; (2) pay via MPP, then call with job_id + payment_credential to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns two outputs: role transcript (SRT) and role transcript-words (JSON matching /.well-known/weftly-transcript-v2.schema.json, with segment-level and per-word timestamps). For other formats, pass format=srt|txt|vtt|json|words to get_job_status to receive content inline — txt and vtt are derived from SRT, json is v1 (segments only), words is v2 (segments + words). Flat price: audio $0.50, video $1.00 — see /.well-known/mpp.json for the authoritative table. Use for podcasts, interviews, meetings, lectures, and especially for creating clips, multicamera edits, or edit-video-from-transcript where word boundaries matter. Retrying any call with job_id alone returns current state (idempotent). Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | No | Job ID returned from a previous call. Include along with payment_credential to confirm payment and receive the presigned upload URL. Also include alone to recover the current challenge/state if the original response was lost. | |
| filename | No | Filename with extension (e.g. "podcast.mp3"). Required on the first call — used to infer media type (audio vs video) and label outputs. Supported extensions: mp3, wav, m4a, ogg, flac, mp4, mov, webm, mkv. | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ...") obtained by paying the challenge returned from the first call. Include with job_id to verify payment and receive the upload URL. | |
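The format options for get_job_status can be captured in a small lookup. The helper name `job_status_args` is hypothetical; the five format values and their meanings are taken from the description, and the sketch assumes the parameter is literally named `format`, as the description writes it.

```python
from typing import Dict, Optional

# Format values and their meaning, as stated in the tool description.
TRANSCRIPT_FORMATS = {
    "srt": "SRT subtitles",
    "txt": "plain text, derived from SRT",
    "vtt": "WebVTT, derived from SRT",
    "json": "v1 JSON, segments only",
    "words": "v2 JSON, segments plus per-word timestamps",
}


def job_status_args(job_id: str, fmt: Optional[str] = None) -> Dict[str, str]:
    """Build get_job_status arguments, optionally requesting inline transcript content."""
    args = {"job_id": job_id}
    if fmt is not None:
        if fmt not in TRANSCRIPT_FORMATS:
            raise ValueError(f"format must be one of {sorted(TRANSCRIPT_FORMATS)}")
        args["format"] = fmt
    return args
```

For clip extraction or transcript-driven editing, `words` is the value to request, since it is the only inline format that carries per-word timestamps.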
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Given no annotations, the description fully discloses the multi-step process, payment requirement, presigned URL expiry, idempotent retries, and auto-refund on failure. It also explains output formats and how to retrieve them.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is dense but well-structured with a clear flow. Every sentence adds value, though it could be slightly more concise. Front-loads the core purpose.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Without an output schema, the description thoroughly explains the entire workflow, including three calls, pricing, retry behavior, and reference to external schemas for output format details. It anticipates agent confusion and covers edge cases.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
All three parameters are described in the schema (100% coverage), and the description adds critical context: file extension support, payment credential format, job_id usage for state recovery, and the overall workflow.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states that the tool transcribes audio/video to text with per-word timestamps, and specifies use cases like podcasts, interviews, etc. It distinguishes itself from siblings by focusing on transcription rather than clipping or publishing.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides a detailed three-call flow and explains when to use each parameter, including retry idempotency. However, it doesn't explicitly state when not to use this tool or contrast with alternative tools like 'extract_clip'.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
trigger_youtube_publish (grade A)
Start the YouTube upload after payment is confirmed. Call this after publish_to_youtube once payment_status is "paid". Returns immediately — the upload runs as a durable Workflow in the background. Poll get_youtube_publish_status to track progress.
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | Yes | Job ID returned by publish_to_youtube | |
| session_token | Yes | Session token | |
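The precondition in the description (trigger only once payment_status is "paid") can be enforced with a small guard. This is a sketch under assumptions: `call_tool` is a hypothetical MCP caller, and `job` is assumed to be a dict holding the `job_id` and `payment_status` the caller obtained from the publish job. The section does not show which response carries `payment_status`.

```python
from typing import Any, Callable, Dict


def trigger_publish_if_paid(
    call_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],  # hypothetical MCP caller
    job: Dict[str, Any],       # assumed shape: {"job_id": ..., "payment_status": ...}
    session_token: str,
) -> Dict[str, Any]:
    """Start the YouTube upload only when the publish job is marked as paid."""
    if job.get("payment_status") != "paid":
        raise RuntimeError(
            f"job {job.get('job_id')!r} has payment_status {job.get('payment_status')!r}"
        )
    # Returns immediately; the upload continues in the background on the server.
    return call_tool(
        "trigger_youtube_publish",
        {"job_id": job["job_id"], "session_token": session_token},
    )
```

The returned dict only acknowledges the start; progress comes from polling get_youtube_publish_status.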
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It discloses that the tool returns immediately and runs as a durable Workflow in the background, which is key behavioral context. Does not mention auth requirements, but flow implies it.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences: the first states purpose and precondition, the second gives the call order, the third describes the asynchronous behavior, and the fourth points to the polling tool. Highly efficient and front-loaded with critical information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 2 parameters and no output schema, the description fully covers its role in the workflow, explains async behavior, and references sibling tools. Nothing essential is missing.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so description adds value by stating that 'job_id' is the one returned by 'publish_to_youtube', linking the parameter to a prior step. This external context aids the agent.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the action 'Start the YouTube upload' and specifies the precondition (payment confirmed). It distinguishes itself from siblings by referencing the preceding tool 'publish_to_youtube' and the subsequent 'get_youtube_publish_status'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use: after 'publish_to_youtube' and 'payment_status is "paid"'. Also instructs to poll 'get_youtube_publish_status'. Could include a when-not, but context is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
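For owners who generate the claim file from a script, a minimal sketch follows. The function name `glama_claim_document` is hypothetical and the email check is deliberately crude; the two JSON keys and the schema URL are the ones shown in the structure above.

```python
import json


def glama_claim_document(email: str) -> str:
    """Return the JSON text to serve at /.well-known/glama.json."""
    if "@" not in email:
        # Crude sanity check only; real address validation is out of scope here.
        raise ValueError("email must look like an address")
    doc = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": email}],
    }
    return json.dumps(doc, indent=2)
```

The output would be written to the web root so that it is served at `/.well-known/glama.json` on the server's domain.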
- Control your server's listing on Glama, including description and metadata
- Access analytics and receive server usage reports
- Get monitoring and health status updates for your server
- Feature your server to boost visibility and reach more users
For users:
- Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
- Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
- Centralized credential management – store and rotate API keys and OAuth tokens in one place
- Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
- Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
- Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
- Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
- The server is experiencing an outage
- The URL of the server is wrong
- Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.