Weftly
Server Details
Transcribe, summarize, find and cut clips, publish to YouTube. Per-job pricing, no account.
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.6/5 across 11 of 11 tools scored. Lowest: 3.9/5.
Most tools have distinct purposes, though transcribe, summarize, and find_clips all produce a transcript. The descriptions clearly differentiate their use cases (find_clips as the entry point for clip workflows, summarize for a transcript plus summary, transcribe for transcription only), which reduces confusion.
All tool names use snake_case with a consistent verb_noun pattern (e.g., complete_upload, extract_vertical_clip, get_job_status). The only minor deviation is mpp_smoke_test, which is a noun_noun_verb form, but it's still readable and fits the general style.
11 tools is well-scoped for a video/audio processing and YouTube publishing server. Each tool has a clear role in the workflow: upload, transcribe/summarize, clip extraction, job status, and publishing. No unnecessary tools and no missing essential functions for the core purpose.
The tool set covers the complete user workflow from upload to processing, clipping, and YouTube publishing. Job listing and cancellation are missing, but the auto-refund on failure and status polling compensate. Overall, the surface is comprehensive for the stated domain.
Available Tools
11 tools
complete_upload (A)
Confirm that the file has been uploaded (via HTTP PUT to the upload_url from transcribe or summarize) and start processing. Verifies that the file is present in storage and that the job has been paid. Returns status "processing". Poll get_job_status to track progress and retrieve download URLs when done.
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | Yes | The job_id returned from a previous transcribe or summarize call. | |
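The upload-then-confirm step described above can be sketched as follows. This is a minimal illustration using Python's standard `urllib`, not the server's own client: the helper names `build_upload_request` and `complete_upload_args` are hypothetical, the URL and job id are placeholders, and how the tool call is actually dispatched depends on your MCP client.

```python
import urllib.request


def build_upload_request(upload_url: str, payload: bytes) -> urllib.request.Request:
    """Prepare the HTTP PUT that sends the file bytes to the presigned upload_url."""
    return urllib.request.Request(upload_url, data=payload, method="PUT")


def complete_upload_args(job_id: str) -> dict:
    """Arguments for the complete_upload tool call; job_id is its only parameter."""
    return {"job_id": job_id}


# Placeholder values for illustration only; real ones come from the earlier tool call.
request = build_upload_request("https://storage.example/upload/abc", b"fake media bytes")
print(request.get_method(), request.full_url)
print(complete_upload_args("job_123"))
```

Sending the request (for example with `urllib.request.urlopen(request)`) is left out so the sketch runs without network access. After the PUT succeeds, the `complete_upload` call starts processing and `get_job_status` is polled for the result.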
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: it triggers processing after upload confirmation, returns errors for missing files or incomplete payment, and requires polling 'get_job_status' for progress tracking. This covers operational flow, error conditions, and async behavior, though it doesn't detail auth needs or rate limits explicitly.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is highly concise and well-structured: three sentences that efficiently cover purpose, prerequisites, error conditions, and next steps. Each sentence adds critical information without redundancy, making it easy to parse and front-loaded with the core action.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (post-upload processing with async behavior), no annotations, and no output schema, the description does a good job of completeness. It explains the workflow, prerequisites, error cases, and follow-up actions. However, it lacks details on return values or output format, which would be helpful since there's no output schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters ('session_token' and 'job_id') documented in the schema. The description doesn't add any parameter-specific semantics beyond what the schema provides (e.g., it doesn't explain format or constraints for 'session_token' or 'job_id'). Baseline 3 is appropriate since the schema handles parameter documentation adequately.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Confirm the upload is done and start processing.' It specifies the verb ('confirm' and 'start processing') and resource ('upload'), but doesn't explicitly distinguish it from sibling tools like 'upload_file' or 'get_upload_url' beyond mentioning they are prerequisites. The purpose is clear but sibling differentiation is only implied through usage context.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage guidelines: 'Call this after uploading via either upload_file or the presigned URL from get_upload_url.' It names specific alternatives ('upload_file' and 'get_upload_url') as prerequisites and directs to 'poll check_job_status to track progress' for post-invocation. This clearly indicates when to use this tool versus others in the workflow.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
extract_clip (A)
Cut and assemble a clip from any prior video job (find_clips, summarize, or video transcribe). Operates on a parent job — possessing the parent source_job_id is the capability, no upload step. Pass one segment for a simple cut, or multiple non-contiguous segments to compose a single mp4 highlight reel — same flat $0.50 either way. Two-call flow: (1) call with source_job_id + segments (ordered array of {start, end, label?} in source seconds, total duration capped at 30 minutes) to receive {job_id, payment_challenge}; (2) pay via MPP and call with job_id + payment_credential to start processing. No upload step. Poll get_job_status(job_id) for completion; outputs are role clip-video (the assembled .mp4, frame-accurate boundaries with 15ms audio fades at segment joins) and — when include_transcript: true (default) — roles clip-srt + clip-words (transcripts stitched and time-shifted to match the assembled video). Set include_transcript: false to skip transcript outputs. Payment: MPP — accepts Tempo USDC and Stripe SPT. The challenge's WWW-Authenticate header and /.well-known/mpp.json are authoritative for which methods are offered. Source must still be in storage (72h TTL for find_clips parents, 24h elsewhere — check expires_at from get_job_status on the parent). Multiple extract_clip calls against one parent are independent paid jobs. Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| title | No | Optional title for the assembled clip. Surfaces in get_job_status and download filenames; doesn't affect the cut itself. | |
| job_id | No | Job ID returned from a previous extract_clip call. Include along with payment_credential to confirm payment and trigger processing. Also include alone to recover the current state. | |
| segments | No | Ordered array of source-relative segments to cut and concatenate into the output. Single segment for a simple cut; multiple segments compose a single mp4 from non-contiguous moments — same flat $0.50 either way. Total summed duration capped at 30 minutes per call. Required on the first call. | |
| source_job_id | No | Job ID of any prior video job (find_clips, summarize, or video transcribe). Possessing this id is the capability — extract_clip is not session-bound, so a user can come back from a different session within the parent's TTL and still extract. Required on the first call. | |
| include_transcript | No | Default true. When true, the pipeline writes clip-srt + clip-words outputs stitched to match the assembled video. Set false to skip and just receive the .mp4. | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ..."). extract_clip accepts Tempo USDC and Stripe SPT; see the challenge's WWW-Authenticate header or /.well-known/mpp.json for the supported methods. Include with job_id after paying the challenge to start processing. | |
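As a rough sketch of the two-call flow, the helpers below build the argument objects for each call and check the 30-minute total-duration cap before any payment is made. The function names are hypothetical, and the rule that each segment's end must be after its start is an assumption carried over from extract_vertical_clip rather than something stated for extract_clip.

```python
MAX_TOTAL_SECONDS = 30 * 60  # total summed duration cap stated in the description


def extract_clip_first_call(source_job_id, segments, include_transcript=True):
    """Validate the segments and build the arguments for the first extract_clip call."""
    if not segments:
        raise ValueError("at least one segment is required on the first call")
    total = 0.0
    for seg in segments:
        # Assumption: end must be strictly after start, as for extract_vertical_clip.
        if seg["end"] <= seg["start"]:
            raise ValueError(f"segment end must be after start: {seg}")
        total += seg["end"] - seg["start"]
    if total > MAX_TOTAL_SECONDS:
        raise ValueError(f"total duration {total:.0f}s exceeds the 30-minute cap")
    return {
        "source_job_id": source_job_id,
        "segments": segments,
        "include_transcript": include_transcript,
    }


def extract_clip_second_call(job_id, payment_credential):
    """Build the arguments for the second call, made after paying the challenge."""
    return {"job_id": job_id, "payment_credential": payment_credential}


# Two non-contiguous moments composed into one highlight reel (placeholder ids).
first = extract_clip_first_call(
    "job_parent",
    [
        {"start": 12.0, "end": 47.5, "label": "intro"},
        {"start": 301.0, "end": 330.0},
    ],
)
second = extract_clip_second_call("job_clip", "Payment eyJ...")
print(first)
print(second)
```

Validating locally avoids requesting a payment challenge for a segment list the server would be expected to reject.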
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses two-call workflow, payment requirement, source TTL (72h/24h), duration cap (30 min), flat fee ($0.50), output roles (clip-video, clip-srt, clip-words), transcript option, and auto-refund on failure. No annotations provided, so description fully addresses behavior.
Is the description appropriately sized, front-loaded, and free of redundancy?
Description is detailed and well-structured, front-loading the main purpose and then logically expanding on workflow, payment, and constraints. Slightly verbose but all sentences add necessary context for a complex tool.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
No output schema exists, but description fully explains return values (job_id, payment_challenge, output roles) and error handling (auto-refund). Covers source TTL, segment limits, and transcript behavior, making it complete for agent usage.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but description adds essential workflow context (e.g., two-call flow, payment credential usage, segment ordering, label purpose) beyond the schema's basic field descriptions. Each parameter's role in the flow is explained.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the tool cuts and assembles clips from prior video jobs ('Cut and assemble a clip from any prior video job'). It specifies the resource (parent job) and action, and distinguishes from sibling 'extract_vertical_clip' by implying orientation difference.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly details when to use (for extraction from prior jobs), the two-call flow, payment process, and polling for completion. Provides guidance on alternatives (multiple calls independent, TTL limits) and contrasts with upload steps absent.
extract_vertical_clip (A)
Cut a 9:16 vertical clip from any prior video job (find_clips, summarize, or video transcribe), suitable for direct upload to TikTok, Instagram Reels, or YouTube Shorts. Default output is 1080×1920 H.264 / AAC .mp4 with center-cropped framing; audio loudness-normalized to -14 LUFS / -1.5 dBTP for short-form social. Single-segment only; clip duration must be between 1 and 90 seconds (Instagram Reels max). Operates on a parent job — possessing the parent source_job_id is the capability, no upload step. Two-call flow: (1) call with source_job_id + start + end (in source seconds) to receive {job_id, payment_challenge}; (2) pay via MPP and call with job_id + payment_credential to start processing. Poll get_job_status(job_id) for completion; output is role clip-vertical-video (the .mp4). Flat price: $0.50 per clip. Payment: MPP — accepts Tempo USDC and Stripe SPT. Optional profile parameter selects the encoding profile (default tiktok-primary). Allowed values: tiktok-primary (1080×1920, fast preset, CRF 22), tiktok-primary-720p (720×1280, CBR 3 Mbps — half-resolution mobile-optimized, ~40% faster wall time), instagram-reels (1080×1920, slow preset, CBR 4 Mbps), instagram-stories (same encode shape as instagram-reels). All four profiles loudness-normalize identically. Optional subject parameter controls reframing (default center, preserves today's behavior): auto locks onto the longest-tracked face from the parent's subjects-sidecar (or runs inline detection if the parent has none); subject_id (with subject_id param naming a face_N from the sidecar) locks onto a specific subject; follow switches crop between active speakers across the clip using the sidecar's active_speaker_timeline; manual accepts caller-supplied framing via subject_box: {x, y, w, h} (source pixels) or subject_x_offset (direct crop x). Sidecar shape at /.well-known/weftly-subjects-v1.schema.json. 
auto/subject_id/follow fall back to center if detection or sidecar resolution fails — the paid job always delivers a clip. Source must be a horizontal video (wider than 9:16) — already-vertical or square sources are rejected. Source must still be in storage (72h TTL for find_clips parents, 24h elsewhere — check expires_at from get_job_status on the parent). Pair with find_clips ($2.00/video) to pick a moment first, then call this to get a download-ready vertical mp4 in under 5 minutes. Multiple extract_vertical_clip calls against one parent are independent paid jobs. Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| end | No | Source-relative end time in seconds (must be > start, and end - start ∈ [1, 90]). Required on the first call. | |
| start | No | Source-relative start time in seconds. Required on the first call. | |
| t_ref | No | For subject="manual" with subject_box — source-seconds timestamp the box applies to. Informational in v1. | |
| title | No | Optional title for the assembled clip. Surfaces in get_job_status and download filenames; doesn't affect the cut itself. | |
| job_id | No | Job ID returned from a previous extract_vertical_clip call. Include along with payment_credential to confirm payment and trigger processing. Also include alone to recover the current state. | |
| profile | No | Optional encoding profile. Default: tiktok-primary (1080×1920 H.264 fast preset, CRF 22, 6 Mbps cap). tiktok-primary-720p: 720×1280, CBR 3 Mbps — half-resolution mobile-optimized, ~40% faster wall time. instagram-reels: 1080×1920 H.264 slow preset, CBR 4 Mbps. instagram-stories: same encode shape as instagram-reels. All four apply loudness normalization to -14 LUFS / -1.5 dBTP. | |
| subject | No | Optional reframing strategy. Default: "center" (hardcoded center crop, today's behavior). "auto": lock onto the longest-tracked face from the parent find_clips job's subjects-sidecar (or run inline detection if no sidecar). "subject_id": lock onto a specific face named in the sidecar (pass subject_id). "follow": switch crop between active speakers across the clip using the sidecar's active_speaker_timeline (per-segment encode + concat). "manual": caller specifies the subject (pass subject_box or subject_x_offset). See /.well-known/weftly-subjects-v1.schema.json. auto/subject_id/follow fall back to center if detection fails — the paid job always delivers a clip. | |
| subject_id | No | Required when subject="subject_id". Subject id from the parent's subjects-sidecar (e.g. "face_0"). | |
| subject_box | No | For subject="manual" — bounding box of the subject in source pixels. Crop centers on the box center. | |
| source_job_id | No | Job ID of any prior video job (find_clips, summarize, or video transcribe). Possessing this id is the capability — extract_vertical_clip is not session-bound, so a user can come back from a different session within the parent's TTL and still extract. Required on the first call. | |
| subject_x_offset | No | For subject="manual" — direct crop x-offset in source pixels (alternative to subject_box). | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ..."). extract_vertical_clip accepts Tempo USDC and Stripe SPT; see the challenge's WWW-Authenticate header or /.well-known/mpp.json for the supported methods. Include with job_id after paying the challenge to start processing. | |
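The parameter constraints in the table above (clip length of 1 to 90 seconds, four allowed profiles, and the extra fields some subject strategies need) can be checked client-side before the first call. The sketch below is illustrative only: `vertical_clip_first_call` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
PROFILES = {"tiktok-primary", "tiktok-primary-720p", "instagram-reels", "instagram-stories"}
SUBJECTS = {"center", "auto", "subject_id", "follow", "manual"}


def vertical_clip_first_call(source_job_id, start, end, profile="tiktok-primary",
                             subject="center", subject_id=None,
                             subject_box=None, subject_x_offset=None):
    """Validate the documented constraints and build the first-call arguments."""
    duration = end - start
    if not 1 <= duration <= 90:
        raise ValueError(f"clip duration must be 1-90 seconds, got {duration}")
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    if subject not in SUBJECTS:
        raise ValueError(f"unknown subject strategy: {subject}")
    if subject == "subject_id" and subject_id is None:
        raise ValueError('subject="subject_id" requires subject_id (e.g. "face_0")')
    if subject == "manual" and subject_box is None and subject_x_offset is None:
        raise ValueError('subject="manual" requires subject_box or subject_x_offset')

    args = {"source_job_id": source_job_id, "start": start, "end": end,
            "profile": profile, "subject": subject}
    # Only include the optional framing fields the caller actually supplied.
    if subject_id is not None:
        args["subject_id"] = subject_id
    if subject_box is not None:
        args["subject_box"] = subject_box
    if subject_x_offset is not None:
        args["subject_x_offset"] = subject_x_offset
    return args


# Placeholder parent job id; a 45-second cut locked onto one tracked face.
vertical = vertical_clip_first_call("job_parent", 120.0, 165.0,
                                    profile="instagram-reels",
                                    subject="subject_id", subject_id="face_0")
print(vertical)
```

The second call is the same shape as for extract_clip: `job_id` plus `payment_credential`.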
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, the description fully covers behavioral traits: default output (1080×1920 H.264/AAC), loudness normalization, source requirements (horizontal, TTL), the two-call flow, payment details, and auto-refund on failure. No contradictions.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is thorough and well-structured, starting with purpose, then specifics, flow, pricing, and optional parameters. It is slightly dense but all sentences are justified. Could be trimmed slightly, but overall efficient.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite no output schema, the description provides complete contextual coverage: purpose, constraints, two-call flow, pricing, payment methods, source prerequisites, and error handling. An agent can fully understand and invoke the tool correctly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but the description adds significant value by explaining the two-call flow for parameters (source_job_id+start+end vs. job_id+payment_credential), detailing each profile option, clarifying title usage, and noting that source_job_id is session-independent.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool cuts a 9:16 vertical clip from prior video jobs (find_clips, summarize, or video transcribe) for short-form social media platforms. It distinguishes itself from siblings like extract_clip and find_clips by specifying the output format and use case.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly explains when to use the tool (for social media vertical clips), when not to use it (source must be horizontal, single-segment only, duration 1-90 seconds), and provides alternatives (pair with find_clips). It also outlines the two-call flow and payment requirements.
find_clips (A)
START HERE for any clip workflow on a video: find_clips is the canonical entry point and includes a full transcription as a free byproduct. Do not call transcribe first: doing so doubles the upload, doubles the spend, and produces the same transcript. Identify ranked candidate clips in a video (what to cut for highlights, social, or testimonials). Three-call flow: (1) call with filename (and optional query) to receive {job_id, payment_challenge}; (2) pay via MPP, then call with job_id + payment_credential to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns three outputs: role clip-candidates (JSON matching /.well-known/weftly-clips-v1.schema.json, which includes source_job_id and source_expires_at), role transcript (SRT, free byproduct), role transcript-words (JSON matching /.well-known/weftly-transcript-v2.schema.json, free byproduct). Each candidate carries transcript_text, the full text of what's in the clip, so callers can preview content before paying for extract_clip. Optional query parameter switches to query mode (e.g., "they discuss pricing", "the part about hiring") with the same output shape; the mode field in clip-candidates.json indicates which mode produced the result. Flat price: $2.00 per video; see /.well-known/mpp.json. Source-reuse contract: the source video stays in storage for 72h after find_clips completes. Hand the find_clips job_id (also returned as source_job_id in the candidates JSON) to extract_clip or extract_vertical_clip as their source_job_id. Within those 72h they cut directly from the stored source: no re-upload, no re-transcribe, just $0.50 per cut. Pass the same source_job_id to as many extract calls as you need. Use for interviews, podcasts, sales calls, all-hands recordings. Retrying with job_id alone returns current state. Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| query | No | Optional. Switches the analyzer from "best clips" discovery mode to query mode — finds segments matching this content (e.g., "they discuss pricing", "the part about hiring"). Same output shape either way; the `mode` field in clip-candidates.json tells consumers how to interpret per-candidate scoring. | |
| job_id | No | Job ID returned from a previous call. Include along with payment_credential to confirm payment and receive the presigned upload URL. Also include alone to recover the current challenge/state if the original response was lost. | |
| filename | No | Filename with extension (e.g. "podcast.mp3"). Required on the first call — used to infer media type (audio vs video) and label outputs. Supported extensions: mp3, wav, m4a, ogg, flac, mp4, mov, webm, mkv. | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ...") obtained by paying the challenge returned from the first call. Include with job_id to verify payment and receive the upload URL. | |
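One way to keep the three-call flow straight is a small function that, given what the caller currently holds, returns the next step. This is a sketch under assumptions: `next_find_clips_step` is hypothetical, and the action labels `pay_challenge` and `http_put` are illustrative names for the out-of-band steps, not Weftly tools.

```python
def next_find_clips_step(filename, query=None, job_id=None,
                         payment_credential=None, upload_url=None, uploaded=False):
    """Return (action, arguments) for the next step of the find_clips flow."""
    if job_id is None:
        # Call 1: filename (and optional query) yields {job_id, payment_challenge}.
        args = {"filename": filename}
        if query:
            args["query"] = query
        return ("find_clips", args)
    if upload_url is None:
        if payment_credential is None:
            # Out-of-band step: pay the challenge via MPP to obtain a credential.
            return ("pay_challenge", {"job_id": job_id})
        # Call 2: job_id + payment_credential yields {upload_url}.
        return ("find_clips", {"job_id": job_id, "payment_credential": payment_credential})
    if not uploaded:
        # Out-of-band step: HTTP PUT the file bytes to the presigned URL.
        return ("http_put", {"upload_url": upload_url})
    # Call 3: confirm the upload; get_job_status polling follows.
    return ("complete_upload", {"job_id": job_id})


step_one = next_find_clips_step("interview.mp4", query="they discuss pricing")
step_two = next_find_clips_step("interview.mp4", job_id="job_123",
                                payment_credential="Payment eyJ...")
step_three = next_find_clips_step("interview.mp4", job_id="job_123",
                                  payment_credential="Payment eyJ...",
                                  upload_url="https://storage.example/put/abc",
                                  uploaded=True)
print(step_one, step_two, step_three, sep="\n")
```

Because retrying with `job_id` alone returns the current state, a caller that loses track of where it is can rebuild these inputs from the server instead of starting a new paid job.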
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, the description fully discloses the payment flow, upload process, file expiration (72h), output format references to schemas, and auto-refund on failure. It also explains the source-reuse contract for extract calls.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is very long and thorough, but not concise. It includes extensive details that could be structured more compactly. While the content is valuable, it sacrifices brevity.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema, the description thoroughly explains the output roles and schemas (clip-candidates, transcript, transcript-words). It covers the entire workflow including payment, upload, polling, and source-reuse, making it complete for an agent.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, baseline 3. The description adds significant context beyond schema: explains the three-call flow, how job_id is used, what filename does, and gives examples for query. It adds value beyond the schema descriptions.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states that find_clips is the canonical entry point for clip workflows, includes transcription as a free byproduct, and distinguishes it from siblings like transcribe and extract_clip. It provides a specific verb and resource with clear scope.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage guidelines: start here for clip workflows, do not call transcribe first, and details the three-call flow. It also warns against doubling uploads and spend, and gives retry behavior.
get_job_status (A)
Check the status of a transcribe or summarize job. Returns the current state and, when completed, an outputs array. Each output has either content (returned inline) or a presigned, time-limited (1 hour) download_url. Small text outputs (e.g. transcript SRT, clip-candidates, summary) come inline as content; larger outputs — transcript-words JSON for any non-trivial recording, plus video outputs like clip-video / clip-vertical-video — come as a download_url to fetch when needed. Optionally pass format (srt, txt, vtt, json, words) to get the transcript content inline in the top-level transcript field — txt and vtt are derived from the stored SRT; json is v1 (segments only); words is v2 (segments + per-word timestamps matching /.well-known/weftly-transcript-v2.schema.json). Poll this periodically after calling complete_upload — wait at least 60 seconds between checks. For files under 10 minutes, jobs usually complete within 1-2 minutes. For long files (1hr+), expect 10-30 minutes.
Also use this to recover from lost state: if the original challenge was lost, call get_job_status(job_id) to retrieve a fresh challenge (status "awaiting_payment") or the upload URL (status "awaiting_upload").
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | When the job is completed, return the transcript inline in this format instead of only a download URL. Options: "srt" (SubRip with timestamps), "txt" (plain text — no timestamps), "vtt" (WebVTT), "json" (v1, segments only), "words" (v2, segments + per-word timestamps matching /.well-known/weftly-transcript-v2.schema.json). Omit for download URLs only. | |
| job_id | Yes | The job_id returned from a previous transcribe or summarize call. | |
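The polling guidance above (at least 60 seconds between checks, then read either inline `content` or a `download_url` from each output) might look like the sketch below. Several details are assumptions: the `fetch_status` callable stands in for however your client invokes get_job_status, the terminal status strings `"completed"` and `"failed"` are guessed from the prose, and the response field names `status`, `outputs`, and `role` are inferred from the description rather than from a published output schema.

```python
import time


def poll_job(fetch_status, job_id, interval_seconds=60, max_checks=30, sleep=time.sleep):
    """Call fetch_status(job_id) until the job reaches a terminal status."""
    for attempt in range(max_checks):
        status = fetch_status(job_id)
        # Assumed terminal status names; adjust to what the server actually returns.
        if status["status"] in ("completed", "failed"):
            return status
        if attempt < max_checks - 1:
            sleep(interval_seconds)  # the description asks for at least 60 seconds
    raise TimeoutError(f"job {job_id} did not finish after {max_checks} checks")


def split_outputs(outputs):
    """Separate outputs returned inline from those that must be downloaded."""
    inline = {o["role"]: o["content"] for o in outputs if "content" in o}
    links = {o["role"]: o["download_url"] for o in outputs if "download_url" in o}
    return inline, links


# Fake responses so the sketch runs without a server; values are placeholders.
responses = iter([
    {"status": "processing"},
    {"status": "completed", "outputs": [
        {"role": "transcript", "content": "1\n00:00:00,000 --> 00:00:02,000\nHello"},
        {"role": "clip-video", "download_url": "https://storage.example/get/clip.mp4"},
    ]},
])
waits = []  # records requested sleeps instead of actually waiting
result = poll_job(lambda job_id: next(responses), "job_123", sleep=waits.append)
inline, links = split_outputs(result["outputs"])
print(result["status"], waits, sorted(inline), sorted(links))
```

Injecting `sleep` keeps the example fast to run; in real use the default `time.sleep` applies the 60-second spacing. Download URLs are time-limited to 1 hour, so fetch them soon after the job completes or poll again for fresh ones.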
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden and delivers substantial behavioral context. It discloses: polling behavior with timing constraints (60 seconds between checks), typical completion times (1-2 minutes for short files, 10-30 minutes for long files), presigned URL characteristics (time-limited to 1 hour), and recovery functionality for lost state. It doesn't mention error handling or rate limits, keeping it from a perfect score.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized and front-loaded with core functionality. Every sentence adds value: first defines purpose, second explains format parameter utility, third gives polling guidance, fourth provides timing estimates, fifth describes URL characteristics, sixth introduces recovery use case. Minor redundancy in explaining format options slightly reduces efficiency.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (status checking with polling, recovery functionality, inline format options) and no annotations or output schema, the description provides comprehensive context. It covers purpose, usage patterns, behavioral expectations, parameter semantics, timing guidance, and edge cases (recovery from lost state), making it complete enough for effective agent use.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the baseline is 3. The description adds meaningful context beyond the schema: it explains that the 'format' parameter is 'useful when you need the text directly without fetching a URL' and clarifies that omitting it returns 'download URLs only'. This provides practical usage insight that enhances the schema's technical specifications.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Check the status of a transcribe or summarize job' with specific verbs ('check', 'retrieve') and resources ('job', 'download URLs', 'fresh challenge', 'upload URL'). It distinguishes from siblings like 'complete_upload', 'summarize', and 'transcribe' by focusing on status monitoring rather than job initiation or completion.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage guidance: 'Poll this periodically after calling complete_upload — wait at least 60 seconds between checks' and 'Also use this to recover from lost state'. It distinguishes when to use this tool versus alternatives by specifying it's for checking job status after job creation tools, and it includes timing recommendations for different file sizes.
get_youtube_publish_status (A)
Check the status of a YouTube publish job. Poll periodically after trigger_youtube_publish — the upload takes 1-10 minutes depending on video size. Returns status (pending, publishing, completed, failed) and the youtube_video_url once complete.
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | Yes | Job ID returned by publish_to_youtube | |
| session_token | Yes | Session token | |
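The polling guidance above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the server's own client code: `call_tool` is a hypothetical callable standing in for whatever MCP client you use, the 30-second interval and 15-minute timeout are illustrative defaults, and the response keys `status` and `youtube_video_url` are taken from the description because the tool publishes no output schema.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# The description lists four statuses; these two end the polling loop.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_publish_status(
    call_tool: CallTool,
    job_id: str,
    session_token: str,
    interval_s: float = 30.0,        # assumed interval, not a documented value
    timeout_s: float = 15 * 60.0,    # assumed ceiling above the 1-10 minute range
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_youtube_publish_status until the job completes or fails."""
    waited = 0.0
    while True:
        result = call_tool(
            "get_youtube_publish_status",
            {"job_id": job_id, "session_token": session_token},
        )
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if waited >= timeout_s:
            raise TimeoutError(
                f"publish job {job_id} still {result.get('status')!r}"
            )
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. On a `completed` result, the caller would read `youtube_video_url` from the returned dict.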
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided; description discloses return values (status, URL) and polling suggestion. Does not cover error handling or rate limits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three concise sentences, front-loaded with purpose. Efficient for the complexity level.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers main behavior and return format despite no output schema. Lacks details on error states or idempotency, but adequate for a simple poller.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with descriptions for both parameters. Description does not add additional parameter meaning beyond schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it checks status of a YouTube publish job, using verb-resource structure. It distinguishes from siblings like trigger_youtube_publish by specifying polling usage.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says to poll after trigger_youtube_publish and gives typical duration (1-10 min). Lacks explicit when-not-to-use but context is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
mpp_smoke_test (A)
Smoke-test the MPP payment plumbing end-to-end via this MCP server, for $0.01 USDC. Two-call flow: (1) call with no arguments to receive an MPP payment_challenge; (2) pay via MPP and call again with payment_credential set to the resulting Authorization header value (e.g. "Payment eyJ...") to receive {paid: true, timestamp, receipt_ref, payment_method}. Uses the exact same createPayToAddress + createMppHandler verification path as paid product tools (transcribe, summarize), so a green run here means real paid calls will work too. Stateless — no job is created, no database row written. Use this whenever you want to confirm a wallet, the MCP transport, the worker, and the production payment middleware are all healthy without paying a transcribe price. Cost: $0.01 USDC per attempt.
| Name | Required | Description | Default |
|---|---|---|---|
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ...") obtained by paying the challenge returned from the first call. Include to verify payment and receive {paid: true}. Omit on the first call. | |
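The two-call flow described above might look like the following sketch. Both `call_tool` (an MCP client wrapper) and `pay_challenge` (a wallet step that pays the challenge and returns the Authorization header value) are hypothetical callables, and the `payment_challenge` response key is assumed from the description rather than from a published output schema.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_mpp_smoke_test(
    call_tool: CallTool,
    pay_challenge: Callable[[Any], str],
) -> Dict[str, Any]:
    """Run the two-call smoke test and return the second call's result."""
    # Call 1: no arguments; the server answers with a payment challenge.
    first = call_tool("mpp_smoke_test", {})
    challenge = first["payment_challenge"]  # key name assumed from the description

    # Pay out of band; the wallet returns the full Authorization header value.
    credential = pay_challenge(challenge)
    if not credential.startswith("Payment "):
        raise ValueError("expected a credential of the form 'Payment eyJ...'")

    # Call 2: same tool, now with the credential attached.
    second = call_tool("mpp_smoke_test", {"payment_credential": credential})
    if second.get("paid") is not True:
        raise RuntimeError(f"smoke test did not confirm payment: {second!r}")
    return second
```

Each attempt costs $0.01 USDC per the description, so a caller would likely run this once per deployment or wallet change rather than in a tight loop.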
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It discloses statelessness, no job created, no database row written, and that it uses the exact same verification path as paid tools. Also mentions cost and two-call requirement.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single dense paragraph but well-organized: purpose, flow, example, stateless note, usage guidance, cost. Every sentence adds value; slight lack of bulleted structure prevents a 5.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the two-call payment flow complexity, description thoroughly explains the entire process, return values, and relationship to other tools. No output schema but description covers return of second call.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with one parameter. Description adds meaning beyond schema by explaining the parameter is the Authorization header value from paying the challenge, and should be omitted on first call. Very helpful.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description specifies the tool is for smoke-testing MPP payment plumbing via a two-call flow costing $0.01 USDC. It clearly distinguishes this tool from paid siblings like transcribe and summarize by noting it's a cheaper health check.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use: 'whenever you want to confirm a wallet, the MCP transport, the worker, and the production payment middleware are all healthy without paying a transcribe price.' Provides the two-call flow steps. Does not mention when not to use or alternatives, but context is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
publish_to_youtube (A)
Publish an existing video from a transcribe or summarize job to YouTube. Creates a paid publish job (flat $2.00 price) and stores the OAuth token. Captions are auto-generated from the session transcript if available. Workflow: create_job → pay → trigger_youtube_publish → poll get_youtube_publish_status. Requires a YouTube OAuth2 access token obtained independently via Google OAuth (scope: youtube.upload).
| Name | Required | Description | Default |
|---|---|---|---|
| title | Yes | YouTube video title (max 100 characters) | |
| visibility | Yes | YouTube video visibility: "private" (default), "unlisted", or "public" | private |
| description | No | YouTube video description (max 5000 characters) | |
| access_token | Yes | YouTube OAuth2 access token — the caller is responsible for obtaining this via Google OAuth | |
| refresh_token | No | YouTube OAuth2 refresh token — if provided, the Workflow will refresh the access token automatically before uploading | |
| session_token | Yes | Session token from create_session, create_transcript, or create_summary | |
| source_job_id | Yes | Job ID of an existing transcribe or summarize job in this session whose video to publish. The Workflow will auto-generate captions if no transcript is found. | |
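The limits in the parameter table can be checked client-side before a paid job is created. The helper below is a sketch only: `build_publish_args` is a hypothetical name, and it counts characters with Python's `len`, which may not match how the server or YouTube counts them.

```python
from typing import Dict, Optional

# Limits and allowed values as stated in the parameter table.
MAX_TITLE_CHARS = 100
MAX_DESCRIPTION_CHARS = 5000
VISIBILITIES = ("private", "unlisted", "public")


def build_publish_args(
    *,
    title: str,
    access_token: str,
    session_token: str,
    source_job_id: str,
    visibility: str = "private",
    description: Optional[str] = None,
    refresh_token: Optional[str] = None,
) -> Dict[str, str]:
    """Validate inputs and build the argument dict for publish_to_youtube."""
    if not title or len(title) > MAX_TITLE_CHARS:
        raise ValueError(f"title must be 1-{MAX_TITLE_CHARS} characters")
    if visibility not in VISIBILITIES:
        raise ValueError(f"visibility must be one of {VISIBILITIES}")
    if description is not None and len(description) > MAX_DESCRIPTION_CHARS:
        raise ValueError(f"description exceeds {MAX_DESCRIPTION_CHARS} characters")

    args = {
        "title": title,
        "visibility": visibility,
        "access_token": access_token,
        "session_token": session_token,
        "source_job_id": source_job_id,
    }
    # Optional parameters are only sent when the caller supplied them.
    if description is not None:
        args["description"] = description
    if refresh_token is not None:
        args["refresh_token"] = refresh_token
    return args
```

Failing early here avoids creating a $2.00 publish job that the server or YouTube would later reject for an over-long title.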
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, description discloses paid job creation ($2.00), OAuth token storage, auto-captions, and required workflow. Lacks error conditions or token expiry details but is fairly transparent for a workflow initiator.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Five sentences, front-loaded with main purpose, compact and informative with no superfluous language.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given siblings and no output schema, description explains workflow, prerequisites, and next steps. Could mention error handling but suffices for tool usage context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, baseline 3. Description adds value by explaining source_job_id context (transcribe/summarize) and auto-captions, enhancing understanding beyond schema docs.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the verb 'Publish' and the resource 'an existing video from a transcribe or summarize job to YouTube'. It distinguishes from siblings like trigger_youtube_publish and complete_upload by specifying the job source and workflow step.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit workflow steps and prerequisites (OAuth token). However, does not directly compare with sibling tools like trigger_youtube_publish or get_youtube_publish_status to clarify when to use each.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
summarize (A)
Summarize an audio or video file — returns both a text summary AND the full transcript (with per-word timestamps). Do not also call transcribe on the same file. Three-call flow: (1) call with filename to receive {job_id, payment_challenge}; (2) pay via MPP, then call with job_id + payment_credential to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns three outputs: role summary (plain text), role transcript (SRT), and role transcript-words (JSON matching /.well-known/weftly-transcript-v2.schema.json, with segment-level and per-word timestamps). For other formats, pass format=srt|txt|vtt|json|words to get_job_status to receive transcript content inline — txt and vtt are derived from SRT, json is v1 (segments only), words is v2 (segments + words). Flat price: audio $0.75, video $1.25 — see /.well-known/mpp.json for the authoritative table. Use for meetings, long-form interviews, lectures, and podcast episodes; the words output additionally supports creating clips, multicamera edits, or edit-video-from-transcript. Retrying any call with job_id alone returns current state (idempotent). Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | No | Job ID returned from a previous call. Include along with payment_credential to confirm payment and receive the presigned upload URL. Also include alone to recover the current challenge/state if the original response was lost. | |
| filename | No | Filename with extension (e.g. "podcast.mp3"). Required on the first call — used to infer media type (audio vs video) and label outputs. Supported extensions: mp3, wav, m4a, ogg, flac, mp4, mov, webm, mkv. | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ...") obtained by paying the challenge returned from the first call. Include with job_id to verify payment and receive the upload URL. | |
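The three-call flow from the description can be sketched end to end as follows. Everything injected here is hypothetical: `call_tool` wraps an MCP client, `pay_challenge` performs the MPP payment, and `put_bytes` performs the HTTP PUT to the presigned URL. The `status` key and its `completed` and `failed` values are assumptions, since the description only says what is returned on completion and that failed jobs auto-refund.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_media_job(
    call_tool: CallTool,
    pay_challenge: Callable[[Any], str],
    put_bytes: Callable[[str, bytes], None],
    tool: str,                     # "summarize" or "transcribe"
    filename: str,
    data: bytes,
    poll_interval_s: float = 60.0,
    max_polls: int = 120,          # assumed ceiling, not a documented limit
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Create, pay for, upload, and poll one summarize or transcribe job."""
    # Call 1: the filename alone yields a job id and a payment challenge.
    created = call_tool(tool, {"filename": filename})
    job_id = created["job_id"]
    credential = pay_challenge(created["payment_challenge"])

    # Call 2: job id plus credential yields the presigned upload URL (1h expiry).
    paid = call_tool(tool, {"job_id": job_id, "payment_credential": credential})

    # Step 3: PUT the bytes, mark the upload complete, then poll for a result.
    put_bytes(paid["upload_url"], data)
    call_tool("complete_upload", {"job_id": job_id})
    for _ in range(max_polls):
        sleep(poll_interval_s)
        status = call_tool("get_job_status", {"job_id": job_id})
        if status.get("status") in ("completed", "failed"):  # values assumed
            return status
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Because retrying any call with `job_id` alone returns the current state, a caller that crashes mid-flow could persist `job_id` after the first call and resume from there instead of paying again.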
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description fully discloses behavioral traits: it explains the multi-step workflow (create job, pay, upload, poll), pricing details ($0.75 for audio, $1.25 for video), output formats (SRT, TXT, VTT, JSON, words), and details such as per-word timestamps, idempotent retries, and auto-refund on failure, covering all critical operational aspects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured and front-loaded with key information (purpose and workflow), but includes some redundant details (e.g., repeating the three-call flow and pricing in multiple sentences) that slightly reduce efficiency, though most content is valuable.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (multi-step workflow, payment, sibling interactions) and lack of annotations or output schema, the description is highly complete: it covers the process, outputs, pricing, use cases, and integration with other tools, providing all necessary context for effective agent use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds minimal extra context (e.g., linking 'filename' to media type inference and 'payment_credential' to the MPP flow), but doesn't significantly enhance parameter understanding beyond what the schema provides, meeting the baseline expectation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with a specific verb ('Summarize') and resource ('an audio or video file'), and explicitly distinguishes it from the sibling 'transcribe' tool by noting it returns both a summary and transcript, eliminating the need for separate transcription calls.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool ('Use for meetings, long-form interviews, lectures, and podcast episodes') and when not to use it ('Do not also call transcribe on the same file'), and it spells out prerequisites such as the three-call flow and payment process.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
transcribe (A)
Transcribe audio or video to text, including per-word timestamps for precise editing. Three-call flow: (1) call with filename to receive {job_id, payment_challenge}; (2) pay via MPP, then call with job_id + payment_credential to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns two outputs: role transcript (SRT) and role transcript-words (JSON matching /.well-known/weftly-transcript-v2.schema.json, with segment-level and per-word timestamps). For other formats, pass format=srt|txt|vtt|json|words to get_job_status to receive content inline — txt and vtt are derived from SRT, json is v1 (segments only), words is v2 (segments + words). Flat price: audio $0.50, video $1.00 — see /.well-known/mpp.json for the authoritative table. Use for podcasts, interviews, meetings, lectures, and especially for creating clips, multicamera edits, or edit-video-from-transcript where word boundaries matter. Retrying any call with job_id alone returns current state (idempotent). Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | No | Job ID returned from a previous call. Include along with payment_credential to confirm payment and receive the presigned upload URL. Also include alone to recover the current challenge/state if the original response was lost. | |
| filename | No | Filename with extension (e.g. "podcast.mp3"). Required on the first call — used to infer media type (audio vs video) and label outputs. Supported extensions: mp3, wav, m4a, ogg, flac, mp4, mov, webm, mkv. | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ...") obtained by paying the challenge returned from the first call. Include with job_id to verify payment and receive the upload URL. | |
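Since the filename extension determines the media type and therefore the flat price, a client can estimate the charge before making the first call. The sketch below hard-codes the prices quoted in the transcribe and summarize descriptions; the split of the supported extensions into audio and video is an assumption (the listing does not say how `ogg` or `webm` are classified), and /.well-known/mpp.json remains the authoritative table.

```python
from decimal import Decimal
from pathlib import PurePosixPath

# Assumed split of the supported extensions; the server's own rule is not documented here.
AUDIO_EXTENSIONS = {"mp3", "wav", "m4a", "ogg", "flac"}
VIDEO_EXTENSIONS = {"mp4", "mov", "webm", "mkv"}

# Flat prices in USD as quoted in the tool descriptions.
PRICES_USD = {
    "transcribe": {"audio": Decimal("0.50"), "video": Decimal("1.00")},
    "summarize": {"audio": Decimal("0.75"), "video": Decimal("1.25")},
}


def media_type(filename: str) -> str:
    """Classify a filename as 'audio' or 'video' from its extension."""
    ext = PurePosixPath(filename).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"unsupported extension: {ext!r}")


def quoted_price(tool: str, filename: str) -> Decimal:
    """Return the flat price quoted in the description for this tool and file."""
    return PRICES_USD[tool][media_type(filename)]
```

`Decimal` is used instead of `float` so that prices compare and sum exactly. A production client would more likely fetch the price table from /.well-known/mpp.json than embed it.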
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It thoroughly describes the multi-step workflow, payment requirements (MPP, pricing details), file upload process (presigned URL with expiry), output formats (SRT, VTT, TXT, JSON), and polling mechanism. This goes well beyond basic functionality to include operational constraints and expected behaviors.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized for a complex tool, front-loading the core purpose and workflow. Every sentence adds value, though it's somewhat dense with technical details. The structure flows logically from purpose to process to pricing to use cases, with no redundant information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (multi-step workflow with payment integration), no annotations, and no output schema, the description provides comprehensive context. It covers the entire operational process, pricing, supported file types, output formats, and integration with sibling tools. This is complete enough for an agent to understand and correctly invoke the tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the baseline is 3. The description adds significant value by explaining the three-call flow and how parameters are used across different calls: 'filename' is required on first call, 'job_id' and 'payment_credential' are used together on second call, and 'job_id' alone can recover state. This contextual usage information isn't captured in the schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the tool's purpose: 'Transcribe audio or video to text, including per-word timestamps for precise editing.' It clearly distinguishes this from sibling tools like 'complete_upload' (for finalizing uploads) and 'get_job_status' (for polling progress), and specifies it's the first step in a three-call flow.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: as the initial step in a multi-call workflow, with detailed instructions on subsequent steps (payment, upload, completion). It also lists use cases ('podcasts, interviews, meetings, lectures', plus clip creation and transcript-based editing) and mentions alternatives implicitly by referencing sibling tools for later stages.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
trigger_youtube_publish (A)
Start the YouTube upload after payment is confirmed. Call this after publish_to_youtube once payment_status is "paid". Returns immediately — the upload runs as a durable Workflow in the background. Poll get_youtube_publish_status to track progress.
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | Yes | Job ID returned by publish_to_youtube | |
| session_token | Yes | Session token | |
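The precondition in the description (call only once payment_status is "paid") can be enforced with a small guard. This is a sketch under assumptions: `call_tool` is a hypothetical MCP client wrapper, and the listing does not say which response carries `payment_status`, so the caller passes it in.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed result dict.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def trigger_when_paid(
    call_tool: CallTool,
    job_id: str,
    session_token: str,
    payment_status: str,
) -> Dict[str, Any]:
    """Start the YouTube upload, refusing to call the tool before payment."""
    if payment_status != "paid":
        raise RuntimeError(
            f"job {job_id} has payment_status {payment_status!r}; expected 'paid'"
        )
    # Returns immediately; the upload itself runs in the background.
    return call_tool(
        "trigger_youtube_publish",
        {"job_id": job_id, "session_token": session_token},
    )
```

After this returns, progress is tracked by polling get_youtube_publish_status with the same `job_id` and `session_token`.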
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It discloses that the tool returns immediately and runs as a durable Workflow in the background, which is key behavioral context. Does not mention auth requirements, but flow implies it.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences: the first states purpose and precondition, the second gives call order, the third describes the asynchronous behavior, and the fourth points to the status poller. Highly efficient and front-loaded with critical information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 2 parameters and no output schema, the description fully covers its role in the workflow, explains async behavior, and references sibling tools. Nothing essential is missing.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so description adds value by stating that 'job_id' is the one returned by 'publish_to_youtube', linking the parameter to a prior step. This external context aids the agent.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the action 'Start the YouTube upload' and specifies the precondition (payment confirmed). It distinguishes itself from siblings by referencing the preceding tool 'publish_to_youtube' and the subsequent 'get_youtube_publish_status'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use: after 'publish_to_youtube' and 'payment_status is "paid"'. Also instructs to poll 'get_youtube_publish_status'. Could include a when-not, but context is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
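For owners who generate their site from a script, the claim file can be written programmatically. This is a minimal sketch: `write_glama_claim` and the `site_root` layout are hypothetical, and how the resulting file gets served at /.well-known/glama.json depends on your hosting setup.

```python
import json
from pathlib import Path


def write_glama_claim(site_root: Path, email: str) -> Path:
    """Write .well-known/glama.json under site_root and return its path."""
    claim = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": email}],
    }
    target = site_root / ".well-known" / "glama.json"
    # Create the .well-known directory if the site does not have one yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(claim, indent=2) + "\n", encoding="utf-8")
    return target
```

The `email` value should be the address associated with your Glama account, as noted above.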
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.