Weftly
Server Details
Transcribe and summarize audio and video. Pay per job via Stripe or crypto.
- Status: Healthy
- Last Tested:
- Transport: Streamable HTTP
- URL:
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.7/5 across 8 of 8 tools scored.
Each tool serves a distinct purpose: upload confirmation, clip extraction (horizontal and vertical), clip finding, transcription, summarization, status polling, and payment testing. No overlap in functionality.
All tool names follow a consistent verb_noun snake_case pattern (e.g., extract_clip, get_job_status). Minor variation with 'mpp_smoke_test' but still clear and predictable.
8 tools is well-scoped for a media processing server covering upload, transcribe, summarize, find/extract clips, status checks, and payment testing. No tools feel redundant or missing.
The tool set covers the full workflow: upload (with three-call flow), transcription, summarization, clip candidate generation, clip extraction (horizontal and vertical), and status polling. The inclusion of a smoke test for payments adds robustness.
Available Tools
8 tools
complete_upload
Confirm that the file has been uploaded (via HTTP PUT to the upload_url from transcribe or summarize) and start processing. Verifies that the file is present in storage and that the job has been paid. Returns status "processing". Poll get_job_status to track progress and retrieve download URLs when done.
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | Yes | The job_id returned from a previous transcribe or summarize call. | |
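For concreteness, a minimal sketch of this confirmation step in Python, assuming a bare JSON-RPC tools/call round-trip over Streamable HTTP (the real transport also involves an initialize handshake, omitted here). MCP_URL, call_tool, and put_and_confirm are placeholders, not Weftly client code; upload_url and job_id come from the earlier transcribe or summarize calls.

```python
import requests

MCP_URL = "https://example.invalid/mcp"  # placeholder; use the URL from this listing

def call_tool(name: str, arguments: dict) -> dict:
    """Bare JSON-RPC tools/call over Streamable HTTP (session handshake omitted)."""
    r = requests.post(
        MCP_URL,
        json={"jsonrpc": "2.0", "id": 1, "method": "tools/call",
              "params": {"name": name, "arguments": arguments}},
        headers={"Accept": "application/json, text/event-stream"},
    )
    r.raise_for_status()
    return r.json()["result"]

def put_and_confirm(upload_url: str, job_id: str, path: str) -> dict:
    """PUT the media bytes to the presigned URL, then confirm to start processing."""
    with open(path, "rb") as f:
        requests.put(upload_url, data=f).raise_for_status()
    return call_tool("complete_upload", {"job_id": job_id})  # returns status "processing"
```

The later sketches on this page reuse this hypothetical call_tool helper.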
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: it starts processing after upload confirmation, verifies that the file is present in storage and that the job has been paid, and requires polling 'get_job_status' for progress tracking. This covers operational flow, error conditions, and async behavior, though it doesn't detail auth needs or rate limits explicitly.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is highly concise and well-structured: three sentences that efficiently cover purpose, prerequisites, error conditions, and next steps. Each sentence adds critical information without redundancy, making it easy to parse and front-loaded with the core action.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (post-upload processing with async behavior), no annotations, and no output schema, the description does a good job of completeness. It explains the workflow, prerequisites, error cases, and follow-up actions. However, it lacks details on return values or output format, which would be helpful since there's no output schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the sole parameter ('job_id') documented in the schema. The description doesn't add any parameter-specific semantics beyond what the schema provides (e.g., it doesn't explain the format or constraints of 'job_id'). Baseline 3 is appropriate since the schema handles parameter documentation adequately.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Confirm that the file has been uploaded ... and start processing.' It specifies the verbs ('confirm' and 'start processing') and resource (the uploaded file), but doesn't explicitly distinguish it from sibling tools like 'transcribe' or 'summarize' beyond naming them as the source of the upload_url. The purpose is clear but sibling differentiation is only implied through usage context.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage guidance: call it after the file has been PUT to the upload_url returned by 'transcribe' or 'summarize'. It names those tools as prerequisites and directs to 'Poll get_job_status to track progress' for post-invocation. This clearly indicates when to use this tool versus others in the workflow.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
extract_clip
Cut and assemble a clip from any prior video job (find_clips, summarize, or video transcribe). Operates on a parent job — possessing the parent source_job_id is the capability, no upload step. Pass one segment for a simple cut, or multiple non-contiguous segments to compose a single mp4 highlight reel — same flat $0.50 either way. Two-call flow: (1) call with source_job_id + segments (ordered array of {start, end, label?} in source seconds, total duration capped at 30 minutes) to receive {job_id, payment_challenge}; (2) pay via MPP and call with job_id + payment_credential to start processing. Poll get_job_status(job_id) for completion; outputs are role clip-video (the assembled .mp4, frame-accurate boundaries with 15ms audio fades at segment joins) and — when include_transcript: true (default) — roles clip-srt + clip-words (transcripts stitched and time-shifted to match the assembled video). Set include_transcript: false to skip transcript outputs. Payment: MPP — accepts Tempo USDC and Stripe SPT. The challenge's WWW-Authenticate header and /.well-known/mpp.json are authoritative for which methods are offered. Source must still be in storage (72h TTL for find_clips parents, 24h elsewhere — check expires_at from get_job_status on the parent). Multiple extract_clip calls against one parent are independent paid jobs. Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| title | No | Optional title for the assembled clip. Surfaces in get_job_status and download filenames; doesn't affect the cut itself. | |
| job_id | No | Job ID returned from a previous extract_clip call. Include along with payment_credential to confirm payment and trigger processing. Also include alone to recover the current state. | |
| segments | No | Ordered array of source-relative segments to cut and concatenate into the output. Single segment for a simple cut; multiple segments compose a single mp4 from non-contiguous moments — same flat $0.50 either way. Total summed duration capped at 30 minutes per call. Required on the first call. | |
| source_job_id | No | Job ID of any prior video job (find_clips, summarize, or video transcribe). Possessing this id is the capability — extract_clip is not session-bound, so a user can come back from a different session within the parent's TTL and still extract. Required on the first call. | |
| include_transcript | No | Default true. When true, the pipeline writes clip-srt + clip-words outputs stitched to match the assembled video. Set false to skip and just receive the .mp4. | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ..."). extract_clip accepts Tempo USDC and Stripe SPT — see the challenge's WWW-Authenticate header or /.well-known/mpp.json for the supported methods. Include with job_id after paying the challenge to start processing. | |
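A pre-flight sketch for the first extract_clip call: the segments payload is plain data, so the 30-minute cap can be checked locally before paying for a challenge. The job IDs and credential below are placeholders, and the end > start check is an assumption carried over from extract_vertical_clip's documented constraint.

```python
# Hypothetical segment list in source seconds; labels are optional.
segments = [
    {"start": 120.0, "end": 185.5, "label": "cold open"},
    {"start": 1410.0, "end": 1500.0},
]

assert all(s["end"] > s["start"] for s in segments), "each segment needs end > start"
total = sum(s["end"] - s["start"] for s in segments)
assert total <= 30 * 60, f"summed duration {total:.0f}s exceeds the 30-minute cap"

# Call (1): source_job_id + segments -> {job_id, payment_challenge}
first_call = {"source_job_id": "job_abc123", "segments": segments}
# Call (2), after paying the challenge via MPP (job_id is the one call (1) returned)
second_call = {"job_id": "job_def456", "payment_credential": "Payment eyJ..."}
```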
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses two-call workflow, payment requirement, source TTL (72h/24h), duration cap (30 min), flat fee ($0.50), output roles (clip-video, clip-srt, clip-words), transcript option, and auto-refund on failure. No annotations provided, so description fully addresses behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Description is detailed and well-structured, front-loading the main purpose and then logically expanding on workflow, payment, and constraints. Slightly verbose but all sentences add necessary context for a complex tool.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
No output schema exists, but description fully explains return values (job_id, payment_challenge, output roles) and error handling (auto-refund). Covers source TTL, segment limits, and transcript behavior, making it complete for agent usage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but description adds essential workflow context (e.g., two-call flow, payment credential usage, segment ordering, label purpose) beyond the schema's basic field descriptions. Each parameter's role in the flow is explained.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the tool cuts and assembles clips from prior video jobs ('Cut and assemble a clip from any prior video job'). It specifies the resource (parent job) and action, and distinguishes from sibling 'extract_vertical_clip' by implying orientation difference.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly details when to use (for extraction from prior jobs), the two-call flow, payment process, and polling for completion. Provides guidance on alternatives (multiple calls independent, TTL limits) and contrasts with upload steps absent.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
extract_vertical_clip
Cut a 9:16 vertical clip from any prior video job (find_clips, summarize, or video transcribe), suitable for direct upload to TikTok, Instagram Reels, or YouTube Shorts. Default output is 1080×1920 H.264 / AAC .mp4 with center-cropped framing; audio loudness-normalized to -14 LUFS / -1.5 dBTP for short-form social. Single-segment only; clip duration must be between 1 and 90 seconds (Instagram Reels max). Operates on a parent job — possessing the parent source_job_id is the capability, no upload step. Two-call flow: (1) call with source_job_id + start + end (in source seconds) to receive {job_id, payment_challenge}; (2) pay via MPP and call with job_id + payment_credential to start processing. Poll get_job_status(job_id) for completion; output is role clip-vertical-video (the .mp4). Flat price: $0.50 per clip. Payment: MPP — accepts Tempo USDC and Stripe SPT. Optional profile parameter selects the encoding profile (default tiktok-primary). Allowed values: tiktok-primary (1080×1920, fast preset, CRF 22), tiktok-primary-720p (720×1280, CBR 3 Mbps — half-resolution mobile-optimized, ~40% faster wall time), instagram-reels (1080×1920, slow preset, CBR 4 Mbps), instagram-stories (same encode shape as instagram-reels). All four profiles loudness-normalize identically. Source must be a horizontal video (wider than 9:16) — already-vertical or square sources are rejected. Source must still be in storage (72h TTL for find_clips parents, 24h elsewhere — check expires_at from get_job_status on the parent). Pair with find_clips ($2.00/video) to pick a moment first, then call this to get a download-ready vertical mp4 in under 5 minutes. Multiple extract_vertical_clip calls against one parent are independent paid jobs. Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| end | No | Source-relative end time in seconds (must be > start, and end - start ∈ [1, 90]). Required on the first call. | |
| start | No | Source-relative start time in seconds. Required on the first call. | |
| title | No | Optional title for the assembled clip. Surfaces in get_job_status and download filenames; doesn't affect the cut itself. | |
| job_id | No | Job ID returned from a previous extract_vertical_clip call. Include along with payment_credential to confirm payment and trigger processing. Also include alone to recover the current state. | |
| profile | No | Optional encoding profile. Default: tiktok-primary (1080×1920 H.264 fast preset, CRF 22, 6 Mbps cap). tiktok-primary-720p: 720×1280, CBR 3 Mbps — half-resolution mobile-optimized, ~40% faster wall time. instagram-reels: 1080×1920 H.264 slow preset, CBR 4 Mbps. instagram-stories: same encode shape as instagram-reels. All four apply loudness normalization to -14 LUFS / -1.5 dBTP. | |
| source_job_id | No | Job ID of any prior video job (find_clips, summarize, or video transcribe). Possessing this id is the capability — extract_vertical_clip is not session-bound, so a user can come back from a different session within the parent's TTL and still extract. Required on the first call. | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ..."). extract_vertical_clip accepts Tempo USDC and Stripe SPT — see the challenge's WWW-Authenticate header or /.well-known/mpp.json for the supported methods. Include with job_id after paying the challenge to start processing. | |
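The single-segment and 1-90 second rules also lend themselves to a local check. A sketch with placeholder IDs; vertical_clip_args is a hypothetical helper, and the profile set mirrors the four documented values.

```python
ALLOWED_PROFILES = {
    "tiktok-primary",       # 1080x1920, fast preset, CRF 22 (default)
    "tiktok-primary-720p",  # 720x1280, CBR 3 Mbps, ~40% faster wall time
    "instagram-reels",      # 1080x1920, slow preset, CBR 4 Mbps
    "instagram-stories",    # same encode shape as instagram-reels
}

def vertical_clip_args(source_job_id: str, start: float, end: float,
                       profile: str = "tiktok-primary") -> dict:
    duration = end - start
    if not 1 <= duration <= 90:
        raise ValueError(f"duration {duration:.1f}s outside the 1-90s window")
    if profile not in ALLOWED_PROFILES:
        raise ValueError(f"unknown profile {profile!r}")
    return {"source_job_id": source_job_id, "start": start,
            "end": end, "profile": profile}

first_call = vertical_clip_args("job_abc123", start=754.0, end=812.0)
```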
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, the description fully covers behavioral traits: default output (1080×1920 H.264/AAC), loudness normalization, source requirements (horizontal, TTL), the two-call flow, payment details, and auto-refund on failure. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is thorough and well-structured, starting with purpose, then specifics, flow, pricing, and optional parameters. It is slightly dense but all sentences are justified. Could be trimmed slightly, but overall efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite no output schema, the description provides complete contextual coverage: purpose, constraints, two-call flow, pricing, payment methods, source prerequisites, and error handling. An agent can fully understand and invoke the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, but the description adds significant value by explaining the two-call flow for parameters (source_job_id+start+end vs. job_id+payment_credential), detailing each profile option, clarifying title usage, and noting that source_job_id is session-independent.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool cuts a 9:16 vertical clip from prior video jobs (find_clips, summarize, or video transcribe) for short-form social media platforms. It distinguishes itself from siblings like extract_clip and find_clips by specifying the output format and use case.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly explains when to use the tool (for social media vertical clips), when not to use it (source must be horizontal, single-segment only, duration 1-90 seconds), and provides alternatives (pair with find_clips). It also outlines the two-call flow and payment requirements.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
find_clips
START HERE for any clip workflow on a video — find_clips is the canonical entry point and includes a full transcription as a free byproduct. Do not call transcribe first: doing so doubles the upload, doubles the spend, and produces the same transcript. Identify ranked candidate clips in a video — what to cut for highlights, social, or testimonials. Three-call flow: (1) call with filename (and optional query) to receive {job_id, payment_challenge}; (2) pay via MPP, then call with job_id + payment_credential to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns presigned download URLs for three files: role clip-candidates (JSON matching /.well-known/weftly-clips-v1.schema.json — includes source_job_id and source_expires_at), role transcript (SRT, free byproduct), role transcript-words (JSON matching /.well-known/weftly-transcript-v2.schema.json, free byproduct). Each candidate carries transcript_text — the full text of what's in the clip — so callers can preview content before paying for extract_clip. Optional query parameter switches to query mode (e.g., "they discuss pricing", "the part about hiring") with the same output shape; the mode field in clip-candidates.json indicates which mode produced the result. Flat price: $2.00 per video — see /.well-known/mpp.json. Source-reuse contract: the source video stays in storage for 72h after find_clips completes. Hand the find_clips job_id (also returned as source_job_id in the candidates JSON) to extract_clip or extract_vertical_clip as their source_job_id — within those 72h they cut directly from the stored source: no re-upload, no re-transcribe, just $0.50 per cut. Pass the same source_job_id to as many extract calls as you need. Use for interviews, podcasts, sales calls, all-hands recordings. Retrying with job_id alone returns current state. Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| query | No | Optional. Switches the analyzer from "best clips" discovery mode to query mode — finds segments matching this content (e.g., "they discuss pricing", "the part about hiring"). Same output shape either way; the `mode` field in clip-candidates.json tells consumers how to interpret per-candidate scoring. | |
| job_id | No | Job ID returned from a previous call. Include along with payment_credential to confirm payment and receive the presigned upload URL. Also include alone to recover the current challenge/state if the original response was lost. | |
| filename | No | Filename with extension (e.g. "podcast.mp3"). Required on the first call — used to infer media type (audio vs video) and label outputs. Supported extensions: mp3, wav, m4a, ogg, flac, mp4, mov, webm, mkv. | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ...") obtained by paying the challenge returned from the first call. Include with job_id to verify payment and receive the upload URL. | |
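Putting the three calls together, a sketch of the whole flow, reusing the hypothetical call_tool helper from the complete_upload example above. pay_challenge stands in for whatever MPP settlement the caller performs and is not a real client function.

```python
import time
import requests

def pay_challenge(challenge: dict) -> str:
    # Placeholder: settle via Tempo USDC or Stripe SPT and return the
    # resulting Authorization header value, e.g. "Payment eyJ...".
    raise NotImplementedError

# (1) Create the job and receive the payment challenge.
job = call_tool("find_clips", {"filename": "all-hands.mp4",
                               "query": "they discuss pricing"})
# (2) Pay, then exchange the credential for the presigned upload URL (1h expiry).
cred = pay_challenge(job["payment_challenge"])
upload = call_tool("find_clips", {"job_id": job["job_id"], "payment_credential": cred})
# (3) PUT the bytes, confirm, then poll (at least 60s between checks).
with open("all-hands.mp4", "rb") as f:
    requests.put(upload["upload_url"], data=f).raise_for_status()
call_tool("complete_upload", {"job_id": job["job_id"]})
while call_tool("get_job_status", {"job_id": job["job_id"]})["status"] == "processing":
    time.sleep(60)
# On completion: clip-candidates JSON plus the free transcript byproducts; the
# job_id doubles as source_job_id for extract_clip / extract_vertical_clip (72h).
```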
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, the description fully discloses the payment flow, upload process, file expiration (72h), output format references to schemas, and auto-refund on failure. It also explains the source-reuse contract for extract calls.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is very long and thorough, but not concise. It includes extensive details that could be structured more compactly. While the content is valuable, it sacrifices brevity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema, the description thoroughly explains the output roles and schemas (clip-candidates, transcript, transcript-words). It covers the entire workflow including payment, upload, polling, and source-reuse, making it complete for an agent.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, baseline 3. The description adds significant context beyond schema: explains the three-call flow, how job_id is used, what filename does, and gives examples for query. It adds value beyond the schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states that find_clips is the canonical entry point for clip workflows, includes transcription as a free byproduct, and distinguishes it from siblings like transcribe and extract_clip. It provides a specific verb and resource with clear scope.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage guidelines: start here for clip workflows, do not call transcribe first, and details the three-call flow. It also warns against doubling uploads and spend, and gives retry behavior.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_job_status
Check the status of a transcribe or summarize job. Returns the current state and, when completed, presigned download URLs for each output file (roles include transcript, transcript-words, and for summarize jobs also summary). Optionally pass format (srt, txt, vtt, json, words) to get the transcript content inline — useful when you need the text directly without fetching a URL. txt and vtt are derived from the stored SRT; json is v1 (segments only); words is v2 (segments + per-word timestamps matching /.well-known/weftly-transcript-v2.schema.json). Poll this periodically after calling complete_upload — wait at least 60 seconds between checks. For files under 10 minutes, jobs usually complete within 1-2 minutes. For long files (1hr+), expect 10-30 minutes. Download URLs are presigned and time-limited (1 hour); fetch them when needed rather than caching long-term.
Also use this to recover from lost state: if the original challenge was lost, call get_job_status(job_id) to retrieve a fresh challenge (status "awaiting_payment") or the upload URL (status "awaiting_upload").
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | When the job is completed, return the transcript inline in this format instead of only a download URL. Options: "srt" (SubRip with timestamps), "txt" (plain text — no timestamps), "vtt" (WebVTT), "json" (v1, segments only), "words" (v2, segments + per-word timestamps matching /.well-known/weftly-transcript-v2.schema.json). Omit for download URLs only. | |
| job_id | Yes | The job_id returned from a previous transcribe or summarize call. | |
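A polling sketch that respects the documented 60-second minimum between checks, again assuming the hypothetical call_tool helper from the complete_upload example. The pending-state names quoted in the descriptions ('processing', 'awaiting_payment', 'awaiting_upload') are used as the loop condition; any other status is treated as terminal, which is an assumption.

```python
import time

def wait_for_job(job_id: str, fmt: str | None = None, timeout_s: int = 45 * 60) -> dict:
    """Poll get_job_status until the job leaves a pending state."""
    args = {"job_id": job_id}
    if fmt:
        args["format"] = fmt  # e.g. "txt" to get the transcript inline on completion
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = call_tool("get_job_status", args)
        if status["status"] not in ("processing", "awaiting_payment", "awaiting_upload"):
            return status  # completed, or failed (which auto-refunds)
        time.sleep(60)  # wait at least 60 seconds between checks
    raise TimeoutError(f"job {job_id} still pending after {timeout_s}s")
```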
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden and delivers substantial behavioral context. It discloses: polling behavior with timing constraints (60 seconds between checks), typical completion times (1-2 minutes for short files, 10-30 minutes for long files), presigned URL characteristics (time-limited to 1 hour), and recovery functionality for lost state. It doesn't mention error handling or rate limits, keeping it from a perfect score.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized and front-loaded with core functionality. Every sentence adds value: first defines purpose, second explains format parameter utility, third gives polling guidance, fourth provides timing estimates, fifth describes URL characteristics, sixth introduces recovery use case. Minor redundancy in explaining format options slightly reduces efficiency.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (status checking with polling, recovery functionality, inline format options) and no annotations or output schema, the description provides comprehensive context. It covers purpose, usage patterns, behavioral expectations, parameter semantics, timing guidance, and edge cases (recovery from lost state), making it complete enough for effective agent use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the baseline is 3. The description adds meaningful context beyond the schema: it explains that the 'format' parameter is 'useful when you need the text directly without fetching a URL' and clarifies that omitting it returns 'download URLs only'. This provides practical usage insight that enhances the schema's technical specifications.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Check the status of a transcribe or summarize job' with specific verbs ('check', 'retrieve') and resources ('job', 'download URLs', 'fresh challenge', 'upload URL'). It distinguishes from siblings like 'complete_upload', 'summarize', and 'transcribe' by focusing on status monitoring rather than job initiation or completion.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage guidance: 'Poll this periodically after calling complete_upload — wait at least 60 seconds between checks' and 'Also use this to recover from lost state'. It distinguishes when to use this tool versus alternatives by specifying it's for checking job status after job creation tools, and it includes timing recommendations for different file sizes.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
mpp_smoke_test
Smoke-test the MPP payment plumbing end-to-end via this MCP server, for $0.01 USDC. Two-call flow: (1) call with no arguments to receive an MPP payment_challenge; (2) pay via MPP and call again with payment_credential set to the resulting Authorization header value (e.g. "Payment eyJ...") to receive {paid: true, timestamp, receipt_ref, payment_method}. Uses the exact same createPayToAddress + createMppHandler verification path as paid product tools (transcribe, summarize), so a green run here means real paid calls will work too. Stateless — no job is created, no database row written. Use this whenever you want to confirm a wallet, the MCP transport, the worker, and the production payment middleware are all healthy without paying a transcribe price. Cost: $0.01 USDC per attempt.
| Name | Required | Description | Default |
|---|---|---|---|
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ...") obtained by paying the challenge returned from the first call. Include to verify payment and receive {paid: true}. Omit on the first call. | |
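The two-call health check in full, reusing the hypothetical call_tool and pay_challenge helpers sketched earlier; field names follow the return shape quoted in the description.

```python
# Call (1): no arguments -> payment challenge.
challenge = call_tool("mpp_smoke_test", {})["payment_challenge"]
# Call (2): pay ($0.01 USDC), then present the credential.
credential = pay_challenge(challenge)  # "Payment eyJ..."
result = call_tool("mpp_smoke_test", {"payment_credential": credential})
assert result["paid"] is True  # also carries timestamp, receipt_ref, payment_method
```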
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It discloses statelessness, no job created, no database row written, and that it uses the exact same verification path as paid tools. Also mentions cost and two-call requirement.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single dense paragraph but well-organized: purpose, flow, example, stateless note, usage guidance, cost. Every sentence adds value; slight lack of bulleted structure prevents a 5.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the two-call payment flow complexity, description thoroughly explains the entire process, return values, and relationship to other tools. No output schema but description covers return of second call.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with one parameter. Description adds meaning beyond schema by explaining the parameter is the Authorization header value from paying the challenge, and should be omitted on first call. Very helpful.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description specifies the tool is for smoke-testing MPP payment plumbing via a two-call flow costing $0.01 USDC. It clearly distinguishes this tool from paid siblings like transcribe and summarize by noting it's a cheaper health check.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use: 'whenever you want to confirm a wallet, the MCP transport, the worker, and the production payment middleware are all healthy without paying a transcribe price.' Provides the two-call flow steps. Does not mention when not to use or alternatives, but context is clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
summarize
Summarize an audio or video file — returns both a text summary AND the full transcript (with per-word timestamps). Do not also call transcribe on the same file. Three-call flow: (1) call with filename to receive {job_id, payment_challenge}; (2) pay via MPP, then call with job_id + payment_credential to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns presigned download URLs for three files: role summary (plain text), role transcript (SRT), and role transcript-words (JSON matching /.well-known/weftly-transcript-v2.schema.json, with segment-level and per-word timestamps). For other formats, pass format=srt|txt|vtt|json|words to get_job_status to receive transcript content inline — txt and vtt are derived from SRT, json is v1 (segments only), words is v2 (segments + words). Flat price: audio $0.75, video $1.25 — see /.well-known/mpp.json for the authoritative table. Use for meetings, long-form interviews, lectures, and podcast episodes; the words output additionally supports creating clips, multicamera edits, or edit-video-from-transcript. Retrying any call with job_id alone returns current state (idempotent). Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | No | Job ID returned from a previous call. Include along with payment_credential to confirm payment and receive the presigned upload URL. Also include alone to recover the current challenge/state if the original response was lost. | |
| filename | No | Filename with extension (e.g. "podcast.mp3"). Required on the first call — used to infer media type (audio vs video) and label outputs. Supported extensions: mp3, wav, m4a, ogg, flac, mp4, mov, webm, mkv. | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ...") obtained by paying the challenge returned from the first call. Include with job_id to verify payment and receive the upload URL. | |
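The three summarize calls reduce to the payloads below; the IDs and credential are placeholders. Because every call is idempotent on job_id, the recovery payload can be resent at any point to learn the current state.

```python
# (1) Create the job -> {job_id, payment_challenge}
create_args = {"filename": "lecture.mp4"}
# (2) After paying the challenge via MPP -> {upload_url}
confirm_args = {"job_id": "job_abc123", "payment_credential": "Payment eyJ..."}
# Any time: resend job_id alone to recover a lost challenge or upload URL.
recover_args = {"job_id": "job_abc123"}
```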
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description fully discloses behavioral traits: it explains the multi-step workflow (create job, pay, upload, poll), pricing details ($0.75 for audio, $1.25 for video), output formats (SRT, VTT, TXT, JSON), and constraints like segment-level transcripts and flat pricing per job, covering all critical operational aspects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured and front-loaded with key information (purpose and workflow), but includes some redundant details (e.g., repeating the three-call flow and pricing in multiple sentences) that slightly reduce efficiency, though most content is valuable.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (multi-step workflow, payment, sibling interactions) and lack of annotations or output schema, the description is highly complete: it covers the process, outputs, pricing, use cases, and integration with other tools, providing all necessary context for effective agent use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds minimal extra context (e.g., linking 'filename' to media type inference and 'payment_credential' to the MPP flow), but doesn't significantly enhance parameter understanding beyond what the schema provides, meeting the baseline expectation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with a specific verb and resource ('Summarize an audio or video file'), and explicitly distinguishes it from the sibling 'transcribe' tool by noting it returns both a summary and the full transcript, eliminating the need for separate transcription calls.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool ('Use for meetings, long-form interviews, lectures, and podcast episodes') and when not to use it ('Do not also call transcribe on the same file'), including clear prerequisites like the three-call flow and payment process.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
transcribe
Transcribe audio or video to text, including per-word timestamps for precise editing. Three-call flow: (1) call with filename to receive {job_id, payment_challenge}; (2) pay via MPP, then call with job_id + payment_credential to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns presigned download URLs for two files: role transcript (SRT) and role transcript-words (JSON matching /.well-known/weftly-transcript-v2.schema.json, with segment-level and per-word timestamps). For other formats, pass format=srt|txt|vtt|json|words to get_job_status to receive content inline — txt and vtt are derived from SRT, json is v1 (segments only), words is v2 (segments + words). Flat price: audio $0.50, video $1.00 — see /.well-known/mpp.json for the authoritative table. Use for podcasts, interviews, meetings, lectures, and especially for creating clips, multicamera edits, or edit-video-from-transcript where word boundaries matter. Retrying any call with job_id alone returns current state (idempotent). Failed jobs auto-refund.
| Name | Required | Description | Default |
|---|---|---|---|
| job_id | No | Job ID returned from a previous call. Include along with payment_credential to confirm payment and receive the presigned upload URL. Also include alone to recover the current challenge/state if the original response was lost. | |
| filename | No | Filename with extension (e.g. "podcast.mp3"). Required on the first call — used to infer media type (audio vs video) and label outputs. Supported extensions: mp3, wav, m4a, ogg, flac, mp4, mov, webm, mkv. | |
| payment_credential | No | MPP payment credential (full Authorization header value, e.g. "Payment eyJ...") obtained by paying the challenge returned from the first call. Include with job_id to verify payment and receive the upload URL. | |
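Since the server infers media type from the extension, the flat price can be estimated locally before the first call. A hypothetical pre-flight helper using the documented extension list and prices; /.well-known/mpp.json remains authoritative.

```python
AUDIO_EXTS = {"mp3", "wav", "m4a", "ogg", "flac"}
VIDEO_EXTS = {"mp4", "mov", "webm", "mkv"}

def transcribe_quote(filename: str) -> tuple[str, float]:
    """Return (inferred media type, documented flat transcribe price in USD)."""
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext in AUDIO_EXTS:
        return "audio", 0.50
    if ext in VIDEO_EXTS:
        return "video", 1.00
    raise ValueError(f"unsupported extension: .{ext}")

print(transcribe_quote("podcast.mp3"))    # ('audio', 0.5)
print(transcribe_quote("all-hands.mp4"))  # ('video', 1.0)
```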
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It thoroughly describes the multi-step workflow, payment requirements (MPP, pricing details), file upload process (presigned URL with expiry), output formats (SRT, VTT, TXT, JSON), and polling mechanism. This goes well beyond basic functionality to include operational constraints and expected behaviors.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized for a complex tool, front-loading the core purpose and workflow. Every sentence adds value, though it's somewhat dense with technical details. The structure flows logically from purpose to process to pricing to use cases, with no redundant information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (multi-step workflow with payment integration), no annotations, and no output schema, the description provides comprehensive context. It covers the entire operational process, pricing, supported file types, output formats, and integration with sibling tools. This is complete enough for an agent to understand and correctly invoke the tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the baseline is 3. The description adds significant value by explaining the three-call flow and how parameters are used across different calls: 'filename' is required on first call, 'job_id' and 'payment_credential' are used together on second call, and 'job_id' alone can recover state. This contextual usage information isn't captured in the schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the tool's purpose: 'Transcribe audio or video to text, including per-word timestamps for precise editing.' It clearly situates itself relative to sibling tools like 'complete_upload' (for finalizing uploads) and 'get_job_status' (for polling progress) by naming them as the later steps of its three-call flow.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: as the initial step in a multi-call workflow, with detailed instructions on subsequent steps (payment, upload, completion). It also lists use cases ('podcasts, interviews, meetings, lectures') and mentions alternatives implicitly by referencing sibling tools for later stages.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
"$schema": "https://glama.ai/mcp/schemas/connector.json",
"maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.