Speak AI MCP Server

Official
by speakai

Upload and Analyze Media

upload_and_analyze

Upload media from direct file URLs or social/video links (YouTube, Instagram, TikTok, etc.) for transcription. Returns a media ID for async processing; poll status and retrieve AI-powered insights.

Instructions

Upload and transcribe media from a URL: either a direct/public file URL or a shareable social/video link (YouTube, Instagram, TikTok, X, Facebook, Reddit, SoundCloud, and similar), which Speak resolves to the underlying media automatically. Returns media_id immediately; after that, poll get_media_status until state is 'processed' (typically 1-3 minutes for audio under 60 minutes), then call get_media_insights for AI summaries. This async pattern is required for remote MCP transports because long blocking calls die at proxy idle timeouts. Vimeo links are not yet supported.
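The upload, poll, and retrieve sequence described above might look roughly like the following sketch. The `call_tool` callable is a hypothetical stand-in for whatever MCP client is in use, and the `media_id` argument name, the `media_id` result key, and the `state` result key are assumptions about the payload shape, not confirmed by this page. The timeout and poll interval are illustrative values.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical client hook: call_tool(tool_name, arguments) -> parsed result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def upload_and_wait(
    call_tool: ToolCaller,
    url: str,
    poll_interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Upload media, poll until it is processed, then fetch insights."""
    upload = call_tool("upload_and_analyze", {"url": url})
    # Assumption: the media ID is exposed under the key "media_id".
    media_id = upload["media_id"]

    waited = 0.0
    while True:
        status = call_tool("get_media_status", {"media_id": media_id})
        # Assumption: the status result carries a "state" field.
        if status.get("state") == "processed":
            break
        if waited >= timeout_s:
            raise TimeoutError(
                f"media {media_id} not processed after {timeout_s} seconds"
            )
        sleep(poll_interval_s)
        waited += poll_interval_s

    return call_tool("get_media_insights", {"media_id": media_id})
```

Each individual tool call stays short, so no single request has to outlive a proxy idle timeout; the client-side timeout only bounds how long the caller keeps polling.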

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | Direct/public media file URL, or a shareable social/video page link (e.g. an Instagram reel, TikTok, YouTube, or X post URL); page links are resolved to the underlying media server-side. Pass the URL the user gave you as-is. | |
| name | No | Display name for the media (defaults to filename from URL) | |
| tags | No | Comma-separated tags | |
| folderId | No | Folder ID to place the media in | |
| mediaType | No | Media type (default: audio) | |
| sourceLanguage | No | BCP-47 language code (e.g., 'en-US', 'he-IL') | |
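As an illustration of how the six parameters fit together, here is a minimal sketch of a helper that assembles the arguments object. The helper name `build_upload_args` and its Python-side keyword names are hypothetical; only the emitted keys (`url`, `name`, `tags`, `folderId`, `mediaType`, `sourceLanguage`) come from the schema above. Optional fields are omitted when unset so that the server-side defaults apply.

```python
from typing import Any, Dict, Iterable, Optional


def build_upload_args(
    url: str,
    name: Optional[str] = None,
    tags: Optional[Iterable[str]] = None,
    folder_id: Optional[str] = None,
    media_type: Optional[str] = None,
    source_language: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the arguments dict for upload_and_analyze (illustrative helper)."""
    if not url:
        raise ValueError("url is required")

    # The URL is passed through unchanged, as the schema description asks.
    args: Dict[str, Any] = {"url": url}

    if name:
        args["name"] = name
    if tags:
        # The schema expects a single comma-separated string.
        cleaned = [t.strip() for t in tags if t.strip()]
        if cleaned:
            args["tags"] = ",".join(cleaned)
    if folder_id:
        args["folderId"] = folder_id
    if media_type:
        args["mediaType"] = media_type
    if source_language:
        # Expected to be a BCP-47 code such as "en-US"; not validated here.
        args["sourceLanguage"] = source_language
    return args
```

For example, `build_upload_args("https://example.com/talk.mp3", tags=["demo", "q3"])` yields a dict containing only `url` and `tags`.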

Output Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| data | No | Response payload from the Speak AI API | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds value beyond annotations: describes the async behavior, polling requirement, typical processing time, and Vimeo limitation. Annotations indicate a write operation (readOnlyHint=false) which aligns with 'upload and transcribe' description. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single paragraph with clear front-loading of core action, followed by async guidance and a limitation note. Every sentence adds value, though it could be structured with bullet points for easier scanning. Not overly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers the essential workflow: what the tool does, how to use it (with polling), what to expect (media_id, processing time), and a known limitation. Does not cover error handling or timeout scenarios, but for a first-use description it is reasonably complete. Output schema exists to document return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 6 parameters have schema descriptions (100% coverage), so baseline is 3. The description repeats the URL parameter's supported types but adds little to the schema descriptions for other parameters. Falls within the expected range.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states verb+resource: 'Upload and transcribe media from a URL'. Specifies supported URL types (direct, social/video links) and explicit exclusion (Vimeo). Distinguishes from siblings by highlighting automatic resolution of social links, but does not directly contrast with upload_media or upload_local_file.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit async pattern: immediate media_id return, poll get_media_status until 'processed', then call get_media_insights. Explains rationale for async pattern (idle timeouts). Does not explicitly state when not to use (e.g., for local files), but the URL focus is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/speakai/speakai-mcp'
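The same request can be sketched in Python with only the standard library. The endpoint URL is the one from the curl command; the `Accept` header and the assumption that the response body is JSON are not stated on this page, and the function names are illustrative.

```python
import json
import urllib.request
from typing import Any

API_URL = "https://glama.ai/api/mcp/v1/servers/speakai/speakai-mcp"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Build the GET request without sending it."""
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Accept": "application/json"},  # assumption: JSON response
    )


def fetch_server_info(url: str = API_URL, timeout_s: float = 10.0) -> Any:
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_server_info()` performs the network request; `build_request()` alone can be used to inspect the request first.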

If you have feedback or need assistance with the MCP directory API, please join our Discord server.