transcribe

Transcribe audio or video files into text, identifying speakers and adding timestamps.

Instructions

Transcribe audio or video with speaker labels and timestamps. Cost: 3 credits.

Input Schema

Name            Required  Description                  Default
audio_url       Yes       URL to audio or video file
speaker_labels  No        Enable speaker diarization
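
As a rough illustration of how these two parameters could be turned into a request, the sketch below builds the JSON body that the handler shown under Implementation Reference posts. The helper name buildTranscribeRequest and the TranscribeArgs interface are hypothetical and not part of the server source; the default of true for speaker_labels and the { capability, params } body shape are taken from the schema definition and handler on this page.

```typescript
// Hypothetical argument type mirroring the input schema on this page.
interface TranscribeArgs {
  audio_url: string;        // URL to audio or video file (required)
  speaker_labels?: boolean; // Enable speaker diarization (optional)
}

// Hypothetical helper: builds the body posted by the handler.
// The default of `true` follows the schema's `.default(true)`.
function buildTranscribeRequest(args: TranscribeArgs) {
  return {
    capability: "transcribe",
    params: {
      audio_url: args.audio_url,
      speaker_labels: args.speaker_labels ?? true,
    },
  };
}

// Example with a placeholder URL: speaker_labels is omitted, so it falls back to true.
const exampleBody = JSON.stringify(
  buildTranscribeRequest({ audio_url: "https://example.com/meeting.mp3" }),
);
```

In the actual server the default is applied by the Zod schema rather than by hand; the `??` fallback here only imitates that behavior so the sketch has no dependencies.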

Implementation Reference

  • Schema definition for the 'transcribe' tool: accepts audio_url (string) and speaker_labels (boolean, default true).
    {
      name: "transcribe",
      description: "Transcribe audio or video with speaker labels and timestamps. Cost: 3 credits.",
      inputSchema: {
        audio_url: z.string().describe("URL to audio or video file"),
        speaker_labels: z.boolean().optional().default(true).describe("Enable speaker diarization"),
      },
    },
  • src/index.ts:247-259 (registration)
    Tools are registered dynamically in a loop over CAPABILITIES. There is no special case for 'transcribe': it is registered by the same server.registerTool call (line 249) on the iteration where cap.name is 'transcribe'.
    for (const cap of CAPABILITIES) {
      // Cast inputSchema to avoid TS2589 (excessively deep type instantiation from Zod chains)
      server.registerTool(
        cap.name,
        {
          description: cap.description,
          inputSchema: cap.inputSchema as any,
        },
        async (args: any): Promise<CallToolResult> => {
          return callSuprsonic(cap.name, args as Record<string, unknown>);
        },
      );
    }
  • Generic handler function callSuprsonic that executes all tool logic. It sends a POST request to the Suprsonic API with the capability name ('transcribe') and params, then returns the result.
    async function callSuprsonic(capability: string, params: Record<string, unknown>): Promise<CallToolResult> {
      if (!API_KEY) {
        return {
          content: [{ type: "text", text: "Error: SUPRSONIC_API_KEY environment variable is not set. Get your key at https://suprsonic.ai/app/apis" }],
          isError: true,
        };
      }
    
      try {
        const resp = await fetch(`${BASE_URL}/v1/agent`, {
          method: "POST",
          headers: {
            "Authorization": `Bearer ${API_KEY}`,
            "Content-Type": "application/json",
          },
          body: JSON.stringify({ capability, params }),
        });
    
        const result = await resp.json() as any;
    
        // Handle non-envelope responses (401, 429, etc. return {"detail": ...})
        if (result.detail && result.success === undefined) {
          const msg = typeof result.detail === "object" ? (result.detail.title || result.detail.detail || JSON.stringify(result.detail)) : String(result.detail);
          return {
            content: [{ type: "text", text: `Error (HTTP ${resp.status}): ${msg}` }],
            isError: true,
          };
        }
    
        if (!result.success) {
          const errMsg = result.error?.detail || result.error?.title || "Request failed";
          return {
            content: [{ type: "text", text: `Error: ${errMsg}` }],
            isError: true,
          };
        }
    
        const text = JSON.stringify(result.data, null, 2);
        const meta = result.metadata
          ? `\n\n[Provider: ${(result.metadata as any).provider_used || "unknown"}, ${(result.metadata as any).response_time_ms || 0}ms, ${result.credits_used || 0} credits]`
          : "";
    
        return {
          content: [{ type: "text", text: text + meta }],
        };
      } catch (err) {
        return {
          content: [{ type: "text", text: `Network error: ${err instanceof Error ? err.message : String(err)}` }],
          isError: true,
        };
      }
    }
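
The response handling in callSuprsonic can be easier to follow when separated from the network call. The sketch below is a hypothetical pure function, interpretResponse, that reproduces the same three branches (non-envelope error, envelope error, success) from the handler above. The simplified ToolResult type and the sample payloads are assumptions made for illustration and are not real Suprsonic API output.

```typescript
// Simplified stand-in for the SDK's CallToolResult (assumption for this sketch).
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

function errorResult(text: string): ToolResult {
  return { content: [{ type: "text", text }], isError: true };
}

// Hypothetical helper: same branching as callSuprsonic, without fetch.
function interpretResponse(status: number, result: any): ToolResult {
  // Non-envelope errors such as 401 or 429: {"detail": ...} and no "success" field.
  if (result.detail && result.success === undefined) {
    const d = result.detail;
    const msg = typeof d === "object"
      ? (d.title || d.detail || JSON.stringify(d))
      : String(d);
    return errorResult(`Error (HTTP ${status}): ${msg}`);
  }

  // Envelope present but the request failed.
  if (!result.success) {
    const errMsg = result.error?.detail || result.error?.title || "Request failed";
    return errorResult(`Error: ${errMsg}`);
  }

  // Success: pretty-printed data plus an optional metadata footer.
  const meta = result.metadata
    ? `\n\n[Provider: ${result.metadata.provider_used || "unknown"}, ` +
      `${result.metadata.response_time_ms || 0}ms, ${result.credits_used || 0} credits]`
    : "";
  return { content: [{ type: "text", text: JSON.stringify(result.data, null, 2) + meta }] };
}

// Made-up payloads, one per branch.
const unauthorized = interpretResponse(401, { detail: "Invalid API key" });
const failed = interpretResponse(200, { success: false, error: { title: "Unsupported media" } });
const ok = interpretResponse(200, {
  success: true,
  data: { text: "hello" },
  metadata: { provider_used: "example" },
  credits_used: 3,
});
```

Keeping this logic pure makes each branch testable without an API key. Note that in the original handler a non-JSON response body would make resp.json() throw and surface as a "Network error", a case this sketch does not model.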
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions cost and output features (speaker labels, timestamps) but omits important behavioral details like processing latency, supported formats, file size limits, or whether the operation is destructive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with zero waste. Every word adds value, stating the core function, key features, and cost.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description does not specify output format, supported file types, maximum duration, or any constraints. Given the complexity of audio/video transcription, it leaves significant gaps for an agent to select and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for both parameters. The description adds no extra meaning beyond the schema; it mentions speaker labels and timestamps only as output features and does not explain the speaker_labels parameter or its default. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool transcribes audio or video and includes speaker labels and timestamps. Mentioning speaker diarization implicitly distinguishes it from potential siblings like 'stt' or 'subtitle', but the description does not explicitly contrast with them.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus siblings such as 'stt' or 'subtitle'. The description mentions the credit cost but lacks context about prerequisites, alternatives, or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/O-mega-Enterprise/suprsonic-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.