Glama

stt

Transcribe audio files to text with word-level timestamps using a URL input. Supports multiple languages.

Instructions

Transcribe audio to text with timestamps. Cost: 2 credits.

Input Schema

Name        Required    Description          Default
audio_url   No          URL to audio file
language    No          Language code        en

Implementation Reference

  • Schema definition for the 'stt' tool: transcribes audio to text. Inputs: audio_url (optional string URL) and language (optional string, defaults to 'en').
    {
      name: "stt",
      description: "Transcribe audio to text with timestamps. Cost: 2 credits.",
      inputSchema: {
        audio_url: z.string().optional().describe("URL to audio file"),
        language: z.string().optional().default("en").describe("Language code"),
      },
    },
  • Generic handler for all tools including 'stt'. Calls the Suprsonic REST API via callSuprsonic() with the capability name and args.
    // Excerpt: the handler passed as the last argument to server.registerTool()
    async (args: any): Promise<CallToolResult> => {
      return callSuprsonic(cap.name, args as Record<string, unknown>);
    },
  • src/index.ts:247-259 (registration)
    Registration loop: iterates over all CAPABILITIES (including 'stt') and registers each as an MCP tool using server.registerTool().
    for (const cap of CAPABILITIES) {
      // Cast inputSchema to avoid TS2589 (excessively deep type instantiation from Zod chains)
      server.registerTool(
        cap.name,
        {
          description: cap.description,
          inputSchema: cap.inputSchema as any,
        },
        async (args: any): Promise<CallToolResult> => {
          return callSuprsonic(cap.name, args as Record<string, unknown>);
        },
      );
    }
  • Generic helper function callSuprsonic() that all tools delegate to. It POSTs to the Suprsonic REST API with the capability name (e.g. 'stt') and params, then returns the result.
    async function callSuprsonic(capability: string, params: Record<string, unknown>): Promise<CallToolResult> {
      if (!API_KEY) {
        return {
          content: [{ type: "text", text: "Error: SUPRSONIC_API_KEY environment variable is not set. Get your key at https://suprsonic.ai/app/apis" }],
          isError: true,
        };
      }
    
      try {
        const resp = await fetch(`${BASE_URL}/v1/agent`, {
          method: "POST",
          headers: {
            "Authorization": `Bearer ${API_KEY}`,
            "Content-Type": "application/json",
          },
          body: JSON.stringify({ capability, params }),
        });
    
        const result = await resp.json() as any;
    
        // Handle non-envelope responses (401, 429, etc. return {"detail": ...})
        if (result.detail && result.success === undefined) {
          const msg = typeof result.detail === "object" ? (result.detail.title || result.detail.detail || JSON.stringify(result.detail)) : String(result.detail);
          return {
            content: [{ type: "text", text: `Error (HTTP ${resp.status}): ${msg}` }],
            isError: true,
          };
        }
    
        if (!result.success) {
          const errMsg = result.error?.detail || result.error?.title || "Request failed";
          return {
            content: [{ type: "text", text: `Error: ${errMsg}` }],
            isError: true,
          };
        }
    
        const text = JSON.stringify(result.data, null, 2);
        const meta = result.metadata
          ? `\n\n[Provider: ${(result.metadata as any).provider_used || "unknown"}, ${(result.metadata as any).response_time_ms || 0}ms, ${result.credits_used || 0} credits]`
          : "";
    
        return {
          content: [{ type: "text", text: text + meta }],
        };
      } catch (err) {
        return {
          content: [{ type: "text", text: `Network error: ${err instanceof Error ? err.message : String(err)}` }],
          isError: true,
        };
      }
    }
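
To make the request shape concrete, here is a minimal sketch of the JSON body that callSuprsonic() would POST for an 'stt' call. The helper name buildSttRequest and the SttArgs and AgentRequest types are hypothetical and do not appear in the server source; the example URL is a placeholder. In the listed code the "en" default comes from the Zod schema, so the fallback below only mirrors that behavior.

```typescript
// Hypothetical types for illustration; not part of the server source.
interface SttArgs {
  audio_url?: string;
  language?: string;
}

interface AgentRequest {
  capability: string;
  params: Record<string, unknown>;
}

// Builds the { capability, params } envelope that callSuprsonic() serializes.
function buildSttRequest(args: SttArgs): AgentRequest {
  // Mirror the schema default: language falls back to "en".
  const params: Record<string, unknown> = { language: args.language ?? "en" };
  if (args.audio_url !== undefined) {
    params.audio_url = args.audio_url;
  }
  return { capability: "stt", params };
}

// Placeholder URL; any reachable audio file URL would go here.
const sttBody = JSON.stringify(
  buildSttRequest({ audio_url: "https://example.com/clip.mp3" }),
);
```

The resulting string is what would be sent as the body of the POST to `${BASE_URL}/v1/agent`, alongside the Authorization and Content-Type headers shown above.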
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

In the absence of annotations, the description only discloses the cost (2 credits) and the inclusion of timestamps. It lacks information on side effects, limitations (e.g., file size, formats), or required permissions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that efficiently conveys the core function and cost, with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description is adequate for a simple tool but lacks key details like supported audio formats, maximum duration, or output structure, especially since no output schema is provided.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds no parameter-specific detail beyond the schema: it does not say, for example, whether audio_url must be publicly accessible or what format the language code should take.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
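
As an illustration of the kind of constraints a description could spell out, the sketch below is a hypothetical client-side check run before calling the tool. Nothing in it comes from the server source: the function name checkSttArgs, the http(s) requirement, and the language-code pattern are all assumptions about what the API might expect.

```typescript
// Hypothetical argument shape matching the two schema fields.
interface SttArgs {
  audio_url?: string;
  language?: string;
}

// Returns a list of human-readable problems; an empty list means no issue was found.
function checkSttArgs(args: SttArgs): string[] {
  const problems: string[] = [];

  if (args.audio_url !== undefined) {
    try {
      const url = new URL(args.audio_url);
      // Assumption: the service fetches the file itself, so only http(s) makes sense.
      if (url.protocol !== "http:" && url.protocol !== "https:") {
        problems.push(`audio_url should use http(s), got ${url.protocol}`);
      }
    } catch {
      problems.push("audio_url is not a valid absolute URL");
    }
  }

  // Assumption: short codes such as "en" or "pt-BR"; the schema only says "Language code".
  const language = args.language ?? "en";
  if (!/^[a-z]{2,3}(-[A-Za-z]{2,4})?$/.test(language)) {
    problems.push(`language "${language}" does not look like a short language code`);
  }

  return problems;
}
```

A check like this cannot tell whether a URL is publicly reachable; it only catches malformed input before credits are spent.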

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool transcribes audio to text with timestamps, using a specific verb and resource. However, it does not differentiate from sibling tools like 'transcribe' or 'subtitle'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, nor are there any exclusions or prerequisites mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/O-mega-Enterprise/suprsonic-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.