vocea_transcribe
Transcribe base64-encoded audio to text. Supports multiple formats (mp3, wav, ogg, webm, flac) up to 10MB and various languages via BCP-47 codes.
Instructions
Transcribe audio from a base64-encoded string to text (STT).
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| audio_base64 | Yes | Base64-encoded audio file (mp3, wav, ogg, webm, flac, max 10MB) | |
| mime_type | No | MIME type, e.g. audio/mpeg | audio/mpeg |
| language | No | BCP-47 language code, e.g. en-US | es-ES |
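As an illustration of how a caller might assemble these arguments, here is a minimal sketch for Node.js. The helper name `buildTranscribeArgs` and the extension-to-MIME map are hypothetical and not part of the Vocea server. The sketch also assumes that "10MB" means 10 × 1024 × 1024 bytes measured on the raw audio before encoding, which the schema does not state.

```typescript
import { Buffer } from "node:buffer";
import { extname } from "node:path";

// Assumption: the 10MB limit applies to the raw (pre-base64) audio bytes.
const MAX_AUDIO_BYTES = 10 * 1024 * 1024;

// Hypothetical lookup table: common MIME types for the formats the schema lists.
const MIME_BY_EXTENSION: Record<string, string> = {
  ".mp3": "audio/mpeg",
  ".wav": "audio/wav",
  ".ogg": "audio/ogg",
  ".webm": "audio/webm",
  ".flac": "audio/flac",
};

// Mirrors the input schema: only audio_base64 is required.
interface TranscribeArgs {
  audio_base64: string;
  mime_type?: string;
  language?: string;
}

// Hypothetical helper: turns raw audio bytes into vocea_transcribe arguments.
function buildTranscribeArgs(
  audio: Uint8Array,
  fileName: string,
  language?: string,
): TranscribeArgs {
  if (audio.byteLength > MAX_AUDIO_BYTES) {
    throw new RangeError(`Audio is ${audio.byteLength} bytes; the limit is ${MAX_AUDIO_BYTES}`);
  }
  const mimeType = MIME_BY_EXTENSION[extname(fileName).toLowerCase()];
  if (mimeType === undefined) {
    throw new TypeError(`Unsupported audio extension in "${fileName}"`);
  }
  const args: TranscribeArgs = {
    audio_base64: Buffer.from(audio).toString("base64"),
    mime_type: mimeType,
  };
  // Leave language unset so the server-side default (es-ES) applies.
  if (language !== undefined) {
    args.language = language;
  }
  return args;
}

// Usage with three placeholder bytes standing in for real audio data.
const exampleArgs = buildTranscribeArgs(new Uint8Array([1, 2, 3]), "clip.WAV", "en-US");
console.log(exampleArgs.audio_base64); // prints "AQID"
```

In practice the bytes would come from something like `fs.readFileSync(path)`. Setting `mime_type` explicitly matters for anything other than MP3, because the default is audio/mpeg.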
Implementation Reference
- src/index.ts:151-159 (handler): Handler for the vocea_transcribe tool. Takes audio_base64 (base64-encoded audio), optional mime_type and language. Decodes base64 to a Buffer, creates a Blob with the given MIME type, then calls vocea.stt.transcribe(blob, language) and returns the result as JSON.

      case "vocea_transcribe": {
        const a = args as { audio_base64: string; mime_type?: string; language?: string };
        const buffer = Buffer.from(a.audio_base64, "base64");
        const blob = new Blob([buffer], { type: a.mime_type ?? "audio/mpeg" });
        const result = await vocea.stt.transcribe(blob, a.language ?? "es-ES");
        return {
          content: [{ type: "text", text: JSON.stringify(result) }],
        };
      }

- src/index.ts:73-83 (schema): Input schema for the vocea_transcribe tool. Defines audio_base64 (string, required), mime_type (string, optional, default audio/mpeg), and language (string, optional, BCP-47 format, default es-ES).

      name: "vocea_transcribe",
      description: "Transcribe audio from a base64-encoded string to text (STT).",
      inputSchema: {
        type: "object",
        properties: {
          audio_base64: { type: "string", description: "Base64-encoded audio file (mp3, wav, ogg, webm, flac, max 10MB)" },
          mime_type: { type: "string", description: "MIME type, e.g. audio/mpeg", default: "audio/mpeg" },
          language: { type: "string", description: "BCP-47 language code, e.g. en-US (default: es-ES)" },
        },
        required: ["audio_base64"],
      },

- src/index.ts:30-84 (registration): Registration of the vocea_transcribe tool as part of the ListToolsRequestSchema handler, along with all other tools listed in the tools array.

      {
        name: "vocea_generate_audio",
        description: "Convert text to speech using a Vocea voice. Returns an audio URL.",
        inputSchema: {
          type: "object",
          properties: {
            voice_id: { type: "string", description: "Voice UUID to use for synthesis" },
            text: { type: "string", description: "Text to convert to speech (max 10000 chars)" },
            language_code: { type: "string", description: "Language code, e.g. 'en', 'es', 'fr'" },
            emotion: {
              type: "string",
              enum: ["neutral", "happy", "sad", "angry", "fearful", "surprised", "disgusted", "whisper"],
              description: "Emotional tone (default: neutral)",
            },
            speaking_rate: { type: "number", description: "Speaking rate multiplier 0.5–1.5 (default 1.0)" },
          },
          required: ["voice_id", "text", "language_code"],
        },
      },
      {
        name: "vocea_list_voices",
        description: "List the authenticated user's cloned voices.",
        inputSchema: {
          type: "object",
          properties: {
            page: { type: "number", description: "Page number (default 1)" },
            limit: { type: "number", description: "Results per page (default 20)" },
          },
        },
      },
      {
        name: "vocea_list_public_voices",
        description: "List public community voices available for use.",
        inputSchema: {
          type: "object",
          properties: {
            page: { type: "number" },
            limit: { type: "number" },
            ageRange: { type: "string", enum: ["young", "adult", "senior"] },
          },
        },
      },
      {
        name: "vocea_transcribe",
        description: "Transcribe audio from a base64-encoded string to text (STT).",
        inputSchema: {
          type: "object",
          properties: {
            audio_base64: { type: "string", description: "Base64-encoded audio file (mp3, wav, ogg, webm, flac, max 10MB)" },
            mime_type: { type: "string", description: "MIME type, e.g. audio/mpeg", default: "audio/mpeg" },
            language: { type: "string", description: "BCP-47 language code, e.g. en-US (default: es-ES)" },
          },
          required: ["audio_base64"],
        },
      },
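The handler shown above passes its inputs through without checking them, and `Buffer.from(..., "base64")` does not throw on malformed input. The following is a sketch of a pre-check that could run before the transcription call. It is not part of src/index.ts: the function name `decodeTranscribeInput` is hypothetical, and reading the 10MB limit as 10 × 1024 × 1024 decoded bytes is an assumption. `Intl.getCanonicalLocales` is the standard ECMAScript call for validating and normalizing BCP-47 tags.

```typescript
import { Buffer } from "node:buffer";

// Assumption: the documented 10MB limit is enforced on the decoded bytes.
const MAX_AUDIO_BYTES = 10 * 1024 * 1024;

interface TranscribeInput {
  audio_base64: string;
  mime_type?: string;
  language?: string;
}

interface DecodedAudio {
  blob: Blob;
  language: string;
}

// Hypothetical helper: same decode steps as the handler, plus basic validation.
function decodeTranscribeInput(a: TranscribeInput): DecodedAudio {
  const buffer = Buffer.from(a.audio_base64, "base64");

  // Malformed base64 is silently skipped by Buffer.from, so an empty
  // result is the clearest available signal that the input was unusable.
  if (buffer.byteLength === 0) {
    throw new Error("audio_base64 decoded to zero bytes");
  }
  if (buffer.byteLength > MAX_AUDIO_BYTES) {
    throw new RangeError(`Decoded audio is ${buffer.byteLength} bytes; the limit is ${MAX_AUDIO_BYTES}`);
  }

  // Throws RangeError for structurally invalid tags such as "es_ES",
  // and normalizes casing ("en-us" becomes "en-US").
  const requested = a.language ?? "es-ES";
  const language = Intl.getCanonicalLocales(requested)[0] ?? requested;

  // Same defaults as the handler: audio/mpeg when no MIME type is given.
  const blob = new Blob([buffer], { type: a.mime_type ?? "audio/mpeg" });
  return { blob, language };
}

// Usage with a three-byte placeholder payload ("AQID" is bytes 1, 2, 3).
const decoded = decodeTranscribeInput({ audio_base64: "AQID", language: "en-us" });
console.log(decoded.blob.size, decoded.blob.type, decoded.language);
```

The returned `blob` and `language` correspond to the two arguments the handler passes to `vocea.stt.transcribe`. Whether the Vocea API performs equivalent checks on its side is not stated in the source.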