transcribe_speech
Convert speech audio (WAV, MP3, WEBM, or OGG) to text with automatic language detection. Returns the transcription and the detected language.
Instructions
Transcribe speech audio into text.
Supports multiple languages with automatic language detection. Returns the transcription text and detected language.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| audio_base64 | Yes | Base64-encoded audio to transcribe (WAV, MP3, WEBM, OGG) | |
| language | No | Optional language hint, e.g. 'en', 'pt'. Auto-detected if omitted. | |
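As a minimal sketch of preparing arguments that match this schema, the helper below base64-encodes raw audio bytes and includes `language` only when a hint is given. The name `build_transcribe_arguments` is hypothetical and is not part of the server; the sample bytes are a stand-in for a real audio file.

```python
import base64
from typing import Optional


def build_transcribe_arguments(audio_bytes: bytes, language: Optional[str] = None) -> dict:
    """Build an arguments dict for transcribe_speech (hypothetical helper)."""
    # audio_base64 is required: encode the raw file bytes as ASCII base64 text.
    arguments = {"audio_base64": base64.b64encode(audio_bytes).decode("ascii")}
    # language is optional: leave the key out so the service auto-detects.
    if language:
        arguments["language"] = language
    return arguments


# Stand-in bytes; in practice, read a real WAV, MP3, WEBM, or OGG file in binary mode.
args = build_transcribe_arguments(b"not real audio", language="pt")
print(sorted(args))
```

Omitting the `language` key, rather than sending an empty value, mirrors how the tool treats the parameter as optional.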
Implementation Reference
- server.py:74-91 (handler) The main handler function for the transcribe_speech tool. It accepts base64-encoded audio and an optional language hint, then makes an HTTP POST request to the Brainiall API's /v1/stt/transcribe endpoint to perform speech-to-text transcription.

      @mcp.tool()
      async def transcribe_speech(
          audio_base64: Annotated[str, "Base64-encoded audio to transcribe (WAV, MP3, WEBM, OGG)"],
          language: Annotated[Optional[str], "Optional language hint, e.g. 'en', 'pt'. Auto-detected if omitted."] = None,
      ) -> dict:
          """Transcribe speech audio into text.

          Supports multiple languages with automatic language detection.
          Returns the transcription text and detected language.
          """
          payload: dict = {"audio_base64": audio_base64}
          if language:
              payload["language"] = language

          async with _client() as client:
              response = await client.post("/v1/stt/transcribe", json=payload)
              response.raise_for_status()
              return response.json()

- server.py:75-78 (schema) Input schema definition using Python type annotations with Annotated. Defines two parameters: audio_base64 (required string with format hints) and language (optional string for language detection hint).

      async def transcribe_speech(
          audio_base64: Annotated[str, "Base64-encoded audio to transcribe (WAV, MP3, WEBM, OGG)"],
          language: Annotated[Optional[str], "Optional language hint, e.g. 'en', 'pt'. Auto-detected if omitted."] = None,
      ) -> dict:

- server.py:74-74 (registration) The @mcp.tool() decorator registers the transcribe_speech function as an MCP tool with the FastMCP framework.

      @mcp.tool()

- server.py:42-47 (helper) Helper function that creates an async HTTP client configured with the Brainiall API base URL, authorization headers, and timeout settings. Used by the transcribe_speech handler to make API requests.

      def _client() -> httpx.AsyncClient:
          return httpx.AsyncClient(
              base_url=API_BASE,
              headers=_headers,
              timeout=60.0,
          )
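To make the shape of the outgoing request concrete, here is a rough standard-library sketch that builds, but does not send, a POST equivalent to the one the handler issues. The base URL, the header values, and the name `build_transcribe_request` are placeholders assumed for illustration; the real `API_BASE` and `_headers` are defined elsewhere in server.py and are not shown in this reference.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical placeholders: the real API_BASE and auth headers are not shown in this reference.
API_BASE = "https://api.example.invalid"
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}


def build_transcribe_request(audio_base64: str, language: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a POST request shaped like the handler's call."""
    payload = {"audio_base64": audio_base64}
    # Same rule as the handler: omit the key entirely when no hint is given.
    if language:
        payload["language"] = language
    return urllib.request.Request(
        url=API_BASE + "/v1/stt/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers=HEADERS,
        method="POST",
    )


req = build_transcribe_request("YWJj", language="en")
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

The actual server uses `httpx.AsyncClient` with a 60-second timeout and calls `raise_for_status()`, so any non-2xx response surfaces as an exception instead of being returned to the caller.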