assess_pronunciation
Analyze pronunciation accuracy by comparing spoken audio to reference text. Provides detailed scores and phoneme-level feedback to identify areas for improvement.
Instructions
Assess how accurately a speaker pronounced the given text.
Returns an overall pronunciation score (0-100), per-word scores, and phoneme-level feedback including accuracy, fluency, and completeness.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | The reference text the user should have read aloud | |
| audio_base64 | Yes | Base64-encoded audio of the user reading the text (WAV or MP3) | |
| language | No | Language code, e.g. 'en-US', 'pt-BR', 'es-ES' | en-US |
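Before invoking the tool, a caller must base64-encode the recorded audio and assemble the fields above. A minimal sketch of that step (the helper name, sample text, and fake audio bytes are placeholders, not part of the server):

```python
import base64


def build_payload(text: str, audio_bytes: bytes, language: str = "en-US") -> dict:
    """Encode raw WAV/MP3 bytes and assemble the tool's input payload."""
    return {
        "text": text,
        "audio_base64": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
    }


# Stand-in bytes; in practice read them from a recorded WAV or MP3 file.
payload = build_payload("The quick brown fox", b"RIFF-fake-wav-bytes")
```

Note that `audio_base64` must be the base64 text itself, not the raw bytes, since the payload is sent as JSON.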
Implementation Reference
- server.py:51-71 (handler) — The main handler for the `assess_pronunciation` tool. It takes the reference text, base64-encoded audio, and a language code, makes an async POST request to the Brainiall API endpoint `/v1/pronunciation/assess`, and returns the JSON response: an overall score (0-100), per-word scores, and phoneme-level feedback.

```python
async def assess_pronunciation(
    text: Annotated[str, "The reference text the user should have read aloud"],
    audio_base64: Annotated[str, "Base64-encoded audio of the user reading the text (WAV or MP3)"],
    language: Annotated[str, "Language code, e.g. 'en-US', 'pt-BR', 'es-ES'"] = "en-US",
) -> dict:
    """Assess how accurately a speaker pronounced the given text.

    Returns an overall pronunciation score (0-100), per-word scores, and
    phoneme-level feedback including accuracy, fluency, and completeness.
    """
    async with _client() as client:
        response = await client.post(
            "/v1/pronunciation/assess",
            json={
                "text": text,
                "audio_base64": audio_base64,
                "language": language,
            },
        )
        response.raise_for_status()
        return response.json()
```

- server.py:52-55 (schema) — Input schema defined with `Annotated` type hints: `text` (required reference text), `audio_base64` (required base64-encoded WAV or MP3 audio), and `language` (optional, default `'en-US'`).

```python
    text: Annotated[str, "The reference text the user should have read aloud"],
    audio_base64: Annotated[str, "Base64-encoded audio of the user reading the text (WAV or MP3)"],
    language: Annotated[str, "Language code, e.g. 'en-US', 'pt-BR', 'es-ES'"] = "en-US",
) -> dict:
```

- server.py:50-50 (registration) — The `@mcp.tool()` decorator from the FastMCP framework registers the function as an MCP tool named `assess_pronunciation`.

```python
@mcp.tool()
```

- server.py:42-47 (helper) — The `_client()` helper creates an `httpx.AsyncClient` configured with the API base URL, authorization headers, and a 60-second timeout for requests to the Brainiall API.

```python
def _client() -> httpx.AsyncClient:
    return httpx.AsyncClient(
        base_url=API_BASE,
        headers=_headers,
        timeout=60.0,
    )
```