assess_pronunciation

Analyze pronunciation accuracy by comparing spoken audio to reference text. Provides detailed scores and phoneme-level feedback to identify areas for improvement.

Instructions

Assess how accurately a speaker pronounced the given text.

Returns an overall pronunciation score (0-100), per-word scores, and phoneme-level feedback including accuracy, fluency, and completeness.
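To make the return value concrete, here is an illustrative response. The field names below are assumptions inferred from the description above (overall score, per-word scores, phoneme-level feedback), not the Brainiall API's documented schema.

```python
# Illustrative response shape only; every key name here is an assumption
# based on the tool description, not the actual API contract.
example_response = {
    "overall_score": 87,   # 0-100
    "accuracy": 90,
    "fluency": 84,
    "completeness": 100,
    "words": [
        {"word": "hello", "score": 95},
        {
            "word": "pronunciation",
            "score": 72,
            "phonemes": [{"phoneme": "n", "score": 60}],
        },
    ],
}
```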

Input Schema

Name          Required  Default  Description
text          Yes       —        The reference text the user should have read aloud
audio_base64  Yes       —        Base64-encoded audio of the user reading the text (WAV or MP3)
language      No        en-US    Language code, e.g. 'en-US', 'pt-BR', 'es-ES'
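A caller builds the request body from these three fields. The sketch below shows one way to do that; the audio bytes are placeholder data, not a real recording.

```python
import base64
import json

# Placeholder audio bytes standing in for a real WAV/MP3 recording.
audio_b64 = base64.b64encode(b"RIFF-placeholder-wav-bytes").decode("ascii")

payload = {
    "text": "The quick brown fox jumps over the lazy dog.",
    "audio_base64": audio_b64,  # required
    "language": "pt-BR",        # optional; defaults to "en-US"
}

print(json.dumps(payload, indent=2))
```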

Implementation Reference

  • server.py:51-71 (handler)
    The main handler for the 'assess_pronunciation' tool. It accepts the reference text, base64-encoded audio, and a language code, makes an async POST request to the Brainiall API endpoint '/v1/pronunciation/assess', and returns the JSON response: an overall score (0-100), per-word scores, and phoneme-level feedback covering accuracy, fluency, and completeness.
    async def assess_pronunciation(
        text: Annotated[str, "The reference text the user should have read aloud"],
        audio_base64: Annotated[str, "Base64-encoded audio of the user reading the text (WAV or MP3)"],
        language: Annotated[str, "Language code, e.g. 'en-US', 'pt-BR', 'es-ES'"] = "en-US",
    ) -> dict:
        """Assess how accurately a speaker pronounced the given text.
    
        Returns an overall pronunciation score (0-100), per-word scores,
        and phoneme-level feedback including accuracy, fluency, and completeness.
        """
        async with _client() as client:
            response = await client.post(
                "/v1/pronunciation/assess",
                json={
                    "text": text,
                    "audio_base64": audio_base64,
                    "language": language,
                },
            )
            response.raise_for_status()
            return response.json()
  • Input schema definition using Annotated type hints. Defines three parameters: 'text' (required reference text), 'audio_base64' (required base64-encoded audio in WAV or MP3 format), and 'language' (optional with default 'en-US').
        text: Annotated[str, "The reference text the user should have read aloud"],
        audio_base64: Annotated[str, "Base64-encoded audio of the user reading the text (WAV or MP3)"],
        language: Annotated[str, "Language code, e.g. 'en-US', 'pt-BR', 'es-ES'"] = "en-US",
    ) -> dict:
  • server.py:50-50 (registration)
    Tool registration using the @mcp.tool() decorator from FastMCP framework, which registers the function as an MCP tool named 'assess_pronunciation'.
    @mcp.tool()
  • Helper function _client() that creates and returns an httpx.AsyncClient configured with the API base URL, authorization headers, and 60-second timeout for making HTTP requests to the Brainiall API.
    def _client() -> httpx.AsyncClient:
        return httpx.AsyncClient(
            base_url=API_BASE,
            headers=_headers,
            timeout=60.0,
        )
