Glama

Server Configuration

Environment variables used to configure the server. All are optional.

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| `TORCH_HOME` | No | Override cache location for PyTorch Hub models (used by the phoneme extra). | |
| `HF_HUB_CACHE` | No | Override cache location for Hugging Face models. | |
| `MCP_PRONUNCIATION_MODEL` | No | Whisper model size. Other options: `tiny.en`, `small.en`, etc. | `base.en` |
| `MCP_PRONUNCIATION_AUDIO_RETENTION` | No | Controls whether temporary recordings are kept. Values: `session` or `keep`. | `session` |
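As a minimal sketch of how a server like this might resolve the variables above — the names and defaults come from the table, but `load_config` itself is illustrative, not the server's actual code:

```python
import os

def load_config() -> dict:
    """Resolve configuration from the documented environment variables."""
    return {
        # Whisper model size; falls back to the documented default.
        "model": os.environ.get("MCP_PRONUNCIATION_MODEL", "base.en"),
        # 'session' discards temporary recordings when the session ends;
        # 'keep' retains them on disk.
        "audio_retention": os.environ.get(
            "MCP_PRONUNCIATION_AUDIO_RETENTION", "session"
        ),
        # Optional cache overrides; None means the libraries' defaults apply.
        "torch_home": os.environ.get("TORCH_HOME"),
        "hf_hub_cache": os.environ.get("HF_HUB_CACHE"),
    }

config = load_config()
```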

Capabilities

Features and capabilities supported by this server

| Capability | Details |
|------------|---------|
| tools | `{"listChanged": false}` |
| prompts | `{"listChanged": false}` |
| resources | `{"subscribe": false, "listChanged": false}` |
| experimental | `{}` |

Tools

Functions exposed to the LLM to take actions

converse

Record the user speaking, transcribe it, and return the transcript plus quick English feedback. This is the primary tool for voice conversations: call it, read the transcript + feedback, then respond conversationally in your own words — weaving the feedback in naturally or mentioning it only if it matters.

Recording auto-stops when the user finishes speaking (silence detection).

Use this tool when:

  • The user wants to chat with you by voice instead of typing

  • The user wants casual English feedback while talking with you

  • You want to hear what the user said rather than read a typed message

For a focused drill where the user reads a specific sentence, use practice instead.

Args:

  • target_hint: Optional. Only set this if the user is explicitly trying to say a specific sentence (e.g. they asked "how do I say X?" and you told them X). Leave blank for free-form conversation.

  • duration: Maximum recording duration in seconds (default 30, max 120). Auto-stops earlier on silence.

Returns: Markdown report containing the user's transcript, brief English feedback (pronunciation + grammar + fluency), and a 'For Claude' section with guidance on how to respond.
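An MCP client invokes a tool like this one with a `tools/call` request. The payload below is a sketch: the method name and params shape follow the MCP specification, while the `id` and argument values are example choices.

```python
import json

# Illustrative MCP `tools/call` request for the `converse` tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "converse",
        "arguments": {
            # target_hint is omitted for free-form conversation.
            "duration": 30,
        },
    },
}

print(json.dumps(request, indent=2))
```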

practice

Drill mode: the user reads a specific sentence aloud and gets a detailed pronunciation assessment. Use this when the user explicitly wants to practice reading a particular sentence, not for free-form chat.

For voice conversation with casual feedback, use converse instead.

Recording auto-stops when the user finishes speaking.

Args:

  • reference_text: The sentence the user will read aloud.

  • duration: Maximum recording duration in seconds (default 15, max 120).

Returns: Detailed pronunciation assessment report.

retry

Retry the last sentence the user was practicing.

Re-records and re-assesses using the same reference text from the previous practice or converse call. Use this to let the user try again after getting feedback.

Args: duration: Maximum recording duration in seconds (default 15, max 120).

Returns: Pronunciation assessment report for the new attempt.

quick_practice

Pick a random practice sentence and drill it immediately.

Combines suggest_sentence + practice into one step: picks a sentence matching the criteria, then records and assesses.

Args:

  • focus: Phoneme focus area. Options: "th", "f_v", "r_l", "vowels", "general". If not specified, picks randomly.

  • difficulty: Difficulty level. Options: "beginner", "intermediate", "advanced". If not specified, picks randomly.

  • duration: Maximum recording duration in seconds (default 15, max 120).

Returns: The sentence to read, followed by the pronunciation assessment.

suggest_sentence

Suggest a practice sentence the user can read aloud.

Args:

  • focus: Phoneme focus area. Options: "th", "f_v", "r_l", "vowels", "general". If not specified, picks randomly.

  • difficulty: Difficulty level. Options: "beginner", "intermediate", "advanced". If not specified, picks randomly.

Returns: A practice sentence with its focus area and difficulty.
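The random-fallback behavior described above (unspecified focus and difficulty are picked randomly) can be sketched as follows. The sentence bank here is a made-up stand-in; the server's actual sentences are not published on this page.

```python
import random

# Hypothetical sentence bank keyed by (focus, difficulty).
SENTENCES = {
    ("th", "beginner"): ["I think this is the third one."],
    ("r_l", "intermediate"): ["The red lorry rolled down the long road."],
    ("vowels", "advanced"): ["The full moon rose over the cool blue pool."],
}

FOCUSES = ["th", "f_v", "r_l", "vowels", "general"]
DIFFICULTIES = ["beginner", "intermediate", "advanced"]

def suggest_sentence(focus=None, difficulty=None):
    # Unspecified criteria are filled in randomly, as the tool docs describe.
    focus = focus or random.choice(FOCUSES)
    difficulty = difficulty or random.choice(DIFFICULTIES)
    pool = SENTENCES.get(
        (focus, difficulty), ["She sells seashells by the seashore."]
    )
    return {
        "focus": focus,
        "difficulty": difficulty,
        "sentence": random.choice(pool),
    }
```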

record

Record audio from the microphone without assessing it.

Recording auto-stops when the user finishes speaking (silence detection). The duration is the maximum time — you don't have to wait the full duration.

Most of the time prefer converse or practice, which record AND analyze in one step. Only use record alone if you want the raw WAV file.

Args: duration: Maximum recording duration in seconds (default 10, max 120).

Returns: Path to the recorded WAV file.
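The returned path points at a standard WAV file, so it can be inspected with Python's stdlib `wave` module. For a self-contained demo, the sketch below first writes one second of silent mono audio (the 16 kHz rate is an example choice, not a documented property of the server's recordings), then reads its format back:

```python
import wave

path = "recording.wav"

# Write a short silent WAV so the demo is self-contained; in practice
# you would use the path returned by the record tool instead.
with wave.open(path, "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 16-bit samples
    w.setframerate(16000)    # 16 kHz, a common rate for speech
    w.writeframes(b"\x00\x00" * 16000)  # one second of silence

# Inspect the recording's format and duration.
with wave.open(path, "rb") as w:
    duration = w.getnframes() / w.getframerate()
    print(f"{w.getnchannels()} ch, {w.getframerate()} Hz, {duration:.2f} s")
```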

assess

Assess the last recording (or a specific audio file) without re-recording.

When reference_text is provided, the assessor:

  • Aligns the user's speech to the reference word-by-word (Needleman-Wunsch; single deletions/insertions no longer cascade into phantom substitutions).

  • Runs wav2vec2 CTC forced alignment to verify which reference words the user actually produced — mitigates Whisper-bias mistranscriptions on rare proper nouns and domain terms by checking acoustic evidence against the reference directly.

  • Surfaces per-word phoneme-level feedback (expected vs produced IPA, weak phonemes) from CMUdict.

  • Surfaces learner-profile pronunciation hints and drills. The bundled rule pack currently includes Korean-L1 patterns such as r/l, th→s, final cluster deletion, and intrusive onset vowel.

  • Adds prosody notes: word-stress placement, sentence-final rising intonation on declaratives, intra-clause hesitation pauses.

Without a reference, only the transcript and prosody run.

Args:

  • reference_text: Expected text the user was trying to say (optional).

  • audio_path: Path to a WAV file. Uses the last recording if not specified.

Returns: Detailed pronunciation assessment report (markdown).
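The word-by-word alignment step named above is the classic Needleman-Wunsch algorithm; a minimal sketch over word lists follows. The match/mismatch/gap scores are example choices, not the server's actual parameters.

```python
def align_words(reference, hypothesis, match=1, mismatch=-1, gap=-1):
    """Globally align two word lists; None marks a gap on either side."""
    n, m = len(reference), len(hypothesis)
    # DP table of best alignment scores.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if reference[i - 1] == hypothesis[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # substitution / match
                score[i - 1][j] + gap,    # deletion (reference word unspoken)
                score[i][j - 1] + gap,    # insertion (extra spoken word)
            )
    # Trace back to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if reference[i - 1] == hypothesis[j - 1] else mismatch
        ):
            pairs.append((reference[i - 1], hypothesis[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((reference[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hypothesis[j - 1]))
            j -= 1
    return pairs[::-1]
```

This is how a single skipped word stays a deletion instead of cascading: `align_words(["the", "cat", "sat"], ["the", "sat"])` pairs "cat" with a gap rather than forcing a phantom "cat" → "sat" substitution.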

check_mic

List available audio input devices and verify microphone access.

Use this if the user reports recording problems — it shows which devices are available and which one is the default.

Returns: List of available microphone devices.

Prompts

Interactive templates invoked by user choice


No prompts

Resources

Contextual data attached and managed by the client


No resources


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/JuhongPark/mcp-server-pronunciation'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.