# mcp-server-pronunciation
## Server Configuration

Environment variables used to configure the server.
| Name | Required | Description | Default |
|---|---|---|---|
| TORCH_HOME | No | Override the cache location for PyTorch Hub models (used by the `phoneme` extra). | |
| HF_HUB_CACHE | No | Override cache location for Hugging Face models. | |
| MCP_PRONUNCIATION_MODEL | No | Whisper model size. Options include `tiny.en`, `base.en`, and `small.en`. | base.en |
| MCP_PRONUNCIATION_AUDIO_RETENTION | No | Controls whether temporary recordings are kept. Values: `session` (delete when the session ends) or `keep`. | session |
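For example, a shell session might export these variables before launching the server (the values below are illustrative choices, not defaults):

```shell
# Illustrative values only: use a smaller Whisper model, keep recordings
# after the session ends, and redirect the Hugging Face model cache.
export MCP_PRONUNCIATION_MODEL=tiny.en
export MCP_PRONUNCIATION_AUDIO_RETENTION=keep
export HF_HUB_CACHE="$HOME/.cache/hf-models"
```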
## Capabilities

Features and capabilities supported by this server.
| Capability | Details |
|---|---|
| tools | `{"listChanged": false}` |
| prompts | `{"listChanged": false}` |
| resources | `{"subscribe": false, "listChanged": false}` |
| experimental | `{}` |
## Tools

Functions exposed to the LLM to take actions.
| Name | Description |
|---|---|
| converse | Record the user speaking, transcribe it, and return the transcript plus quick English feedback. This is the primary tool for voice conversations: call it, read the transcript and feedback, then respond conversationally in your own words, weaving the feedback in naturally or mentioning it only if it matters. Recording auto-stops when the user finishes speaking (silence detection). For a focused drill where the user reads a specific sentence, use `practice` instead. Args: target_hint: optional; set this only if the user is explicitly trying to say a specific sentence (e.g. they asked "how do I say X?" and you told them X); leave blank for free-form conversation. duration: maximum recording duration in seconds (default 30, max 120); auto-stops earlier on silence. Returns: Markdown report containing the user's transcript, brief English feedback (pronunciation, grammar, fluency), and a 'For Claude' section with guidance on how to respond. |
| practice | Drill mode: the user reads a specific sentence aloud and gets a detailed pronunciation assessment. Use this when the user explicitly wants to practice reading a particular sentence, not for free-form chat. For voice conversation with casual feedback, use `converse` instead. Recording auto-stops when the user finishes speaking. Args: reference_text: the sentence the user will read aloud. duration: maximum recording duration in seconds (default 15, max 120). Returns: detailed pronunciation assessment report. |
| retry | Retry the last sentence the user was practicing. Re-records and re-assesses using the same reference text as the previous attempt. Args: duration: maximum recording duration in seconds (default 15, max 120). Returns: pronunciation assessment report for the new attempt. |
| quick_practice | Pick a random practice sentence and drill it immediately. Combines `suggest_sentence` and `practice` in a single step. Args: focus: phoneme focus area; options: "th", "f_v", "r_l", "vowels", "general"; picked randomly if not specified. difficulty: difficulty level; options: "beginner", "intermediate", "advanced"; picked randomly if not specified. duration: maximum recording duration in seconds (default 15, max 120). Returns: the sentence to read, followed by the pronunciation assessment. |
| suggest_sentence | Suggest a practice sentence the user can read aloud. Args: focus: phoneme focus area; options: "th", "f_v", "r_l", "vowels", "general"; picked randomly if not specified. difficulty: difficulty level; options: "beginner", "intermediate", "advanced"; picked randomly if not specified. Returns: a practice sentence with its focus area and difficulty. |
| record | Record audio from the microphone without assessing it. Recording auto-stops when the user finishes speaking (silence detection); the duration is a maximum, so you don't have to wait the full time. Most of the time, prefer `converse` or `practice` instead. Args: duration: maximum recording duration in seconds (default 10, max 120). Returns: path to the recorded WAV file. |
| assess | Assess the last recording (or a specific audio file) without re-recording. When a reference_text is provided, the full assessment runs; without a reference, only the transcript and prosody analysis run. Args: reference_text: expected text the user was trying to say (optional). audio_path: path to a WAV file; uses the last recording if not specified. Returns: detailed pronunciation assessment report (markdown). |
| check_mic | List available audio input devices and verify microphone access. Use this if the user reports recording problems; it shows which devices are available and which one is the default. Returns: list of available microphone devices. |
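As a sketch of the wire format (assuming the standard MCP `tools/call` JSON-RPC method; the request id and argument values below are illustrative), a client invoking the `practice` tool would send a request shaped like:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "practice",
    "arguments": {
      "reference_text": "The weather is lovely today.",
      "duration": 15
    }
  }
}
```

The server records the user reading the reference sentence and returns the assessment report as the tool result's text content.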
## Prompts

Interactive templates invoked by user choice.
| Name | Description |
|---|---|
| No prompts | |
## Resources

Contextual data attached and managed by the client.
| Name | Description |
|---|---|
| No resources | |
## MCP directory API

We provide all the information about MCP servers via our MCP API.

```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/JuhongPark/mcp-server-pronunciation'
```

If you have feedback or need assistance with the MCP directory API, please join our Discord server.