Glama

Server Configuration

Environment variables used to configure the server. All are optional.

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| `TORCH_HOME` | No | Override cache location for PyTorch Hub models (used by the phoneme extra). | |
| `HF_HUB_CACHE` | No | Override cache location for Hugging Face models. | |
| `MCP_PRONUNCIATION_MODEL` | No | Whisper model size. Other options: `tiny.en`, `small.en`, etc. | `base.en` |
| `MCP_PRONUNCIATION_AUDIO_RETENTION` | No | Controls whether temporary recordings are kept. Values: `session` or `keep`. | `session` |
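As a minimal sketch of how a server like this might resolve the variables above — the names and defaults come from the table, but `load_config` itself is illustrative, not the server's actual code:

```python
import os

def load_config() -> dict:
    """Resolve configuration from the documented environment variables."""
    return {
        # Whisper model size; falls back to the documented default.
        "model": os.environ.get("MCP_PRONUNCIATION_MODEL", "base.en"),
        # 'session' discards temporary recordings when the session ends;
        # 'keep' retains them on disk.
        "audio_retention": os.environ.get(
            "MCP_PRONUNCIATION_AUDIO_RETENTION", "session"
        ),
        # Optional cache overrides; None means the libraries' defaults apply.
        "torch_home": os.environ.get("TORCH_HOME"),
        "hf_hub_cache": os.environ.get("HF_HUB_CACHE"),
    }

config = load_config()
```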

Capabilities

Features and capabilities supported by this server

| Capability | Details |
|------------|---------|
| tools | `{"listChanged": false}` |
| prompts | `{"listChanged": false}` |
| resources | `{"subscribe": false, "listChanged": false}` |
| experimental | `{}` |

Tools

Functions exposed to the LLM to take actions

converse

Record the user speaking, transcribe it, and return the transcript plus quick English feedback. This is the primary tool for voice conversations: call it, read the transcript + feedback, then respond conversationally in your own words — weaving the feedback in naturally or mentioning it only if it matters.

Recording auto-stops when the user finishes speaking (silence detection).

Use this tool when:

  • The user wants to chat with you by voice instead of typing

  • The user wants casual English feedback while talking with you

  • You want to hear what the user said rather than read a typed message

For a focused drill where the user reads a specific sentence, use practice instead.

Args:

  • target_hint: Optional. Only set this if the user is explicitly trying to say a specific sentence (e.g. they asked "how do I say X?" and you told them X). Leave blank for free-form conversation.

  • duration: Maximum recording duration in seconds (default 30, max 120). Auto-stops earlier on silence.

Returns: Markdown report containing the user's transcript, brief English feedback (pronunciation + grammar + fluency), and a 'For Claude' section with guidance on how to respond.
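An MCP client invokes a tool like this one with a `tools/call` request. The payload below is a sketch: the method name and params shape follow the MCP specification, while the `id` and argument values are example choices.

```python
import json

# Illustrative MCP `tools/call` request for the `converse` tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "converse",
        "arguments": {
            # target_hint is omitted for free-form conversation.
            "duration": 30,
        },
    },
}

print(json.dumps(request, indent=2))
```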

practice

Drill mode: the user reads a specific sentence aloud and gets a detailed pronunciation assessment. Use this when the user explicitly wants to practice reading a particular sentence, not for free-form chat.

For voice conversation with casual feedback, use converse instead.

Recording auto-stops when the user finishes speaking.

Args:

  • reference_text: The sentence the user will read aloud.

  • duration: Maximum recording duration in seconds (default 15, max 120).

Returns: Detailed pronunciation assessment report.

retry

Retry the last sentence the user was practicing.

Re-records and re-assesses using the same reference text from the previous practice or converse call. Use this to let the user try again after getting feedback.

Args: duration: Maximum recording duration in seconds (default 15, max 120).

Returns: Pronunciation assessment report for the new attempt.

quick_practice

Pick a random practice sentence and drill it immediately.

Combines suggest_sentence + practice into one step: picks a sentence matching the criteria, then records and assesses.

Args:

  • focus: Phoneme focus area. Options: "th", "f_v", "r_l", "vowels", "general". If not specified, picks randomly.

  • difficulty: Difficulty level. Options: "beginner", "intermediate", "advanced". If not specified, picks randomly.

  • duration: Maximum recording duration in seconds (default 15, max 120).

Returns: The sentence to read, followed by the pronunciation assessment.

suggest_sentence

Suggest a practice sentence the user can read aloud.

Args:

  • focus: Phoneme focus area. Options: "th", "f_v", "r_l", "vowels", "general". If not specified, picks randomly.

  • difficulty: Difficulty level. Options: "beginner", "intermediate", "advanced". If not specified, picks randomly.

Returns: A practice sentence with its focus area and difficulty.
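The random-fallback behavior described above (unspecified focus and difficulty are picked randomly) can be sketched as follows. The sentence bank here is a made-up stand-in; the server's actual sentences are not published on this page.

```python
import random

# Hypothetical sentence bank keyed by (focus, difficulty).
SENTENCES = {
    ("th", "beginner"): ["I think this is the third one."],
    ("r_l", "intermediate"): ["The red lorry rolled down the long road."],
    ("vowels", "advanced"): ["The full moon rose over the cool blue pool."],
}

FOCUSES = ["th", "f_v", "r_l", "vowels", "general"]
DIFFICULTIES = ["beginner", "intermediate", "advanced"]

def suggest_sentence(focus=None, difficulty=None):
    # Unspecified criteria are filled in randomly, as the tool docs describe.
    focus = focus or random.choice(FOCUSES)
    difficulty = difficulty or random.choice(DIFFICULTIES)
    pool = SENTENCES.get(
        (focus, difficulty), ["She sells seashells by the seashore."]
    )
    return {
        "focus": focus,
        "difficulty": difficulty,
        "sentence": random.choice(pool),
    }
```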

record

Record audio from the microphone without assessing it.

Recording auto-stops when the user finishes speaking (silence detection). The duration is the maximum time — you don't have to wait the full duration.

Most of the time prefer converse or practice, which record AND analyze in one step. Only use record alone if you want the raw WAV file.

Args: duration: Maximum recording duration in seconds (default 10, max 120).

Returns: Path to the recorded WAV file.
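The returned path points at a standard WAV file, so it can be inspected with Python's stdlib `wave` module. For a self-contained demo, the sketch below first writes one second of silent mono audio (the 16 kHz rate is an example choice, not a documented property of the server's recordings), then reads its format back:

```python
import wave

path = "recording.wav"

# Write a short silent WAV so the demo is self-contained; in practice
# you would use the path returned by the record tool instead.
with wave.open(path, "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 16-bit samples
    w.setframerate(16000)    # 16 kHz, a common rate for speech
    w.writeframes(b"\x00\x00" * 16000)  # one second of silence

# Inspect the recording's format and duration.
with wave.open(path, "rb") as w:
    duration = w.getnframes() / w.getframerate()
    print(f"{w.getnchannels()} ch, {w.getframerate()} Hz, {duration:.2f} s")
```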

assess

Assess the last recording (or a specific audio file) without re-recording.

When reference_text is provided, the assessor:

  • Aligns the user's speech to the reference word-by-word (Needleman-Wunsch; single deletions/insertions no longer cascade into phantom substitutions).

  • Runs wav2vec2 CTC forced alignment to verify which reference words the user actually produced — mitigates Whisper-bias mistranscriptions on rare proper nouns and domain terms by checking acoustic evidence against the reference directly.

  • Surfaces per-word phoneme-level feedback (expected vs produced IPA, weak phonemes) from CMUdict.

  • Surfaces learner-profile pronunciation hints and drills. The bundled rule pack currently includes Korean-L1 patterns such as r/l, th→s, final cluster deletion, and intrusive onset vowel.

  • Adds prosody notes: word-stress placement, sentence-final rising intonation on declaratives, intra-clause hesitation pauses.

Without a reference, only the transcript and prosody run.

Args:

  • reference_text: Expected text the user was trying to say (optional).

  • audio_path: Path to a WAV file. Uses the last recording if not specified.

Returns: Detailed pronunciation assessment report (markdown).
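The word-by-word alignment step named above is the classic Needleman-Wunsch algorithm; a minimal sketch over word lists follows. The match/mismatch/gap scores are example choices, not the server's actual parameters.

```python
def align_words(reference, hypothesis, match=1, mismatch=-1, gap=-1):
    """Globally align two word lists; None marks a gap on either side."""
    n, m = len(reference), len(hypothesis)
    # DP table of best alignment scores.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if reference[i - 1] == hypothesis[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # substitution / match
                score[i - 1][j] + gap,    # deletion (reference word unspoken)
                score[i][j - 1] + gap,    # insertion (extra spoken word)
            )
    # Trace back to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if reference[i - 1] == hypothesis[j - 1] else mismatch
        ):
            pairs.append((reference[i - 1], hypothesis[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((reference[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hypothesis[j - 1]))
            j -= 1
    return pairs[::-1]
```

This is how a single skipped word stays a deletion instead of cascading: `align_words(["the", "cat", "sat"], ["the", "sat"])` pairs "cat" with a gap rather than forcing a phantom "cat" → "sat" substitution.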

check_mic

List available audio input devices and verify microphone access.

Use this if the user reports recording problems — it shows which devices are available and which one is the default.

Returns: List of available microphone devices.

Prompts

Interactive templates invoked by user choice


No prompts

Resources

Contextual data attached and managed by the client


No resources


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/JuhongPark/mcp-server-pronunciation'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.