assess
Evaluates a speech recording against a reference text, providing word-level phoneme feedback, prosody analysis, and alignment. Without a reference text, it returns only the transcript and prosody analysis.
Instructions
Assess the last recording (or a specific audio file) without re-recording.
When reference_text is provided, the assessor:
Aligns the user's speech to the reference word-by-word (Needleman-Wunsch; single deletions/insertions no longer cascade into phantom substitutions).
Runs wav2vec2 CTC forced alignment to verify which reference words the user actually produced. Checking acoustic evidence against the reference directly mitigates Whisper-bias mistranscriptions of rare proper nouns and domain terms.
Surfaces per-word phoneme-level feedback (expected vs produced IPA, weak phonemes) from CMUdict.
Surfaces learner-profile pronunciation hints and drills. The bundled rule pack currently includes Korean-L1 patterns such as r/l, th→s, final cluster deletion, and intrusive onset vowel.
Adds prosody notes: word-stress placement, sentence-final rising intonation on declaratives, intra-clause hesitation pauses.
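The word-by-word alignment step can be illustrated with a minimal Needleman-Wunsch sketch. This is not the assessor's actual code: the function name `align_words` and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for illustration. It shows why a global alignment keeps one dropped or inserted word from shifting every later word into a false substitution.

```python
from typing import List, Optional, Tuple

# (reference_word, spoken_word); None marks a gap on that side.
Pair = Tuple[Optional[str], Optional[str]]


def align_words(reference: List[str], hypothesis: List[str],
                match: int = 1, mismatch: int = -1, gap: int = -1) -> List[Pair]:
    """Globally align two word lists (illustrative scores, not the tool's)."""
    n, m = len(reference), len(hypothesis)

    # score[i][j] = best score aligning reference[:i] with hypothesis[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(i: int, j: int) -> int:
        same = reference[i - 1].lower() == hypothesis[j - 1].lower()
        return match if same else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # match or substitution
                score[i - 1][j] + gap,                   # reference word skipped
                score[i][j - 1] + gap,                   # extra spoken word
            )

    # Trace back from the bottom-right corner, preferring the diagonal.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            pairs.append((reference[i - 1], hypothesis[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((reference[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, hypothesis[j - 1]))  # insertion
            j -= 1
    pairs.reverse()
    return pairs


aligned = align_words("the quick brown fox jumps".split(),
                      "the brown fox jumps".split())
print(aligned)
```

With "quick" omitted, the sketch reports one deletion and four exact matches. A naive position-by-position comparison would instead report four mismatches.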
Without a reference, only transcription and prosody analysis run.
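The hesitation-pause note in the prosody step can be pictured as a gap check over word timings. The sketch below is only an illustration under stated assumptions: the input format (word, start, end in seconds), the 0.4-second threshold, and the rule that clause-ending punctuation marks a clause boundary are all hypothetical and are not taken from the tool.

```python
from typing import List, Tuple

# Hypothetical input: (word, start_seconds, end_seconds) per spoken word.
WordTiming = Tuple[str, float, float]

CLAUSE_END = (",", ".", ";", ":", "?", "!")  # assumed clause-boundary markers


def find_hesitation_pauses(words: List[WordTiming],
                           min_pause: float = 0.4) -> List[Tuple[str, str, float]]:
    """Return (previous_word, next_word, gap) for long pauses inside a clause."""
    pauses = []
    for (prev_word, _, prev_end), (next_word, next_start, _) in zip(words, words[1:]):
        gap = next_start - prev_end
        # A pause right after clause-ending punctuation is treated as natural.
        if gap >= min_pause and not prev_word.endswith(CLAUSE_END):
            pauses.append((prev_word, next_word, round(gap, 3)))
    return pauses


timings = [
    ("I", 0.0, 0.2), ("went", 0.25, 0.5), ("to", 1.3, 1.4),
    ("school,", 1.45, 1.9), ("then", 2.6, 2.8), ("home.", 2.85, 3.2),
]
print(find_hesitation_pauses(timings))
```

Here the pause between "went" and "to" is flagged, while the similar pause after "school," is ignored because it falls on a clause boundary.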
Args:
reference_text: Expected text the user was trying to say (optional).
audio_path: Path to a WAV file. Uses the last recording if not specified.
Returns: Detailed pronunciation assessment report (markdown).
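As a rough sketch of how a caller might assemble the two optional arguments, the helper below builds an argument dictionary and leaves out anything that was not set, so the tool falls back to the last recording. The helper name `build_assess_arguments` and the `.wav` extension check are assumptions for illustration; how the dictionary is sent to the tool depends on the client and is not covered by the source.

```python
from typing import Dict, Optional


def build_assess_arguments(reference_text: Optional[str] = None,
                           audio_path: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: collect only the arguments the caller actually set."""
    arguments: Dict[str, str] = {}
    if reference_text is not None and reference_text.strip():
        arguments["reference_text"] = reference_text.strip()
    if audio_path is not None:
        # The documentation says audio_path points to a WAV file; this
        # extension check is an assumed client-side sanity check.
        if not audio_path.lower().endswith(".wav"):
            raise ValueError("audio_path should point to a WAV file")
        arguments["audio_path"] = audio_path
    return arguments


# Assess the last recording against a reference sentence.
print(build_assess_arguments(reference_text="The weather is rather nice today."))

# Assess the last recording with no reference: transcript and prosody only.
print(build_assess_arguments())
```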
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
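| reference_text | No | Expected text the user was trying to say. | |
| audio_path | No | Path to a WAV file. Uses the last recording if not specified. | |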
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | Detailed pronunciation assessment report (markdown). | |