speak
Convert a klattsch phoneme string into synthesized speech audio. Returns base64-encoded WAV using retro-style formant synthesis.
Instructions
Synthesize speech from a klattsch phoneme string. Returns base64 WAV audio.
What This Is
klattsch is a formant speech synthesizer (late-70s/early-80s style — think Votrax, SAM). You give it a string of ARPAbet phoneme codes with optional voice control directives, and it renders a WAV audio file.
How To Use This Tool
Step 1: Build a phoneme string
Write ARPAbet phonemes separated by spaces, with control directives mixed in. Use the text_to_phonemes tool first to convert English text, then refine by hand.
Step 2 (optional): Set voice character
Prefix your utterance with control directives to set the voice:
bN: base pitch in Hz (b120 = default male, b200 = female, b280 = child)
rN: per-phoneme duration in ms (r80 = fast, r110 = normal, r250+ = sung)
sN: formant scale (1.0 = male, 1.17 = female, 1.3 = child)
vN: vibrato depth in Hz (v3-v6 = expressive, v0 = off)
hN: breathiness 0..1 (h0.3 = airy/whispery)
gN: vocal effort 0=lax..1=tense (default 0.5)
tN: spectral tilt -0.9=darker..+0.9=brighter (t-0.4 = warm, t0.3 = bright)
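As a minimal sketch, the directive prefix can be assembled programmatically. The helper name `voice_prefix` and its keyword names are hypothetical (they are not part of klattsch); only the directive letters and the stated ranges come from the list above.

```python
def voice_prefix(pitch=120, rate=110, scale=1.0, vibrato=None,
                 breath=None, effort=None, tilt=None):
    """Build a directive prefix such as 'b120 r110 s1.0' (hypothetical helper)."""
    # Range checks mirror the ranges stated in the directive list.
    if breath is not None and not 0 <= breath <= 1:
        raise ValueError("h (breathiness) must be within 0..1")
    if effort is not None and not 0 <= effort <= 1:
        raise ValueError("g (vocal effort) must be within 0..1")
    if tilt is not None and not -0.9 <= tilt <= 0.9:
        raise ValueError("t (spectral tilt) must be within -0.9..0.9")
    # b, r and s are always emitted; the rest only when explicitly set.
    parts = [f"b{pitch}", f"r{rate}", f"s{scale}"]
    optional = (("v", vibrato), ("h", breath), ("g", effort), ("t", tilt))
    parts += [f"{letter}{value}" for letter, value in optional if value is not None]
    return " ".join(parts)


print(voice_prefix(pitch=200, rate=100, scale=1.17, vibrato=2))
```

The result is meant to be placed in front of the phoneme string, separated by a space.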
Step 3: Add prosody (intonation)
! after a vowel for stress: DH AE! T = "THAT" with emphasis
+N/-N on vowels for pitch changes: AY+20 = rising "I", D AH N(-30) = falling "done"
(+N)/(-N) for transient ornaments (don't carry forward)
, ; . for pauses of 100 ms, 200 ms, and 300 ms respectively
Step 4: Render
Pass the complete string to this tool.
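Since the tool returns base64-encoded WAV, the caller usually has to decode it before playback. The sketch below assumes you have already extracted the base64 string from the tool response (the response shape is not specified here) and that the audio is PCM, which is what Python's `wave` module can read. `save_wav` is a hypothetical helper name.

```python
import base64
import io
import wave


def save_wav(b64_audio: str, path: str) -> float:
    """Decode base64 WAV data, write it to `path`, return its duration in seconds."""
    raw = base64.b64decode(b64_audio)
    # A WAV file starts with 'RIFF', four size bytes, then 'WAVE'.
    if raw[:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("decoded data is not a RIFF/WAVE file")
    with open(path, "wb") as f:
        f.write(raw)
    # Read the header back to report the duration (assumes PCM audio).
    with wave.open(io.BytesIO(raw), "rb") as w:
        return w.getnframes() / w.getframerate()
```

A call such as `save_wav(audio_b64, "hello.wav")` would then write a file that any standard audio player can open.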
Quick-Reference Voice Presets
| Preset | Directives | Description |
|---|---|---|
| Male natural | b120 r100 s1.0 v2 | Default voice |
| Male deep | b90 r95 s0.92 v1 t-0.3 g0.6 | Deep, authoritative |
| Male bright | b130 r105 s1.0 v2 t0.2 | Clear, energetic |
| Female natural | b200 r100 s1.17 v2 | Natural female |
| Female warm | b185 r105 s1.15 v3 t-0.2 | Warm, friendly |
| Female bright | b220 r100 s1.18 v2 t0.2 | Bright, cheery |
| Child | b280 r90 s1.3 v1 | Young, higher pitch |
| Robot | b120 r90 s1.0 v0 h0 g0.8 t0.5 | Flat, mechanical |
| Whisper | b120 r100 s1.0 v0 h0.6 g0.1 | Breathy whisper |
| Dramatic | b100 r130 s1.0 v5 | Slow, theatrical |
| Singing male | bC4 r300 s1.0 v5 | For sung notes |
| Singing female | bG4 r300 s1.17 v4 | For sung notes |
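If you reuse presets often, a lookup table keeps the directive strings in one place. This is only an illustrative sketch: the `PRESETS` dictionary and `with_preset` function are hypothetical names, and only a few rows of the preset table are copied in.

```python
# A few directive strings copied from the preset table (not the full set).
PRESETS = {
    "Male natural": "b120 r100 s1.0 v2",
    "Female natural": "b200 r100 s1.17 v2",
    "Child": "b280 r90 s1.3 v1",
    "Robot": "b120 r90 s1.0 v0 h0 g0.8 t0.5",
    "Whisper": "b120 r100 s1.0 v0 h0.6 g0.1",
}


def with_preset(name: str, phonemes: str) -> str:
    """Prefix a phoneme string with the directives of a named preset."""
    return f"{PRESETS[name]} {phonemes}"


print(with_preset("Robot", "AH T EH N SH AH N . P L IY Z"))
```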
Intonation Patterns That Sound Natural
Falling statement (period): last vowel gets -20 to -30, e.g. D AH N(-25)
Rising question: last vowel gets +20 to +30, e.g. R EH D IY(+25)
Listing items: each item rises, last falls, e.g. AE(+15) P AH L Z(+15) AO R AH N JH(-20)
Excited: higher base pitch, faster, e.g. b140 r85 ...
Serious/deep: lower base pitch, slower, e.g. b95 r115 ...
Sarcastic: exaggerated pitch swings, e.g. AY+30 M . S OW(-30) . S AH R K AE S T IH K
Singing With Note Names
Instead of Hz for b, use note names: bC4, bD#4, bEb4, bF4, bG4, bA4, bB4.
Middle C = C4 (about 261.6 Hz), A4 = 440 Hz.
Set r250-r400 per phoneme and group notes with parentheses: bC4 r300 ( HH AH ) ( L OW ) bE4 ( W ER L D )
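To see how note names relate to the Hz values accepted by b, here is a small sketch using standard 12-tone equal temperament with A4 = 440 Hz. That klattsch uses exactly this tuning internally is an assumption, although it matches the two reference values given above. `note_to_hz` is a hypothetical helper.

```python
import re

# Semitone offset of each natural note from C within one octave.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(name: str) -> float:
    """Convert a note name such as 'C4', 'D#4' or 'Eb4' to a frequency in Hz."""
    m = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", name)
    if not m:
        raise ValueError(f"not a note name: {name!r}")
    letter, accidental, octave = m.groups()
    semitone = SEMITONES[letter] + {"#": 1, "b": -1, "": 0}[accidental]
    # MIDI numbering: C4 is note 60 and A4 is note 69.
    midi = 12 * (int(octave) + 1) + semitone
    return 440.0 * 2 ** ((midi - 69) / 12)


for note in ("C4", "Eb4", "G4", "A4"):
    print(note, round(note_to_hz(note), 2))
```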
Example Strings
"Hello world" (male): b120 r100 s1.0 HH AH L OW . W ER L D
"How are you?" (female, rising): b200 s1.17 HH AW . AA R . Y UW(+25)
"I am NOT impressed" (stress on NOT): b120 AY . AE M . N AO T! . IH M P R EH S T(-20)
"The quick brown fox" (energetic): b135 r90 t0.2 DH AH . K W IH K . B R AW N . F AA K S
Sing "Twinkle twinkle" (two notes): bC4 r300 ( T W IH NG ) ( K AH L ) bG4 r300 ( T W IH NG ) ( K AH L )
Dramatic movie trailer voice: b95 r140 s0.95 v4 t-0.3 g0.7 IH N . AH . W ER L D(-25) .
Robot announcement: b130 r85 s1.0 v0 h0 g0.8 t0.4 AH T EH N SH AH N . P L IY Z
Whispered secret: b110 r105 v0 h0.5 g0.1 s1.0 P S T . D OW N T . T EH L . EH N IY W AH N
Phoneme Categories (all 39 phonemes)
Vowels: IY IH EH AE AA AO AH UH UW ER AY AW EY OW OY
Sonorants: W Y R L M N NG
Fricatives: F TH S SH V DH Z ZH HH
Stops: P B T D K G (these get automatic burst + silence)
Affricates: CH JH
⚠️ The stops (P, B, T, D, K, G) and the affricates (CH, JH) include an automatic silence-burst pattern. Don't add extra pauses after them.
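Before rendering, it can help to check a string for codes outside the 39-phoneme set. The sketch below is a rough pre-flight check, not klattsch's own parser: the token grammar (lowercase-initial tokens are directives, an optional `!` precedes an optional pitch suffix) is an assumption inferred from the examples in this document, and `unknown_phonemes` is a hypothetical helper.

```python
import re

PHONEMES = set(
    "IY IH EH AE AA AO AH UH UW ER AY AW EY OW OY "  # vowels
    "W Y R L M N NG "                                # sonorants
    "F TH S SH V DH Z ZH HH "                        # fricatives
    "P B T D K G "                                   # stops
    "CH JH".split()                                  # affricates
)

# Phoneme code, optional stress mark, optional +N/-N or (+N)/(-N) suffix.
PHONEME_TOKEN = re.compile(r"([A-Z]+)(!?)((?:[+-]\d+|\([+-]\d+\))?)")


def unknown_phonemes(utterance: str) -> list:
    """Return the tokens that are not recognized phonemes, pauses or directives."""
    bad = []
    for token in utterance.split():
        # Skip pauses, grouping parentheses and directives like b120 or bC4.
        if token in {",", ";", ".", "(", ")"} or token[0].islower():
            continue
        m = PHONEME_TOKEN.fullmatch(token)
        if not m or m.group(1) not in PHONEMES:
            bad.append(token)
    return bad


print(unknown_phonemes("b120 r100 HH EH L OH . W ER L D(-25)"))
```

An empty list means every phoneme token was recognized; it does not guarantee that the directives themselves are valid.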
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| utterance | Yes | The klattsch phoneme string. ARPAbet codes + control directives, whitespace-separated. Use text_to_phonemes to convert English first, then tweak. | |
| sampleRate | No |