low-latency-tts-service-mcp
Low-latency local text-to-speech powered by Kokoro through TTS.cpp. The service shells out to a local tts-cli binary for fast Kokoro GGUF inference, then handles queued playback, status tracking, and MCP integration for AI agents. It offers a FastAPI server for HTTP clients, a TypeScript MCP relay for Claude Code, Claude Desktop, or any MCP-compatible client, and an interactive terminal chat REPL (just chat) for typing text and hearing it spoken right away.
Features
Feature | Description |
Queued Playback | Sequential speech playback through a background worker, with request status tracking |
REST API Server | FastAPI server with |
MCP Server | Ready-to-use MCP bridge that exposes |
Interactive Chat REPL | Terminal REPL ( |
TTS.cpp Runtime | Uses the local TTS.cpp |
Kokoro Voices | 27 English no-espeak Kokoro voices, including |
WAV Output | Generated audio is saved as timestamped WAV files under |
Explicit Runtime Config | TTS.cpp binary, GGUF model path, sampling parameters, host, port, and playback settings are read from |
Under the hood, the project shells out to a local TTS.cpp tts-cli binary for Kokoro generation, uses sounddevice for audio output, and uses FastAPI for the HTTP server. The MCP server is a lightweight TypeScript stdio-to-HTTP relay using the Model Context Protocol SDK.
Design Principles
All runtime configuration is explicit. If a required value is missing from config.yaml, or if the configured tts-cli executable or Kokoro GGUF model is not present, the service fails immediately with a clear error. The service does not silently fall back to another model, voice, port, binary, or output directory.
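The fail-fast rule can be sketched as follows. This is a minimal illustration, not the project's code: `REQUIRED_KEYS`, `ConfigError`, and `validate_config` are hypothetical names, and the key list is only a subset taken from the example `config.yaml`.

```python
from pathlib import Path

# Hypothetical subset of required keys, taken from the example config.yaml.
REQUIRED_KEYS = ("tts_cli", "model", "output_dir", "default_voice", "host", "port")


class ConfigError(RuntimeError):
    """Raised when the configuration is incomplete or points at missing files."""


def validate_config(config: dict) -> dict:
    """Return the config unchanged if it is valid; raise ConfigError otherwise."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ConfigError(f"config.yaml is missing required keys: {', '.join(missing)}")
    # No fallback: both the binary and the model must exist on disk.
    for key in ("tts_cli", "model"):
        path = Path(config[key])
        if not path.is_file():
            raise ConfigError(f"{key} does not exist: {path}")
    return config
```

Raising at startup, instead of substituting a default, keeps a misconfigured path from silently selecting a different model or binary.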
Audio files are written to data/output/ as WAV files with timestamps when save_wav: true in config.yaml. The server serializes requests through a queue and a background audio worker. The worker generates audio for a request, starts playback, and can generate the next queued request while the current one is playing.
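One way to get this overlap is a two-stage pipeline with a generator thread feeding a player thread. The sketch below is an assumption about the general shape, not the server's actual worker: `run_pipeline`, `generate`, and `play` are hypothetical names, and the real service also tracks per-request status.

```python
import queue
import threading
from typing import Callable

_STOP = object()  # sentinel that shuts a stage down


def run_pipeline(
    texts: list[str],
    generate: Callable[[str], bytes],
    play: Callable[[bytes], None],
) -> None:
    """Generate and play texts in order, overlapping generation with playback."""
    requests: queue.Queue = queue.Queue()
    # A bounded queue limits how far generation can run ahead of playback.
    ready: queue.Queue = queue.Queue(maxsize=1)

    def generator() -> None:
        while (text := requests.get()) is not _STOP:
            ready.put(generate(text))
        ready.put(_STOP)

    def player() -> None:
        while (audio := ready.get()) is not _STOP:
            play(audio)

    threads = [threading.Thread(target=generator), threading.Thread(target=player)]
    for thread in threads:
        thread.start()
    for text in texts:
        requests.put(text)
    requests.put(_STOP)
    for thread in threads:
        thread.join()
```

Because each stage is a single thread reading from a FIFO queue, playback order matches request order even though generation of the next clip starts before the current one finishes playing.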
Architecture
There are three entry paths into the system. HTTP clients call the FastAPI server directly. AI agents call the MCP relay (mcp/tts-mcp.ts), which reads the server host and port from config.yaml, checks /health, and forwards tool calls to the FastAPI API. Terminal users run the interactive chat REPL (src/main.py), which drives the shared TTS runtime directly without going through the HTTP server. In every case, TTS inference runs out-of-process through the configured TTS.cpp tts-cli, using the Kokoro no-espeak GGUF model.
┌─────────────────────────┐   ┌────────────────────┐   ┌─────────────────────────┐
│        AI Agent         │   │    HTTP Client     │   │      Terminal User      │
│  Claude Code / Desktop  │   │   curl / scripts   │   │        just chat        │
└────────────┬────────────┘   └─────────┬──────────┘   └────────────┬────────────┘
             │ MCP stdio                │ HTTP                      │ keystrokes
             ▼                          │                           ▼
┌─────────────────────────┐             │              ┌─────────────────────────┐
│  MCP Server (Node.js)   │             │              │       src/main.py       │
│     mcp/tts-mcp.ts      │             │              │  interactive chat REPL  │
│ tools: say, get_voices, │             │              │ type text -> synth+play │
│        get_status       │             │              └────────────┬────────────┘
└────────────┬────────────┘             │                           │
             │ HTTP                     │                           │
             ▼                          ▼                           │
┌───────────────────────────────────────────────────────┐           │
│                    FastAPI Server                     │           │
│       src/low_latency_tts_service_mcp/server.py       │           │
│                                                       │           │
│     POST /say  GET /voices  /status/{id}  /health     │           │
│       work queue -> audio worker -> status map        │           │
└───────────────────┬───────────────────────────────────┘           │
                    │                                               │
                    ▼                                               ▼
┌────────────────────────────────────────────────────────────────────────────────────┐
│            Shared TTS Runtime - src/low_latency_tts_service_mcp/tts.py             │
│                                                                                    │
│       text cleanup -> TTS.cpp command -> WAV reader -> sounddevice playback        │
└───────────────────┬───────────────────────────────────┬────────────────────────────┘
                    │                                   │
                    ▼                                   ▼
        ┌───────────────────────┐           ┌───────────────────────┐
        │    TTS.cpp tts-cli    │           │   data/output/*.wav   │
        │ Kokoro_no_espeak.gguf │           │   timestamped audio   │
        └───────────┬───────────┘           └───────────────────────┘
                    │
                    ▼
              ┌───────────┐
              │ Speakers  │
              └───────────┘
Prerequisites
Python 3.12+
uv - Python package manager (install)
just - Command runner (install)
Node.js 18+ - For the MCP server
TTS.cpp tts-cli - A local executable referenced by config.yaml
Local audio output device - Required for playback through sounddevice
Project Structure
.
├── src/
│   ├── main.py             # Interactive chat REPL (just chat / just run)
│   ├── server.py           # Module wrapper for uv run -m src.server
│   └── low_latency_tts_service_mcp/
│       ├── server.py       # FastAPI server, queue, statuses, worker
│       └── tts.py          # Config, Kokoro command, WAV playback
├── tests/
│   ├── test_main.py
│   ├── test_server.py
│   ├── test_tts.py
│   ├── test_integration.py
│   └── architecture/       # Architecture import rule tests
├── scripts/
│   └── download-model.sh   # Interactive Kokoro model downloader
├── mcp/
│   ├── tts-mcp.ts          # MCP relay to FastAPI server
│   ├── package.json
│   └── tsconfig.json
├── config/
│   ├── semgrep/            # Static analysis rules
│   └── codespell/          # Spell-check configuration
├── data/
│   ├── models/             # Downloaded Kokoro GGUF model
│   └── output/             # Generated WAV files
├── stubs/                  # Local type stubs for strict checking
├── vendor/
│   └── TTS.cpp/            # Local TTS.cpp checkout/build location
├── config.yaml             # Runtime configuration
├── justfile                # Command recipes
└── pyproject.toml          # Project metadata and dependencies
Setup
just init
Creates report directories and installs Python dependencies via uv sync --all-extras. If no Kokoro model is present and the command is running interactively, just init prompts for a model download. In non-interactive contexts it tells you to run just download.
Download a Model
just download
Downloads the Kokoro no-espeak GGUF model used by the default configuration:
Model | Size | Notes |
| ~354 MB | English no-espeak Kokoro model with 27 built-in voices |
The default path is:
data/models/Kokoro_no_espeak.gguf
Getting Started
Run just init - installs Python dependencies
Run just download - downloads Kokoro_no_espeak.gguf if it is not already present
Confirm config.yaml points at the local tts-cli executable and downloaded model
Run just start - starts the FastAPI TTS server
Send requests via HTTP or through the MCP bridge
To try synthesis without the server, run just chat for an interactive terminal REPL: type a line, press Enter twice, and hear it spoken.
Configuration
TTS and server runtime settings live in config.yaml at the project root. Some operational constants, such as status retention and MCP request timeouts, are defined in code. Example:
tts_cli: ./vendor/TTS.cpp/build/bin/tts-cli
model: ./data/models/Kokoro_no_espeak.gguf
output_dir: ./data/output
sample_rate: 24000
lead_silence_ms: 200
default_voice: af_heart
save_wav: true
simplify_punctuation: false
n_threads: 8
timeout_seconds: 120
temperature: 1.0
topk: 50
repetition_penalty: 1.0
top_p: 1.0
host: 0.0.0.0
port: 12000
Key | Description |
| Path to the local TTS.cpp |
| Path to |
| Directory for generated WAV files |
| Expected WAV sample rate in Hz |
| Silence written before playback starts on a new audio stream |
| Voice used when |
| Save generated audio to WAV files in |
| Simplify punctuation before synthesis ( |
| Number of threads passed to TTS.cpp |
| Maximum duration for one TTS.cpp generation command |
| Kokoro sampling temperature |
| Kokoro top-k sampling value |
| Kokoro repetition penalty |
| Kokoro top-p sampling value |
| Server listen address |
| Server listen port |
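As a worked example of how `lead_silence_ms` and `sample_rate` interact, the sketch below converts the lead silence into a sample count and a silent buffer. It assumes 16-bit mono PCM, which is not stated in this document; both function names are hypothetical.

```python
BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM


def lead_silence_samples(lead_silence_ms: int, sample_rate: int) -> int:
    """Number of zero samples that cover lead_silence_ms at sample_rate."""
    return lead_silence_ms * sample_rate // 1000


def lead_silence_pcm(lead_silence_ms: int, sample_rate: int) -> bytes:
    """Silent PCM buffer to write before the first audio chunk of a new stream."""
    return bytes(lead_silence_samples(lead_silence_ms, sample_rate) * BYTES_PER_SAMPLE)


# With the example values: 200 ms at 24000 Hz.
print(lead_silence_samples(200, 24000))  # 4800
```

With the example configuration, 200 ms at 24000 Hz is 4800 samples, or 9600 bytes under the 16-bit assumption.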
Voices
The no-espeak Kokoro model exposes these voice identifiers:
af_alloy, af_aoede, af_bella, af_heart, af_jessica, af_kore, af_nicole,
af_nova, af_river, af_sarah, af_sky, am_adam, am_echo, am_eric,
am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa, bf_alice,
bf_emma, bf_isabella, bf_lily, bm_daniel, bm_fable, bm_george
Usage
Command | Description |
| Start the interactive chat REPL (type text, hear speech) |
| Alias for |
| Start the FastAPI TTS server in the foreground |
| Stop the running server |
| Check if the server is running |
| Install Node dependencies for the MCP relay |
| Start the MCP stdio relay from the terminal |
| Type-check the MCP TypeScript relay |
Interactive Chat (REPL)
just chat
Starts an interactive terminal REPL that synthesizes and plays each submission with the local TTS.cpp tts-cli. Generation for the next line overlaps playback of the current one, so there is no gap between utterances. The REPL drives the shared TTS runtime directly and does not require the FastAPI server to be running.
If --voice is not supplied, the REPL prompts you to pick a voice; otherwise it uses the one you pass. It reads settings (model, sampling parameters, sample rate, save_wav, and more) from config.yaml and fails immediately if the configured tts-cli or GGUF model is missing.
Input controls:
Key | Action |
| Insert a newline into the current line |
| Submit the buffered text for synthesis |
| Quit |
| Quit |
| Delete the previous character |
Run it directly with uv run for more options:
# Pick a voice interactively, then type lines to speak
uv run -m src.main
# Skip voice selection
uv run -m src.main --voice af_heart
# One-shot: synthesize a single string and exit
uv run -m src.main --voice am_adam "Hello from Kokoro."
# List previously generated WAV files in data/output/ and exit
uv run -m src.main --list-outputs
When save_wav: true, each utterance is written to a timestamped WAV under output_dir; when false, audio is played and the temporary file is removed.
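A minimal sketch of the `save_wav: true` path, using Python's standard `wave` module. The 16-bit mono format and the `tts_<timestamp>.wav` naming scheme are assumptions made for illustration; the project's actual filename format is not shown here, and `save_timestamped_wav` is a hypothetical helper.

```python
import wave
from datetime import datetime
from pathlib import Path
from typing import Optional


def save_timestamped_wav(
    pcm: bytes,
    output_dir: Path,
    sample_rate: int = 24000,
    now: Optional[datetime] = None,
) -> Path:
    """Write 16-bit mono PCM to output_dir under a timestamped name."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S_%f")
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"tts_{stamp}.wav"  # hypothetical naming scheme
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)           # mono (assumption)
        wav.setsampwidth(2)           # 16-bit samples (assumption)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return path
```

Including microseconds in the timestamp keeps two utterances saved within the same second from overwriting each other.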
Server
just start
Starts a FastAPI server with queued playback. The server validates config.yaml at startup and processes requests sequentially through a background worker.
API
FastAPI auto-generates interactive docs at /docs (Swagger) and /redoc (ReDoc) when the server is running.
Method | Endpoint | Description |
GET | | Liveness check |
GET | | List available voices and default voice |
POST | | Queue text for synthesis and playback |
GET | | Check status of a queued, generating, playing, completed, or failed message |
POST /say
{
  "text": "Hello, this is a Kokoro TTS request.",
  "voice": "af_heart"
}
Returns 202 Accepted with a message ID and queue position:
{
  "message_id": "msg_20260627_130430_001",
  "status": "queued",
  "queue_position": 0
}
Audio plays through the server machine's speakers.
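A small client for this endpoint might look like the sketch below, which uses only the standard library. The `127.0.0.1` host is an assumption for a client on the same machine, the port comes from the example `config.yaml`, and `build_say_request` and `say` are hypothetical helper names.

```python
import json
import urllib.request


def build_say_request(
    text: str, voice: str, host: str = "127.0.0.1", port: int = 12000
) -> urllib.request.Request:
    """Build the POST /say request without sending it."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/say",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def say(text: str, voice: str, host: str = "127.0.0.1", port: int = 12000) -> dict:
    """Queue text for playback and return the parsed JSON response."""
    request = build_say_request(text, voice, host, port)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())
```

With the server running, `say("Hello", "af_heart")` should return a dictionary with the `message_id`, `status`, and `queue_position` fields shown above.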
Message Lifecycle
queued -> generating -> playing -> completed
Failures are reported as:
error
Completed and failed statuses are evicted lazily after 1 hour when later /say or /status requests trigger status cleanup.
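Lazy eviction means there is no background timer: expired entries are removed only when another request touches the status map. The sketch below illustrates the idea under stated assumptions; `StatusMap` and its methods are hypothetical names, and the injectable clock exists only to make the behaviour easy to test.

```python
import time
from typing import Callable, Optional

RETENTION_SECONDS = 3600  # one hour, per the text above
TERMINAL_STATUSES = {"completed", "error"}


class StatusMap:
    """Tracks message statuses and evicts finished ones lazily."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict = {}  # message_id -> (status, last_update_time)

    def set(self, message_id: str, status: str) -> None:
        self._evict_expired()
        self._entries[message_id] = (status, self._clock())

    def get(self, message_id: str) -> Optional[str]:
        self._evict_expired()
        entry = self._entries.get(message_id)
        return entry[0] if entry else None

    def _evict_expired(self) -> None:
        """Drop terminal entries older than the retention window."""
        now = self._clock()
        expired = [
            message_id
            for message_id, (status, stamp) in self._entries.items()
            if status in TERMINAL_STATUSES and now - stamp >= RETENTION_SECONDS
        ]
        for message_id in expired:
            del self._entries[message_id]
```

Only terminal statuses are eligible for eviction, so a message that is still queued, generating, or playing is never dropped regardless of its age.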
MCP Server
The MCP server (mcp/tts-mcp.ts) is a transparent relay between MCP clients and the FastAPI server. It exposes three tools:
Tool | Description |
| Queue text for speech synthesis with a specified voice |
| List all available voices |
| Check status of a speech request by message ID |
Setup
just mcp-install
Usage with Claude Code / Claude Desktop
Start the FastAPI server first:
just start
Then configure the MCP client to run the TypeScript relay directly. For Claude Code, from the project directory:
claude mcp add --scope local kokoro-tts-project \
-e KOKORO_TTS_CONFIG_PATH=/path/to/low-latency-tts-service-mcp/config.yaml \
-- /path/to/low-latency-tts-service-mcp/mcp/node_modules/.bin/tsx \
/path/to/low-latency-tts-service-mcp/mcp/tts-mcp.ts
For JSON-based MCP configuration:
{
  "mcpServers": {
    "kokoro-tts-project": {
      "command": "/path/to/low-latency-tts-service-mcp/mcp/node_modules/.bin/tsx",
      "args": ["/path/to/low-latency-tts-service-mcp/mcp/tts-mcp.ts"],
      "env": {
        "KOKORO_TTS_CONFIG_PATH": "/path/to/low-latency-tts-service-mcp/config.yaml"
      }
    }
  }
}
The MCP relay reads host and port from config.yaml and calls /health before tool requests. Successful FastAPI JSON responses are returned as MCP text content; health check failures and non-OK HTTP responses are wrapped as structured MCP error results.
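The relay itself is TypeScript; the sketch below uses Python only to illustrate the control flow described above. `relay_tool_call` and the `fetch` transport hook are hypothetical, and the result shape follows the MCP tool-result convention of a `content` list with an optional `isError` flag.

```python
from typing import Callable, Optional, Tuple

# fetch(method, url, json_body) -> (status_code, response_text); hypothetical hook
Fetch = Callable[[str, str, Optional[dict]], Tuple[int, str]]


def _error(message: str) -> dict:
    """Wrap a failure as an MCP-style error result."""
    return {"isError": True, "content": [{"type": "text", "text": message}]}


def relay_tool_call(
    fetch: Fetch, base_url: str, method: str, path: str, body: Optional[dict] = None
) -> dict:
    """Check /health, forward the call, and wrap the response for the MCP client."""
    try:
        health_status, _ = fetch("GET", f"{base_url}/health", None)
    except OSError as exc:
        return _error(f"TTS server unreachable: {exc}")
    if health_status != 200:
        return _error(f"health check failed with HTTP {health_status}")

    status, text = fetch(method, f"{base_url}{path}", body)
    if not 200 <= status < 300:
        return _error(f"HTTP {status}: {text}")
    return {"content": [{"type": "text", "text": text}]}
```

Checking `/health` first lets the relay report "server not running" as a clear tool error instead of surfacing a raw connection failure to the agent.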
Development
Code Quality
Command | Description |
| Auto-fix code style and formatting |
| Check code style and formatting (read-only) |
| Run static type checking with mypy |
| Run strict type checking with Pyright (LSP-based) |
| Run security checks with bandit |
| Check dependency hygiene with deptry |
| Check spelling in code and documentation |
| Run Semgrep static analysis |
| Scan dependencies for known vulnerabilities |
| Run architecture import rule tests |
| Generate code statistics with pygount |
Testing
Command | Description |
| Run unit and integration tests |
| Run tests with coverage report |
CI
just ci - Run all validation checks (verbose)
just ci-quiet - Run all checks (silent, fail-fast)
The CI pipeline runs in order: init, code-format, code-style, code-typecheck, code-security, code-deptry, code-spell, code-semgrep, code-audit, test, code-architecture, code-lspchecks, and mcp-typecheck.
AI-Assisted Development
This project includes an AGENTS.md file with development rules for AI coding assistants.