whisper-telegram-mcp
Server Configuration
Describes the environment variables used to configure the server.
| Name | Required | Description | Default |
|---|---|---|---|
| WHISPER_MODEL | No | Whisper model size to use for local inference (e.g., tiny, base, small, medium, large-v3, turbo). | base |
| OPENAI_API_KEY | No | OpenAI API key. Required for the 'openai' backend and for the OpenAI fallback in 'auto' mode. | |
| WHISPER_BACKEND | No | Transcription backend to use: 'auto' (tries local first, falls back to OpenAI), 'local' (faster-whisper), or 'openai' (Whisper API). | auto |
| WHISPER_LANGUAGE | No | ISO-639-1 language code (e.g., 'en'). Defaults to auto-detection if not specified. | |
| TELEGRAM_BOT_TOKEN | No | Telegram Bot API token. Required for the transcribe_telegram_voice tool unless a bot_token argument is passed. | |
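As a minimal sketch of how the variables above fit together, the snippet below reads them with the documented defaults and lists the order in which backends would be tried. The helper names `load_config` and `backend_order` are hypothetical, and rejecting unknown `WHISPER_BACKEND` values or a missing key for 'openai' is an assumption, not documented server behavior.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

# Backend names and defaults are taken from the configuration table.
VALID_BACKENDS = ("auto", "local", "openai")


@dataclass(frozen=True)
class WhisperConfig:
    model: str
    backend: str
    language: Optional[str]  # None means auto-detect
    openai_api_key: Optional[str]
    telegram_bot_token: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> WhisperConfig:
    """Hypothetical helper: read the documented variables and apply their defaults."""
    backend = env.get("WHISPER_BACKEND", "auto")
    if backend not in VALID_BACKENDS:
        # Assumption: unknown values are rejected rather than silently ignored.
        raise ValueError(f"WHISPER_BACKEND must be one of {VALID_BACKENDS}, got {backend!r}")
    api_key = env.get("OPENAI_API_KEY") or None
    if backend == "openai" and api_key is None:
        raise ValueError("OPENAI_API_KEY is required when WHISPER_BACKEND is 'openai'")
    return WhisperConfig(
        model=env.get("WHISPER_MODEL", "base"),
        backend=backend,
        language=env.get("WHISPER_LANGUAGE") or None,
        openai_api_key=api_key,
        telegram_bot_token=env.get("TELEGRAM_BOT_TOKEN") or None,
    )


def backend_order(cfg: WhisperConfig) -> List[str]:
    """Hypothetical helper: backends in the order they would be tried."""
    if cfg.backend == "auto":
        # 'auto' tries local faster-whisper first, then falls back to the OpenAI API.
        return ["local", "openai"]
    return [cfg.backend]
```

With no variables set, `load_config({})` yields the 'base' model on the 'auto' backend with language auto-detection, so `backend_order` returns `["local", "openai"]`.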
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | {"listChanged": false} |
| prompts | {"listChanged": false} |
| resources | {"subscribe": false, "listChanged": false} |
| experimental | {} |
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| transcribe_audio | Transcribe an audio file to text using Whisper. Supports OGG (Telegram voice), WAV, MP3, FLAC, and most other common audio formats. Args: file_path: absolute path to the audio file to transcribe. language: optional ISO-639-1 language code (e.g. 'en', 'fr'); None means auto-detect. word_timestamps: if True, include word-level timestamps in segments. Returns: a dict with the keys text, language, language_probability, duration, segments, backend, success, and error. |
| transcribe_telegram_voice | Download a Telegram voice message, transcribe it, then delete the temporary file. Args: file_id: the file_id of a Telegram voice message (from the Message object). bot_token: Telegram bot token; falls back to the TELEGRAM_BOT_TOKEN env var. language: optional ISO-639-1 language code; None means auto-detect. word_timestamps: if True, include word-level timestamps in segments. Returns: the same dict structure as transcribe_audio. |
| list_models | List the available Whisper model sizes with their performance characteristics. Set the active model with the WHISPER_MODEL environment variable. The default, 'base', is a good balance of speed and accuracy for voice messages. |
| check_backends | Check which transcription backends are available and configured. Call this first to verify your setup before transcribing. |
| speak_text | Convert text to speech and return the path to an OGG/Opus audio file, which plays as a native voice note in Telegram when sent as an attachment. TTS backends (in priority order): Configure via the TTS_BACKEND env var: "auto", "kokoro", "openai", or "macos". Args: text: text to synthesise. voice: voice name. Kokoro voices: af_sky, af_bella, af_sarah, am_adam, am_michael, bf_emma, bm_george, bm_lewis. OpenAI voices: alloy, echo, fable, onyx, nova, shimmer. Set the default voice with the TTS_VOICE env var. output_path: optional absolute path for the output .ogg file. Returns: a dict with the keys file_path (absolute .ogg path), backend, voice, success, and error. |
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |
MCP directory API
We provide all the information about MCP servers via our MCP API.
```bash
curl -X GET 'https://glama.ai/api/mcp/v1/servers/abid-mahdi/whisper-telegram-mcp'
```
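For scripted access, the same request can be made from Python with the standard library. This is a sketch: it assumes the endpoint returns a JSON body, the helper names `server_url` and `fetch_server` are hypothetical, and the response schema is not documented here.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Base path taken from the curl example above.
DIRECTORY_API = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory URL for one server, escaping each path segment."""
    owner_part = urllib.parse.quote(owner, safe="")
    name_part = urllib.parse.quote(name, safe="")
    return f"{DIRECTORY_API}/{owner_part}/{name_part}"


def fetch_server(owner: str, name: str) -> Any:
    """GET the server record and decode it, assuming a JSON response body."""
    request = urllib.request.Request(server_url(owner, name), method="GET")
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

Calling `fetch_server("abid-mahdi", "whisper-telegram-mcp")` issues the same request as the curl command.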
If you have feedback or need assistance with the MCP directory API, please join our Discord server.