Deepgram MCP Server

# Deepgram MCP Server

A Model Context Protocol (MCP) server that provides access to Deepgram's speech recognition and text-to-speech capabilities.

## Features

- **Audio Transcription**: Convert audio to text with high accuracy
- **Text-to-Speech**: Generate natural-sounding speech from text with automatic compression
- **Audio Analysis**: Extract insights like sentiment, topics, intents, and entities
- **Speaker Diarization**: Identify different speakers in audio
- **Language Detection**: Automatically detect the language of audio
- **Multiple Models**: Support for various Deepgram models optimized for different use cases
- **Smart Audio Compression**: Automatically compresses generated audio files for efficient transfer

## Installation

1. Clone this repository
2. Install dependencies:
   ```bash
   npm install
   ```
3. Copy the environment file and add your Deepgram API key:
   ```bash
   cp env.example .env
   # Edit .env and add your DEEPGRAM_API_KEY, OPENAI_API_KEY or GROQ_API_KEY (whatever you want to use)
   ```
4. Build the project:
   ```bash
   npm run build
   ```

## Usage

### HTTP Transport (Recommended for Production)

```bash
npm start
# or
node dist/index.js
```

The server will start on port 8080 by default. You can specify a different port:

```bash
node dist/index.js --port 8081
```

### STDIO Transport (For Development)

```bash
npm run start:stdio
# or
node dist/index.js --stdio --port 8081
```

## Available Tools

### 1. transcribe_audio

Transcribe audio to text with various options for customization.

**Parameters:**
- `audioUrl` or `audioData`: Audio source (URL or base64)
- `model`: Deepgram model to use (default: "nova-2-general")
- `language`: Language code (default: "en")
- `punctuate`: Add punctuation (default: true)
- `diarize`: Speaker identification (default: false)
- `sentiment`: Sentiment analysis (default: false)
- And many more options...

### 2. text_to_speech

Convert text to speech using Deepgram's TTS models with automatic compression.

**Parameters:**
- `text`: Text to convert to speech (required)
- `model`: TTS model to use (default: "aura-asteria-en")
- `voice`: Voice selection
- `format`: Output format (default: "mp3")
- `speed`: Speech speed (default: 1.0)

**Output:**
- Original audio file saved to `generated_audio/` folder
- Compressed audio data saved to `compressed_audio/` folder
- Response includes file paths and compression metadata

### 3. analyze_audio

Perform advanced audio analysis including sentiment, topics, intents, and entities.

**Parameters:**
- `audioUrl` or `audioData`: Audio source
- `features`: Analysis features to enable
- `model`: Model for analysis

### 4. get_models

Get information about available Deepgram models.

**Parameters:**
- `model_type`: Filter by model type ("transcription", "tts", or "all")

## Client Configuration

For MCP clients, use this configuration:

```json
{
  "mcpServers": {
    "deepgram": {
      "url": "http://localhost:8080/mcp"
    }
  }
}
```

## Development

```bash
# Watch mode for development
npm run watch

# Development with STDIO
npm run dev:stdio

# Development with HTTP
npm run dev
```

## API Key

Get your Deepgram API key from [Deepgram Console](https://console.deepgram.com/).
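
To make the tool parameters listed under Available Tools more concrete, here is a minimal sketch of the JSON-RPC 2.0 `tools/call` message an MCP client would send to invoke `transcribe_audio`. The argument names and defaults are copied from the parameter list in this README; the `TranscribeArgs` type, the `buildTranscribeRequest` helper, and the example URL are hypothetical, and the server's actual input schema may differ. A real client would also complete the MCP `initialize` handshake before sending this message.

```typescript
// Sketch only: the argument names and defaults mirror the README's parameter
// list for transcribe_audio; the server's real input schema is an assumption.
interface TranscribeArgs {
  audioUrl?: string; // URL of the audio to transcribe
  audioData?: string; // base64-encoded audio (alternative to audioUrl)
  model?: string;
  language?: string;
  punctuate?: boolean;
  diarize?: boolean;
  sentiment?: boolean;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: TranscribeArgs };
}

// Defaults as documented in the README.
const TRANSCRIBE_DEFAULTS = {
  model: "nova-2-general",
  language: "en",
  punctuate: true,
  diarize: false,
  sentiment: false,
};

// Hypothetical helper: merges caller options over the documented defaults
// and wraps them in a tools/call request.
function buildTranscribeRequest(id: number, args: TranscribeArgs): ToolCallRequest {
  if (!args.audioUrl && !args.audioData) {
    throw new Error("Provide either audioUrl or audioData");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "transcribe_audio",
      arguments: { ...TRANSCRIBE_DEFAULTS, ...args },
    },
  };
}

const request = buildTranscribeRequest(1, {
  audioUrl: "https://example.com/sample.wav", // placeholder URL
  diarize: true,
});
console.log(JSON.stringify(request, null, 2));
```

Most MCP client libraries build this envelope for you; spelling it out is mainly useful when debugging the HTTP transport by hand.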

## Audio Compression System

The TTS functionality includes an intelligent compression system that:

- **Automatically compresses** generated audio files using gzip compression
- **Saves compressed data** to separate files to avoid large agent responses
- **Provides decompression tools** for easy audio file extraction
- **Maintains quality** while reducing file sizes by 2-4x

### File Structure

```
generated_audio/          # Original audio files
├── tts_2025-01-16T...mp3
compressed_audio/         # Compressed audio data
├── compressed_audio_2025-01-16T...json
decompressed_audio/       # Decompressed audio files (after extraction)
├── decompressed_2025-01-16T...mp3
```

### Decompression Tools

**Python Script (Recommended):**

```bash
python decompress_audio.py <response_file_or_compressed_file>
```

**Node.js Script:**

```bash
npm run decompress <compressed_data_file>
```

## Agno Integration

This MCP server also includes integration with [Agno](https://docs.agno.com/introduction), a high-performance runtime for multi-agent systems.

### Agno Tests

```bash
# Text-to-Speech test (saves audio to generated_audio/ and compressed_audio/)
npm run test:agno:tts

# Speech-to-Text test (transcribes sample audio)
npm run test:agno:stt
```

The TTS test will:

1. Generate audio with automatic compression
2. Save the response to `tts_response.json`
3. Decompress the audio file to `generated_audio/`

## License

MIT

## Developer

- Dheeraj Mudireddy (meetdheerajreddy@gmail.com)
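
Returning to the Audio Compression System section: the gzip round trip it describes can be sketched with Node's built-in `zlib` module. This is an illustration under stated assumptions, not the project's actual code. The `CompressedAudioRecord` shape and both helper functions are hypothetical, since the README does not document the JSON layout written to `compressed_audio/`, and a stand-in buffer is used in place of real TTS output.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Hypothetical record shape; the real JSON layout in compressed_audio/
// is not documented in this README.
interface CompressedAudioRecord {
  encoding: "gzip+base64";
  originalBytes: number;
  compressedBytes: number;
  data: string; // base64 of the gzip stream, safe to embed in JSON
}

// Gzip the audio bytes and wrap them with size metadata.
function compressAudio(audio: Buffer): CompressedAudioRecord {
  const gz = gzipSync(audio);
  return {
    encoding: "gzip+base64",
    originalBytes: audio.length,
    compressedBytes: gz.length,
    data: gz.toString("base64"),
  };
}

// Reverse the steps above: base64-decode, then gunzip.
function decompressAudio(record: CompressedAudioRecord): Buffer {
  return gunzipSync(Buffer.from(record.data, "base64"));
}

// Stand-in bytes; a real call would pass the file produced by text_to_speech.
const fakeAudio = Buffer.alloc(4096, 0x2a);
const record = compressAudio(fakeAudio);
const restored = decompressAudio(record);
console.log(record.originalBytes, "->", record.compressedBytes, "bytes");
console.log("round trip ok:", restored.equals(fakeAudio));
```

The size reduction printed for the stand-in buffer says nothing about real audio; only the byte-for-byte round trip is the point of the sketch.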
