# AudioGen MCP Server
An MCP server that generates sound effects from text descriptions using Meta's AudioGen model. Designed for Apple Silicon Macs.
## Prerequisites
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.9-3.11 (3.12+ not yet supported by audiocraft)
- ffmpeg: `brew install ffmpeg`
- ~4GB disk space for model weights
- ~8GB RAM recommended
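Some of these prerequisites can be checked from a script before installing anything. The following is a minimal sketch using only the standard library; `check_prerequisites` is a hypothetical helper, not part of audiogen-mcp, and it does not check disk space or RAM. A Python interpreter running under Rosetta may report a non-arm64 machine even on Apple Silicon hardware.

```python
import platform
import shutil
import sys


def check_prerequisites() -> dict:
    """Return a pass/fail flag for each prerequisite that is easy to test."""
    return {
        # Apple Silicon Macs report Darwin / arm64 for a native interpreter.
        "apple_silicon": platform.system() == "Darwin" and platform.machine() == "arm64",
        # audiocraft supports Python 3.9 through 3.11.
        "python_3_9_to_3_11": (3, 9) <= sys.version_info[:2] <= (3, 11),
        # ffmpeg must be on PATH (for example via `brew install ffmpeg`).
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```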
## Installation
Because xformers, one of audiocraft's dependencies, doesn't build on Apple Silicon, the packages must be installed in a specific order:
```bash
# Create virtual environment with Python 3.11
uv venv ~/.audiogen-env --python 3.11
source ~/.audiogen-env/bin/activate
# Install audiocraft without its problematic dependencies
uv pip install audiocraft --no-deps
# Install the actual dependencies (skipping xformers)
uv pip install torch torchaudio transformers huggingface_hub encodec einops \
flashy num2words sentencepiece librosa av julius spacy torchmetrics \
hydra-core hydra-colorlog demucs lameenc
# Install audiogen-mcp
uv pip install audiogen-mcp
```
The first run will download the AudioGen model (~2GB).
## Configure Claude Code
```bash
claude mcp add audiogen ~/.audiogen-env/bin/python -- -m audiogen_mcp.server
```
Or, for Claude Desktop, add the server to `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "audiogen": {
      "command": "/Users/YOUR_USERNAME/.audiogen-env/bin/python",
      "args": ["-m", "audiogen_mcp.server"]
    }
  }
}
```
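To avoid editing `YOUR_USERNAME` by hand, the same JSON can be generated with the real home directory filled in. This is a small sketch, assuming the virtual environment lives at `~/.audiogen-env` as in the installation steps; `build_server_entry` is a hypothetical helper name. It only prints the JSON and does not modify any config file.

```python
import json
from pathlib import Path


def build_server_entry(env_dir: Path) -> dict:
    """Build the `mcpServers` block for a given virtual environment directory."""
    return {
        "mcpServers": {
            "audiogen": {
                # Absolute path to the interpreter inside the virtual environment.
                "command": str(env_dir / "bin" / "python"),
                "args": ["-m", "audiogen_mcp.server"],
            }
        }
    }


if __name__ == "__main__":
    # Assumes the environment created during installation (~/.audiogen-env).
    print(json.dumps(build_server_entry(Path.home() / ".audiogen-env"), indent=2))
```

If the config file already lists other servers, merge the `audiogen` entry into the existing `mcpServers` object rather than replacing the file.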
## Available Tools
| Tool | Description |
|------|-------------|
| `generate_sound_effect` | Start a background generation job; returns a `job_id` |
| `check_generation_status` | Poll a job's status by `job_id` until it is completed |
| `list_generation_jobs` | List all jobs and their current status |
| `list_generated_sounds` | List previously generated audio files |
| `get_model_status` | Check whether the model is loaded and report device info |
## How It Works
Generation runs in the background to avoid timeouts:
1. Call `generate_sound_effect` with your prompt → returns `job_id`
2. Poll `check_generation_status` with the `job_id` every 10-15 seconds
3. When status is `completed`, the result includes `file_path`
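The start-then-poll flow above can be sketched as a small loop. This is an illustration under stated assumptions, not the server's or any client's actual code: `check_status` stands in for however your MCP client invokes `check_generation_status`, the response is assumed to be a dict with a `status` key, and the `running` and `failed` status names are guesses. Only `completed` and `file_path` come from this README.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    check_status: Callable[[str], Dict],
    job_id: str,
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
) -> Dict:
    """Poll until the job reports `completed`, then return the final response."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = check_status(job_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":  # assumed status name, not confirmed by this README
            raise RuntimeError(f"job {job_id} failed: {result}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout_s}s")
        time.sleep(interval_s)


def make_fake_checker(polls_until_done: int) -> Callable[[str], Dict]:
    """Stand-in for the real tool call: reports completion after N polls."""
    calls = {"n": 0}

    def check(job_id: str) -> Dict:
        calls["n"] += 1
        if calls["n"] >= polls_until_done:
            return {"status": "completed", "file_path": f"/tmp/{job_id}.wav"}
        return {"status": "running"}  # assumed intermediate status name

    return check


if __name__ == "__main__":
    # interval_s=0 keeps the demo instant; use 10-15 seconds against the real server.
    print(wait_for_job(make_fake_checker(3), "job-1", interval_s=0.0))
```

A timeout is worth keeping even though the README does not mention one, so that a stalled job does not block the caller indefinitely.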
## Example Prompts
Once configured, ask Claude Code to generate sounds:
- "Generate an explosion sound effect"
- "Create a dark ambient tension drone, 10 seconds"
- "Make a retro 8-bit power-up sound, 2 seconds long"
- "Generate footsteps on gravel, 5 seconds"
### Prompt Tips
For best results, be specific:
```
# Good
"glass breaking, single wine glass falling on tile floor"
"8-bit arcade explosion, retro game style"
"dark ambient tension drone, synth pad, ominous low frequency rumble"
# Less good
"glass sound"
"explosion"
"ambient"
```
Include style, mood, and context for better results.
## Performance
- ~18 seconds to generate 1 second of audio on Apple Silicon
- 5 seconds of audio ≈ 90 seconds generation time
- 10 seconds of audio ≈ 180 seconds generation time
- First generation takes longer (model loading ~5s)
- Uses Metal Performance Shaders (MPS) for GPU acceleration
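The figures above imply a roughly linear relationship, which is handy for choosing a polling timeout. The sketch below simply encodes that arithmetic; linear scaling beyond the quoted 5 and 10 second cases is an assumption, and `estimate_generation_seconds` is a hypothetical helper, not part of the server.

```python
SECONDS_PER_AUDIO_SECOND = 18.0  # approximate rate quoted above for Apple Silicon
MODEL_LOAD_SECONDS = 5.0         # approximate one-time cost on the first generation


def estimate_generation_seconds(audio_seconds: float, first_run: bool = False) -> float:
    """Estimate wall-clock generation time for a clip of the given length."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    estimate = audio_seconds * SECONDS_PER_AUDIO_SECOND
    if first_run:
        estimate += MODEL_LOAD_SECONDS
    return estimate


if __name__ == "__main__":
    for duration in (2, 5, 10):
        print(f"{duration}s of audio: about {estimate_generation_seconds(duration):.0f}s")
```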
## Output
Generated files are saved to `~/audiogen_outputs/` by default, as WAV or OGG files.
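To inspect the output directory outside of an MCP client, a short script is enough. This sketch is independent of the `list_generated_sounds` tool and makes no claim about how that tool is implemented; `list_audio_files` is a hypothetical helper, and the directory is assumed to be the default location.

```python
from pathlib import Path
from typing import List

AUDIO_SUFFIXES = {".wav", ".ogg"}


def list_audio_files(directory: Path) -> List[Path]:
    """Return WAV and OGG files in `directory`, newest first."""
    if not directory.is_dir():
        return []
    files = [
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    ]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in list_audio_files(Path.home() / "audiogen_outputs"):
        print(path)
```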
## Troubleshooting
### Installation fails with xformers error
This is expected on Apple Silicon. The server mocks xformers at runtime since it's only needed for CUDA. If audiocraft installation fails, try:
```bash
uv pip install torch torchaudio
uv pip install audiocraft --no-build-isolation
```
### Model download fails
Make sure you have a stable internet connection and enough free disk space. The model is downloaded from the Hugging Face Hub.
### Slow generation
Check the active device with the `get_model_status` tool. CPU fallback is 10-20x slower than MPS.
### MPS not available
MPS requires macOS 12.3+ and PyTorch 2.0+.
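To see what PyTorch itself reports, run a check like the one below inside the virtual environment. It is a minimal sketch using `torch.backends.mps.is_available()` and `torch.backends.mps.is_built()`; `describe_device` is a hypothetical helper, and the server's own device selection may differ.

```python
def describe_device() -> str:
    """Report whether PyTorch can use MPS on this machine."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"

    mps = getattr(torch.backends, "mps", None)
    if mps is None:
        return "this PyTorch build predates MPS support; CPU only"
    if mps.is_available():
        return "mps"
    if not mps.is_built():
        return "PyTorch was built without MPS support; CPU only"
    # Built with MPS but not usable: typically an older macOS or no supported GPU.
    return "MPS is built but unavailable (check the macOS version); CPU only"


if __name__ == "__main__":
    print(describe_device())
```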
## License
MIT License - see [LICENSE](LICENSE) file.
## Acknowledgments
- [Meta AudioCraft](https://github.com/facebookresearch/audiocraft) - The library that provides the AudioGen model
- [MCP](https://modelcontextprotocol.io/) - Model Context Protocol specification