Enables AI-powered video-to-audio and text-to-audio generation using MMAudio's API. Create synchronized audio from video content or generate audio from text descriptions with configurable parameters.
An official Model Context Protocol (MCP) server that enables AI clients to interact with ElevenLabs' Text to Speech and audio processing APIs, allowing for speech generation, voice cloning, audio transcription, and other audio-related tasks.
A Model Context Protocol server for macOS that enables AI assistants to play system sounds for audio feedback, offering informational, warning, and error sound options.
Enables text-only models to process images and other media formats by providing access to multimodal models from OpenAI and Dashscope (Alibaba Cloud). Supports flexible deployment options and comprehensive tooling for multimodal AI interactions.
Enables integration with VOICEVOX text-to-speech services to convert text into audio using a variety of character voices. It provides tools for speech generation, listing available speakers, and monitoring system health.