The Minimax MCP Tools server provides AI-powered capabilities through the Model Context Protocol:
- Image Generation: Create high-quality images from text prompts with customizable aspect ratio, number of images, and subject reference images for character consistency.
- Text-to-Speech: Convert text to natural-sounding speech with extensive customization options including voice selection, emotion, speed, volume, pitch, and audio format settings (sample rate, bitrate, channels).
- Advanced Features: Utilize voice mixing (timbre weights), LaTeX reading, pronunciation dictionaries, streaming mode, language boosting for improved accuracy, and subtitle generation for accessibility.
- Integration: Works seamlessly with Windsurf and Cursor editors via MCP server configuration.
The text-to-speech functionality can read LaTeX formulas aloud, with configurable pronunciation options.
The MCP server requires its runtime environment at version 16 or higher as a prerequisite.
Minimax MCP Tools
A Model Context Protocol (MCP) server for Minimax AI integration, providing async image generation and text-to-speech with advanced rate limiting and error handling.
MCP Configuration
Add to your MCP settings:
Async Design - Perfect for Content Production at Scale
This MCP server uses an asynchronous submit-and-barrier pattern designed for batch content creation:
🎬 Narrated Slideshow Production - Generate dozens of slide images and corresponding narration in parallel
📚 AI-Driven Audiobook Creation - Produce chapters with multiple voice characters simultaneously
🖼️ Website Asset Generation - Create consistent visual content and audio elements for web projects
🎯 Multimedia Content Pipelines - Perfect for LLM-driven content workflows requiring both visuals and audio
Architecture Benefits:
- Submit Phase: Tools return immediately with task IDs while the tasks execute in the background
- Smart Rate Limiting: Adaptive rate limiting (10 RPM images, 20 RPM speech) with burst capacity
- Barrier Synchronization: task_barrier waits for all tasks and returns comprehensive results
- Batch Optimization: Submit multiple tasks to saturate rate limits, then barrier once for maximum throughput
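The rate limiting with burst capacity described above can be illustrated with a token bucket. This is a minimal sketch, not the server's actual limiter: the `TokenBucket` class and the burst sizes are assumptions for illustration, and the adaptive behaviour mentioned above is not modelled. Only the 10 RPM and 20 RPM figures come from the list.

```python
import time


class TokenBucket:
    """Hypothetical token-bucket limiter: a sustained rate plus burst headroom."""

    def __init__(self, rate_per_minute: float, burst: int, clock=time.monotonic):
        self.refill_per_second = rate_per_minute / 60.0
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def seconds_until_next(self) -> float:
        """Estimate how long until one full token is available."""
        self._refill()
        if self.tokens >= 1.0:
            return 0.0
        return (1.0 - self.tokens) / self.refill_per_second


# RPM figures are from the list above; the burst sizes are assumed values.
image_limiter = TokenBucket(rate_per_minute=10, burst=3)
speech_limiter = TokenBucket(rate_per_minute=20, burst=5)
```

With a bucket like this, a batch of submissions can drain the burst allowance at once and then proceed at the sustained rate, which is one way to read "saturate rate limits" above.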
Tools
submit_image_generation
Submit Image Generation Task - Generate images asynchronously.
Required: prompt, outputFile
Optional: aspectRatio, customSize, seed, subjectReference, style
submit_speech_generation
Submit Speech Generation Task - Convert text to speech asynchronously.
Required: text, outputFile
Optional: highQuality, voiceId, speed, volume, pitch, emotion, format, sampleRate, bitrate, languageBoost, intensity, timbre, sound_effects
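As a quick reference, the parameter lists for the two submit tools can be expressed as a small pre-flight check. This is a sketch only: the parameter names are taken from the lists above, but the `check_arguments` helper is hypothetical, and value types and ranges are not checked because they are not documented here.

```python
# Parameter names as listed for each tool; value types and ranges are not covered.
REQUIRED = {
    "submit_image_generation": {"prompt", "outputFile"},
    "submit_speech_generation": {"text", "outputFile"},
}
OPTIONAL = {
    "submit_image_generation": {
        "aspectRatio", "customSize", "seed", "subjectReference", "style",
    },
    "submit_speech_generation": {
        "highQuality", "voiceId", "speed", "volume", "pitch", "emotion", "format",
        "sampleRate", "bitrate", "languageBoost", "intensity", "timbre", "sound_effects",
    },
}


def check_arguments(tool: str, arguments: dict) -> list:
    """Hypothetical helper: return a list of problems, empty if the names look valid."""
    if tool not in REQUIRED:
        return [f"unknown tool: {tool}"]
    problems = []
    for name in sorted(REQUIRED[tool] - arguments.keys()):
        problems.append(f"missing required parameter: {name}")
    allowed = REQUIRED[tool] | OPTIONAL[tool]
    for name in sorted(arguments.keys() - allowed):
        problems.append(f"unrecognized parameter: {name}")
    return problems


# Example call; the prompt and output path are made-up values.
print(check_arguments(
    "submit_image_generation",
    {"prompt": "a lighthouse at dusk", "outputFile": "out/lighthouse.png"},
))
```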
task_barrier
Wait for Task Completion - Wait for ALL submitted tasks to complete and retrieve results. Essential for batch processing.
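The submit-and-barrier flow can be sketched with `asyncio`. This is an illustration under stated assumptions, not the server's implementation: `TaskBoard`, `fake_generate`, the `task-N` ID format, and the shape of the result dictionary are all hypothetical, and a short sleep stands in for the real API calls.

```python
import asyncio
import itertools


class TaskBoard:
    """Hypothetical in-memory stand-in for the server's task tracking."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._tasks = {}

    def submit(self, coro) -> str:
        """Schedule work in the background and return a task ID immediately."""
        task_id = f"task-{next(self._ids)}"
        self._tasks[task_id] = asyncio.create_task(coro)
        return task_id

    async def barrier(self) -> dict:
        """Wait for every submitted task, then report success or error per ID."""
        ids = list(self._tasks)
        outcomes = await asyncio.gather(
            *(self._tasks[i] for i in ids), return_exceptions=True
        )
        self._tasks.clear()
        results = {}
        for task_id, outcome in zip(ids, outcomes):
            if isinstance(outcome, Exception):
                results[task_id] = {"ok": False, "error": str(outcome)}
            else:
                results[task_id] = {"ok": True, "result": outcome}
        return results


async def fake_generate(output_file: str, fail: bool = False) -> str:
    """Placeholder for a real generation call; sleeps instead of doing network I/O."""
    await asyncio.sleep(0.01)
    if fail:
        raise RuntimeError(f"generation failed for {output_file}")
    return output_file


async def demo() -> dict:
    board = TaskBoard()
    # Submit everything first; each call returns without waiting for the work.
    board.submit(fake_generate("slide-1.png"))
    board.submit(fake_generate("slide-1.mp3"))
    board.submit(fake_generate("slide-2.png", fail=True))
    # One barrier call collects all outcomes, including failures.
    return await board.barrier()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Collecting exceptions instead of raising on the first one means a single failed task does not hide the results of the others, which suits batch workflows where partial output is still useful.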
Architecture
License
MIT