Puter MCP Server
Provides voice conversion capabilities, allowing conversion of spoken audio to a target voice using ElevenLabs.
Enables text-to-image and image-to-image generation using Google Gemini models.
Supports text-to-image generation (e.g., gpt-image-2), text-to-speech synthesis, and speech-to-text transcription via OpenAI's APIs.
Provides text-to-image generation through Replicate's platform.
MCP (Model Context Protocol) server for Puter AI media generation. Provides 6 AI-powered tools for image generation, text-to-speech, video generation, OCR, speech-to-text, and voice conversion.
Features
txt2img: Text-to-image generation with multiple providers (OpenAI, Gemini, Together, xAI, Replicate)
txt2speech: Text-to-speech conversion with multiple voices and engines
txt2vid: Text-to-video generation (Sora, Veo, TogetherAI)
img2txt: Image-to-text (OCR) with AWS Textract or Mistral
speech2txt: Speech-to-text transcription
speech2speech: Voice conversion using ElevenLabs
Key Features
Intelligent Default Models: Selects a default model based on the task type when no model is specified
Text-to-image: gpt-image-2 (OpenAI)
Image-to-image: gemini-2.5-flash-image-preview (Gemini)
Multiple Providers: Support for OpenAI, Google Gemini, xAI (Grok), Replicate, Together AI, ElevenLabs
Flexible Output: Supports base64 and URL output formats
Test Mode: Built-in test mode for development without consuming credits
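The default-model rule listed above can be pictured as a small lookup keyed on whether an input image is supplied. The following is a minimal sketch for illustration only: the function and field names (`pickDefaultImageModel`, `inputImage`) are hypothetical and are not taken from the server's source; only the two model IDs come from this README.

```typescript
// Hypothetical sketch of the default-model rule described above.
// Only the two model IDs come from the README; every name is illustrative.
interface ImageRequest {
  model?: string;      // explicit model, if the caller chose one
  inputImage?: string; // assumed to be present only for image-to-image requests
}

const DEFAULT_TEXT_TO_IMAGE = "gpt-image-2";
const DEFAULT_IMAGE_TO_IMAGE = "gemini-2.5-flash-image-preview";

function pickDefaultImageModel(req: ImageRequest): string {
  // An explicit choice always wins over the defaults.
  if (req.model) return req.model;
  // Otherwise the presence of an input image decides the task type.
  return req.inputImage ? DEFAULT_IMAGE_TO_IMAGE : DEFAULT_TEXT_TO_IMAGE;
}

const textOnlyModel = pickDefaultImageModel({});
const imageToImageModel = pickDefaultImageModel({
  inputImage: "https://example.com/cat.png",
});
```

Keeping the rule in one function means an explicit model passed by the caller is never overridden.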
Quick Start
Prerequisites
Node.js 18+
Puter API Key (get from puter.com)
Installation
# Clone the repository
git clone https://github.com/your-username/puter-mcp.git
cd puter-mcp
# Install dependencies
npm install
# Build the project
npm run build
Configuration
Copy the environment file:
cp .env.example .env
Edit .env and add your Puter API key:
PUTER_API_KEY=your_puter_api_key_here
Usage
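The server is configured entirely through environment variables: PUTER_API_KEY from the .env file or the MCP client configuration, plus TRANSPORT and PORT from the Command Line section. A startup check along the following lines would fail fast on a missing key. This is a sketch under assumptions, not the project's actual code: `readServerConfig` and `ServerConfig` are hypothetical names, and the fallback port of 3000 is only borrowed from the SSE example in this README.

```typescript
// Hypothetical sketch: reading the environment variables this README mentions.
type Transport = "stdio" | "sse";

interface ServerConfig {
  apiKey: string;
  transport: Transport;
  port: number;
}

function readServerConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env.PUTER_API_KEY;
  if (!apiKey) {
    throw new Error("PUTER_API_KEY is not set; add it to .env or the MCP client config");
  }
  // Stdio is the documented default; SSE is opt-in via TRANSPORT=sse.
  const transport: Transport = env.TRANSPORT === "sse" ? "sse" : "stdio";
  // Assumption: fall back to 3000, the port used in the README's SSE example.
  const rawPort = env.PORT === undefined ? "3000" : env.PORT;
  const port = parseInt(rawPort, 10);
  if (isNaN(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${rawPort}`);
  }
  return { apiKey, transport, port };
}

// Example with a literal object; a real entry point would pass process.env.
const exampleConfig = readServerConfig({
  PUTER_API_KEY: "your_puter_api_key_here",
  TRANSPORT: "sse",
  PORT: "3000",
});
```

Taking the environment as a parameter rather than reading it globally keeps the function easy to test.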
Claude Desktop / Trae
Add the following to your Claude Desktop or Trae configuration file:
Windows: %APPDATA%\Trae\mcp_settings.json
macOS: ~/Library/Application Support/Trae/mcp_settings.json
Linux: ~/.config/Trae/mcp_settings.json
Configuration content:
{
  "mcpServers": {
    "puter-mcp": {
      "command": "node",
      "args": ["path/to/puter-mcp/dist/index.js"],
      "env": {
        "PUTER_API_KEY": "your_api_key"
      }
    }
  }
}
Command Line
# Stdio mode (default)
npm start
# SSE mode
TRANSPORT=sse PORT=3000 npm start
Tools Reference
txt2img
Generate images from text prompts. Supports both text-to-image and image-to-image.
Parameter | Type | Description |
| string | Text description for the image |
| string | Model to use (default: gpt-image-2 for text-to-image, gemini-2.5-flash-image-preview for image-to-image) |
| string | AI provider (openai-image-generation, gemini, together, xai, replicate-image-generation) |
| string | Image quality (high, medium, low, hd, standard) |
| object | Aspect ratio {w, h} |
| string | Input image for image-to-image (Base64 or URL) |
| boolean | Test mode without credits |
| string | Output format (base64, url) |
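MCP clients invoke a tool by sending a JSON-RPC 2.0 `tools/call` request that names the tool and carries its arguments. The sketch below builds such a request for txt2img and rejects provider or quality values outside the lists in the table above. The envelope shape follows the MCP specification; the argument keys (`prompt`, `provider`, `quality`) are assumed names for illustration and are not confirmed by this README.

```typescript
// Sketch of an MCP `tools/call` request for txt2img.
// Assumption: the argument keys below are placeholders, not confirmed names.
const IMAGE_PROVIDERS: string[] = [
  "openai-image-generation",
  "gemini",
  "together",
  "xai",
  "replicate-image-generation",
];
const IMAGE_QUALITIES: string[] = ["high", "medium", "low", "hd", "standard"];

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildTxt2ImgCall(
  id: number,
  prompt: string,
  provider?: string,
  quality?: string,
): ToolCallRequest {
  if (provider !== undefined && IMAGE_PROVIDERS.indexOf(provider) === -1) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  if (quality !== undefined && IMAGE_QUALITIES.indexOf(quality) === -1) {
    throw new Error(`Unknown quality: ${quality}`);
  }
  // Only include optional arguments the caller actually supplied.
  const args: Record<string, unknown> = { prompt };
  if (provider !== undefined) args.provider = provider;
  if (quality !== undefined) args.quality = quality;
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "txt2img", arguments: args },
  };
}

const catRequest = buildTxt2ImgCall(1, "a picture of a cat", "gemini", "high");
```

In practice an MCP client library or a chat host such as Claude Desktop constructs this request, so the sketch is mainly useful for understanding what travels over the transport.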
Example:
Generate a picture of a cat
txt2speech
Convert text to speech.
Parameter | Type | Description |
| string | Text to convert |
| string | TTS provider (aws-polly, openai, elevenlabs, gemini, xai) |
| string | TTS model |
| string | Voice ID |
| string | Synthesis engine (standard, neural, long-form, generative) |
| string | Language code |
| boolean | Test mode |
Example:
Convert "Hello world" to speech
txt2vid
Generate videos from text prompts.
Parameter | Type | Description |
| string | Video description |
| string | Video model (sora-2, veo-3.1-generate-preview, etc.) |
| number | Video duration (4, 8, 12) |
| string | Resolution (e.g., 1280x720) |
| boolean | Test mode |
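Because the duration accepts only three values and the resolution is a WIDTHxHEIGHT string, both are easy to check before a request is sent. The helpers below are a hypothetical sketch (`isAllowedDuration` and `parseResolution` are illustrative names); the README does not state the unit of the duration values, so the code treats them as opaque numbers.

```typescript
// Hypothetical validation helpers for the txt2vid inputs listed above.
const ALLOWED_DURATIONS: number[] = [4, 8, 12];

function isAllowedDuration(value: number): boolean {
  return ALLOWED_DURATIONS.indexOf(value) !== -1;
}

function parseResolution(value: string): { width: number; height: number } {
  // Accepts strings such as "1280x720"; anything else is rejected.
  const match = /^(\d+)x(\d+)$/.exec(value);
  if (!match) {
    throw new Error(`Expected WIDTHxHEIGHT, got "${value}"`);
  }
  return { width: parseInt(match[1], 10), height: parseInt(match[2], 10) };
}

const hdResolution = parseResolution("1280x720");
const eightIsAllowed = isAllowedDuration(8);
const tenIsAllowed = isAllowedDuration(10);
```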
Example:
Generate a video of a drone flying over mountains
img2txt
Extract text from images (OCR).
Parameter | Type | Description |
| string | Image URL, Base64, or Puter path |
| string | OCR provider (aws-textract, mistral) |
| boolean | Test mode |
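Several tools accept their input as a URL, a Base64 string, or a Puter path. How the server tells these apart is not documented here; the following is one possible heuristic, offered as a sketch only. `classifyInput` is a hypothetical name, and treating everything that is neither a URL nor Base64 as a Puter path is an assumption.

```typescript
// Heuristic sketch: guess which of the three input forms a string uses.
type InputKind = "url" | "base64" | "puter-path";

function classifyInput(input: string): InputKind {
  if (/^https?:\/\//i.test(input)) return "url";
  // Data URLs carry an explicit Base64 marker.
  if (/^data:[^;,]+;base64,/i.test(input)) return "base64";
  // Bare Base64: only alphabet characters, padded to a multiple of four.
  const compact = input.replace(/\s+/g, "");
  const looksLikeBase64 =
    compact.length >= 16 &&
    compact.length % 4 === 0 &&
    /^[A-Za-z0-9+\/]+={0,2}$/.test(compact);
  if (looksLikeBase64) return "base64";
  // Assumption: anything else is treated as a path in Puter storage.
  return "puter-path";
}

const urlKind = classifyInput("https://example.com/document.png");
const dataUrlKind = classifyInput("data:image/png;base64,iVBORw0KGgo=");
const bareKind = classifyInput("iVBORw0KGgoAAAANSUhEUgAA");
const pathKind = classifyInput("/user/Documents/scan.png");
```

The bare Base64 check is deliberately conservative, but a path made up only of letters, digits, and slashes could still be misread, so an explicit data URL is the safer way to pass Base64.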
Example:
Extract text from this image: https://example.com/document.png
speech2txt
Convert speech to text.
Parameter | Type | Description |
| string | Audio URL, Base64, or Puter path |
| string | STT provider (openai, xai) |
| string | Model name |
| string | Language code |
| boolean | Translate to English |
| boolean | Test mode |
Example:
Transcribe this audio: https://example.com/speech.mp3
speech2speech
Convert voice to another voice using ElevenLabs.
Parameter | Type | Description |
| string | Input audio URL, Base64, or Puter path |
| string | Target ElevenLabs voice ID |
| string | Voice model (default: eleven_multilingual_sts_v2) |
| string | Output format |
| boolean | Test mode |
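The voice model is the only speech2speech parameter with a documented default. A caller-side sketch of applying that default might look like the following; the option names (`audio`, `voice`, `model`) and the helper `withVoiceDefaults` are hypothetical, and only the model ID `eleven_multilingual_sts_v2` comes from the table above.

```typescript
// Hypothetical sketch: filling in the documented default voice model.
interface VoiceConversionOptions {
  audio: string;  // input audio: URL, Base64, or Puter path
  voice: string;  // target ElevenLabs voice ID
  model?: string; // optional; falls back to the documented default
}

const DEFAULT_VOICE_MODEL = "eleven_multilingual_sts_v2";

function withVoiceDefaults(
  opts: VoiceConversionOptions,
): VoiceConversionOptions & { model: string } {
  return {
    audio: opts.audio,
    voice: opts.voice,
    model: opts.model === undefined ? DEFAULT_VOICE_MODEL : opts.model,
  };
}

const resolvedOptions = withVoiceDefaults({
  audio: "https://example.com/speech.mp3",
  voice: "example-voice-id", // placeholder, not a real voice ID
});
```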
Example:
Convert this voice to a different voice: https://example.com/speech.mp3
Development
Project Structure
puter-mcp/
├── src/
│   ├── index.ts             # Server entry point
│   ├── client.ts            # Puter SDK initialization
│   ├── utils.ts             # Response formatting utilities
│   ├── puter.d.ts           # TypeScript declarations
│   └── tools/
│       ├── index.ts         # Tool registration
│       ├── txt2img.ts
│       ├── txt2speech.ts
│       ├── txt2vid.ts
│       ├── img2txt.ts
│       ├── speech2txt.ts
│       └── speech2speech.ts
├── scripts/
│   └── verify-responses.ts  # SDK response verification
├── dist/                    # Compiled output
├── package.json
└── tsconfig.json
Build
npm run build
Type Check
npm run typecheck
Development Mode
npm run dev
License
MIT License - see LICENSE for details.
Support
Issue Tracker: https://github.com/your-username/puter-mcp/issues
Documentation: https://docs.puter.com/AI/