Which integrations are available for this server?

Provides voice conversion capabilities, allowing conversion of spoken audio to a target voice using ElevenLabs. Enables text-to-image and image-to-image generation using Google Gemini models. Supports text-to-image generation (e.g., gpt-image-2), text-to-speech synthesis, and speech-to-text transcription via OpenAI's APIs. Provides text-to-image generation through Replicate's platform.

How do I use Puter MCP Server?

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Puter MCP Server Generate a picture of a cat".

That's it! The server will respond to your query, and you can continue using it as needed.

Puter MCP Server

by tiantian-pago

MCP (Model Context Protocol) server for Puter AI media generation. Provides 6 AI-powered tools for image generation, text-to-speech, video generation, OCR, speech-to-text, and voice conversion.

Features

txt2img: Text-to-image generation with multiple providers (OpenAI, Gemini, Together, xAI, Replicate)
txt2speech: Text-to-speech conversion with multiple voices and engines
txt2vid: Text-to-video generation (Sora, Veo, Together AI)
img2txt: Image-to-text (OCR) with AWS Textract or Mistral
speech2txt: Speech-to-text transcription
speech2speech: Voice conversion using ElevenLabs

Key Features

Intelligent Default Models: Automatically selects a default model based on task type
- Text-to-image: gpt-image-2 (OpenAI)
- Image-to-image: gemini-2.5-flash-image-preview (Gemini)
Multiple Providers: Support for OpenAI, Google Gemini, xAI (Grok), Replicate, Together AI, ElevenLabs
Flexible Output: Supports base64 and URL output formats
Test Mode: Built-in test mode for development without consuming credits

Quick Start

Prerequisites

Node.js 18+
Puter API Key (get from puter.com)

Installation

# Clone the repository
git clone https://github.com/your-username/puter-mcp.git
cd puter-mcp

# Install dependencies
npm install

# Build the project
npm run build

Configuration

Copy the environment file:

cp .env.example .env

Edit .env and add your Puter API key:

PUTER_API_KEY=your_puter_api_key_here
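A missing or placeholder key is easier to diagnose at startup than on the first tool call. The sketch below shows one way a fail-fast check could look. `requirePuterApiKey` is a hypothetical helper, not part of puter-mcp, and the real server may load and validate the key differently.

```typescript
// Hypothetical helper (assumption): not taken from the puter-mcp source.
type Env = Record<string, string | undefined>;

function requirePuterApiKey(env: Env): string {
  const key = env["PUTER_API_KEY"]?.trim();
  // Reject a missing key as well as the unedited placeholder value shown above.
  if (!key || key === "your_puter_api_key_here") {
    throw new Error("PUTER_API_KEY is not set; add it to .env");
  }
  return key;
}
```

In a Node.js process this would typically be called once with `process.env` before any tool is registered.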

Usage

Claude Desktop / Trae

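Add the following to your Claude Desktop or Trae configuration file. The paths below are the Trae locations; Claude Desktop keeps its own configuration file (claude_desktop_config.json).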

Windows:

%APPDATA%\Trae\mcp_settings.json

macOS:

~/Library/Application Support/Trae/mcp_settings.json

Linux:

~/.config/Trae/mcp_settings.json

Configuration content:

{
  "mcpServers": {
    "puter-mcp": {
      "command": "node",
      "args": ["path/to/puter-mcp/dist/index.js"],
      "env": {
        "PUTER_API_KEY": "your_api_key"
      }
    }
  }
}
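The client, not your shell, launches the server process, so a relative path in `args` may not resolve. The sketch below generates the same configuration from an absolute path. `buildMcpConfig` and the example path are hypothetical and only illustrate the shape of the file.

```typescript
// Sketch only: buildMcpConfig is a hypothetical helper, not part of puter-mcp.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

function buildMcpConfig(
  entryPoint: string,
  apiKey: string
): { mcpServers: Record<string, McpServerEntry> } {
  // Accept POSIX absolute paths (/...) and Windows drive paths (C:\... or C:/...).
  const isAbsolute = /^(\/|[A-Za-z]:[\\/])/.test(entryPoint);
  if (!isAbsolute) {
    throw new Error(`Expected an absolute path to dist/index.js, got: ${entryPoint}`);
  }
  return {
    mcpServers: {
      "puter-mcp": {
        command: "node",
        args: [entryPoint],
        env: { PUTER_API_KEY: apiKey },
      },
    },
  };
}

// Example path is an assumption; substitute the location of your own checkout.
const exampleConfig = buildMcpConfig("/home/me/puter-mcp/dist/index.js", "your_api_key");
const exampleConfigJson = JSON.stringify(exampleConfig, null, 2);
```

The resulting JSON string can be pasted into the settings file listed above.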

Command Line

# Stdio mode (default)
npm start

# SSE mode
TRANSPORT=sse PORT=3000 npm start
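The two modes above are selected through environment variables. As a rough illustration of how such a switch might be parsed, here is a minimal sketch. `resolveTransport` is hypothetical, and the fallback port of 3000 is an assumption taken from the example command, not a documented default.

```typescript
// Hypothetical sketch: the real server's option handling may differ.
type Transport = { mode: "stdio" } | { mode: "sse"; port: number };

function resolveTransport(env: Record<string, string | undefined>): Transport {
  // Stdio is the documented default when TRANSPORT is unset.
  const mode = (env["TRANSPORT"] ?? "stdio").toLowerCase();
  if (mode === "stdio") return { mode: "stdio" };
  if (mode !== "sse") throw new Error(`Unsupported TRANSPORT: ${mode}`);

  // Assumption: fall back to 3000, the port used in the example command.
  const port = Number(env["PORT"] ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }
  return { mode: "sse", port };
}
```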

Tools Reference

txt2img

Generate images from text prompts. Supports both text-to-image and image-to-image.

Parameter	Type	Description
`prompt`	string	Text description for the image
`model`	string	Model to use (default: gpt-image-2 for text-to-image, gemini-2.5-flash-image-preview for image-to-image)
`provider`	string	AI provider (openai-image-generation, gemini, together, xai, replicate-image-generation)
`quality`	string	Image quality (high, medium, low, hd, standard)
`ratio`	object	Aspect ratio {w, h}
`input_image`	string	Input image for image-to-image (Base64 or URL)
`test_mode`	boolean	Run in test mode without consuming credits
`output_format`	string	Output format (base64, url)

Example:

Generate a picture of a cat
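The default-model rule in the table can be expressed as a small function. This is a sketch under stated assumptions: the `Txt2ImgArgs` interface mirrors the parameter table, while `resolveImageModel` is a hypothetical name and the server's actual selection logic is not shown in this README.

```typescript
// Mirrors the txt2img parameter table; the optional markers are an assumption.
interface Txt2ImgArgs {
  prompt: string;
  model?: string;
  provider?: string;
  quality?: string;
  ratio?: { w: number; h: number };
  input_image?: string;
  test_mode?: boolean;
  output_format?: "base64" | "url";
}

// Hypothetical helper: an explicit model wins, otherwise pick the documented default.
function resolveImageModel(args: Txt2ImgArgs): string {
  if (args.model) return args.model;
  // An input image means image-to-image, which defaults to the Gemini model.
  return args.input_image ? "gemini-2.5-flash-image-preview" : "gpt-image-2";
}
```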

txt2speech

Convert text to speech.

Parameter	Type	Description
`text`	string	Text to convert
`provider`	string	TTS provider (aws-polly, openai, elevenlabs, gemini, xai)
`model`	string	TTS model
`voice`	string	Voice ID
`engine`	string	Synthesis engine (standard, neural, long-form, generative)
`language`	string	Language code
`test_mode`	boolean	Test mode

Example:

Convert "Hello world" to speech
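Behind a chat prompt like the one above, an MCP client sends a JSON-RPC `tools/call` request naming the tool and its arguments. The sketch below builds such a message for `txt2speech`. `buildToolCall` is a hypothetical helper, and the argument values are illustrative choices from the parameter table.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: MCP clients normally build this message for you.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// test_mode is set so the call would not consume credits.
const speechRequest = buildToolCall(1, "txt2speech", {
  text: "Hello world",
  provider: "openai",
  test_mode: true,
});
```

The same helper works for the other five tools by changing the tool name and arguments.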

txt2vid

Generate videos from text prompts.

Parameter	Type	Description
`prompt`	string	Video description
`model`	string	Video model (sora-2, veo-3.1-generate-preview, etc.)
`seconds`	number	Video duration (4, 8, 12)
`size`	string	Resolution (e.g., 1280x720)
`test_mode`	boolean	Test mode

Example:

Generate a video of a drone flying over mountains
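Checking `seconds` and `size` before a request is sent can save a failed generation. The sketch below reads the listed durations (4, 8, 12) as the only allowed values and assumes `size` always has the form WIDTHxHEIGHT. Both are interpretations of the table, and `checkVideoArgs` is a hypothetical helper.

```typescript
// Assumption: the durations listed in the table are the only accepted values.
const ALLOWED_SECONDS: readonly number[] = [4, 8, 12];

// Parse a resolution such as "1280x720" into numeric width and height.
function parseSize(size: string): { width: number; height: number } {
  const match = /^(\d+)x(\d+)$/.exec(size);
  if (!match) throw new Error(`size must look like 1280x720, got: ${size}`);
  return { width: Number(match[1]), height: Number(match[2]) };
}

// Hypothetical pre-flight check; the server may validate differently.
function checkVideoArgs(
  seconds: number,
  size: string
): { seconds: number; width: number; height: number } {
  if (!ALLOWED_SECONDS.includes(seconds)) {
    throw new Error(`seconds must be one of ${ALLOWED_SECONDS.join(", ")}`);
  }
  return { seconds, ...parseSize(size) };
}
```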

img2txt

Extract text from images (OCR).

Parameter	Type	Description
`source`	string	Image URL, Base64, or Puter path
`provider`	string	OCR provider (aws-textract, mistral)
`test_mode`	boolean	Test mode

Example:

Extract text from this image: https://example.com/document.png
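The `source` parameter accepts three kinds of input, as do the `audio` parameters of the speech tools. The sketch below shows one possible way to tell them apart. The heuristic and the name `classifySource` are assumptions for illustration, and the server's own detection is not documented here.

```typescript
type SourceKind = "url" | "base64" | "puter-path";

// Hypothetical heuristic: not taken from the puter-mcp source.
function classifySource(source: string): SourceKind {
  if (/^https?:\/\//i.test(source)) return "url";
  // Data URIs carry an explicit Base64 marker.
  if (/^data:[^;,]+;base64,/i.test(source)) return "base64";
  // Long strings made only of Base64 characters are treated as raw Base64.
  if (source.length >= 64 && /^[A-Za-z0-9+/]+={0,2}$/.test(source)) return "base64";
  // Anything else is assumed to be a path inside Puter storage.
  return "puter-path";
}
```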

speech2txt

Convert speech to text.

Parameter	Type	Description
`audio`	string	Audio URL, Base64, or Puter path
`provider`	string	STT provider (openai, xai)
`model`	string	Model name
`language`	string	Language code
`translate`	boolean	Translate to English
`test_mode`	boolean	Test mode

Example:

Transcribe this audio: https://example.com/speech.mp3

speech2speech

Convert the voice in an audio clip to a target voice using ElevenLabs.

Parameter	Type	Description
`audio`	string	Input audio URL, Base64, or Puter path
`voice`	string	Target ElevenLabs voice ID
`model`	string	Voice model (default: eleven_multilingual_sts_v2)
`output_format`	string	Output format
`test_mode`	boolean	Test mode

Example:

Convert this voice to a different voice: https://example.com/speech.mp3

Development

Project Structure

puter-mcp/
├── src/
│   ├── index.ts          # Server entry point
│   ├── client.ts         # Puter SDK initialization
│   ├── utils.ts          # Response formatting utilities
│   ├── puter.d.ts       # TypeScript declarations
│   └── tools/
│       ├── index.ts      # Tool registration
│       ├── txt2img.ts
│       ├── txt2speech.ts
│       ├── txt2vid.ts
│       ├── img2txt.ts
│       ├── speech2txt.ts
│       └── speech2speech.ts
├── scripts/
│   └── verify-responses.ts  # SDK response verification
├── dist/                 # Compiled output
├── package.json
└── tsconfig.json

Build

npm run build

Type Check

npm run typecheck

Development Mode

npm run dev

License

MIT License - see LICENSE for details.

Acknowledgments

Puter - AI services provider
MCP SDK - Model Context Protocol

Support

Issue Tracker: https://github.com/your-username/puter-mcp/issues
Documentation: https://docs.puter.com/AI/
