Puter MCP Server

MCP (Model Context Protocol) server for Puter AI media generation. Provides 6 AI-powered tools for image generation, text-to-speech, video generation, OCR, speech-to-text, and voice conversion.

Features

  • txt2img: Text-to-image generation with multiple providers (OpenAI, Gemini, Together, xAI, Replicate)

  • txt2speech: Text-to-speech conversion with multiple voices and engines

  • txt2vid: Text-to-video generation (Sora, Veo, TogetherAI)

  • img2txt: Image-to-text (OCR) with AWS Textract or Mistral

  • speech2txt: Speech-to-text transcription

  • speech2speech: Voice conversion using ElevenLabs

Key Features

  • Intelligent Default Models: Selects a default model based on task type when no model is specified

    • Text-to-image: gpt-image-2 (OpenAI)

    • Image-to-image: gemini-2.5-flash-image-preview (Gemini)

  • Multiple Providers: Support for OpenAI, Google Gemini, xAI (Grok), Replicate, Together AI, ElevenLabs

  • Flexible Output: Supports base64 and URL output formats

  • Test Mode: Built-in test mode for development without consuming credits

Quick Start

Prerequisites

  • Node.js 18+

  • Puter API key (available from puter.com)

Installation

# Clone the repository
git clone https://github.com/your-username/puter-mcp.git
cd puter-mcp

# Install dependencies
npm install

# Build the project
npm run build

Configuration

  1. Copy the environment file:

cp .env.example .env

  2. Edit .env and add your Puter API key:

PUTER_API_KEY=your_puter_api_key_here
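
A server like this would normally read the key from the environment at startup. The following is a minimal sketch of a fail-fast check, assuming only the variable name shown in the .env example. `requireApiKey` is a hypothetical helper for illustration and is not taken from the project's source.

```typescript
// Hypothetical helper: not part of the puter-mcp source.
// The environment is passed in as a parameter so the check is easy to test.
type Env = Record<string, string | undefined>;

function requireApiKey(env: Env): string {
  const raw = env["PUTER_API_KEY"];
  const key = raw === undefined ? "" : raw.trim();
  // Reject a missing key, an empty key, or the unedited placeholder value.
  if (key === "" || key === "your_puter_api_key_here") {
    throw new Error(
      "PUTER_API_KEY is not set; copy .env.example to .env and add your key"
    );
  }
  return key;
}
```

Failing early with a clear message tends to be easier to debug than an authentication error surfacing later from a tool call.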

Usage

Claude Desktop / Trae

Add the following to your Claude Desktop or Trae configuration file. The paths below are the Trae locations:

Windows:

%APPDATA%\Trae\mcp_settings.json

macOS:

~/Library/Application Support/Trae/mcp_settings.json

Linux:

~/.config/Trae/mcp_settings.json

Configuration content:

{
  "mcpServers": {
    "puter-mcp": {
      "command": "node",
      "args": ["path/to/puter-mcp/dist/index.js"],
      "env": {
        "PUTER_API_KEY": "your_api_key"
      }
    }
  }
}

Command Line

# Stdio mode (default)
npm start

# SSE mode
TRANSPORT=sse PORT=3000 npm start
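
The two modes above are selected through the TRANSPORT and PORT environment variables. As a rough sketch of how such variables could be interpreted, the function below maps them to a transport setting. `resolveTransport` and the fallback port of 3000 are assumptions for illustration; the server's actual parsing logic may differ.

```typescript
// Hypothetical sketch: not the server's actual startup code.
type TransportConfig =
  | { kind: "stdio" }
  | { kind: "sse"; port: number };

function resolveTransport(env: Record<string, string | undefined>): TransportConfig {
  // Stdio is documented as the default mode.
  const mode = (env["TRANSPORT"] ?? "stdio").toLowerCase();
  if (mode === "stdio") {
    return { kind: "stdio" };
  }
  if (mode === "sse") {
    // Falling back to 3000 is an assumption based on the example command.
    const port = Number(env["PORT"] ?? "3000");
    if (!isFinite(port) || Math.floor(port) !== port || port < 1 || port > 65535) {
      throw new Error(`Invalid PORT: ${env["PORT"]}`);
    }
    return { kind: "sse", port: port };
  }
  throw new Error(`Unknown TRANSPORT: ${mode}`);
}
```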

Tools Reference

txt2img

Generate images from text prompts. Supports both text-to-image and image-to-image.

| Parameter | Type | Description |
| --- | --- | --- |
| prompt | string | Text description for the image |
| model | string | Model to use (default: gpt-image-2 for text-to-image, gemini-2.5-flash-image-preview for image-to-image) |
| provider | string | AI provider (openai-image-generation, gemini, together, xai, replicate-image-generation) |
| quality | string | Image quality (high, medium, low, hd, standard) |
| ratio | object | Aspect ratio {w, h} |
| input_image | string | Input image for image-to-image (Base64 or URL) |
| test_mode | boolean | Test mode without credits |
| output_format | string | Output format (base64, url) |

Example:

Generate a picture of a cat
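
The default-model rule for txt2img (gpt-image-2 for text-to-image, gemini-2.5-flash-image-preview when an input image is supplied) can be sketched as follows. The `Txt2ImgArgs` interface and `resolveImageModel` function are illustrative names that mirror the parameter list; they are assumptions, not the project's actual types.

```typescript
// Illustrative types: field names follow the documented parameters,
// but the interface itself is an assumption.
interface Txt2ImgArgs {
  prompt: string;
  model?: string;
  provider?: string;
  quality?: string;
  ratio?: { w: number; h: number };
  input_image?: string;
  test_mode?: boolean;
  output_format?: "base64" | "url";
}

function resolveImageModel(args: Txt2ImgArgs): string {
  // An explicitly requested model always wins.
  if (args.model) {
    return args.model;
  }
  // Otherwise the presence of an input image decides the task type.
  return args.input_image
    ? "gemini-2.5-flash-image-preview" // image-to-image default
    : "gpt-image-2"; // text-to-image default
}
```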

txt2speech

Convert text to speech.

| Parameter | Type | Description |
| --- | --- | --- |
| text | string | Text to convert |
| provider | string | TTS provider (aws-polly, openai, elevenlabs, gemini, xai) |
| model | string | TTS model |
| voice | string | Voice ID |
| engine | string | Synthesis engine (standard, neural, long-form, generative) |
| language | string | Language code |
| test_mode | boolean | Test mode |

Example:

Convert "Hello world" to speech
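
A natural-language request like the one above is turned by the MCP client into a `tools/call` JSON-RPC message. The sketch below builds such a message, assuming the tool is registered under the name txt2speech and accepts the documented parameters. `buildToolCall` is a hypothetical helper; real clients usually construct this message through an SDK.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper for illustration only.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// The tool name and argument values are assumptions based on the parameter list.
const ttsRequest = buildToolCall(1, "txt2speech", {
  text: "Hello world",
  provider: "openai",
  test_mode: true, // avoids consuming credits while experimenting
});

// Serialized form as it would travel over stdio or SSE.
const ttsWire = JSON.stringify(ttsRequest);
```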

txt2vid

Generate videos from text prompts.

| Parameter | Type | Description |
| --- | --- | --- |
| prompt | string | Video description |
| model | string | Video model (sora-2, veo-3.1-generate-preview, etc.) |
| seconds | number | Video duration (4, 8, 12) |
| size | string | Resolution (e.g., 1280x720) |
| test_mode | boolean | Test mode |

Example:

Generate a video of a drone flying over mountains
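
Because seconds and size accept only particular shapes of value, a caller may want to check them before sending a request. This is a small sketch of such a check, assuming the durations listed above apply to every model and that size always uses the WIDTHxHEIGHT form. `validateVideoArgs` and `parseSize` are hypothetical names, and the server may enforce different rules.

```typescript
// Durations taken from the parameter list; whether they apply to
// every model is an assumption.
const ALLOWED_SECONDS = [4, 8, 12];

interface VideoSettings {
  seconds: number;
  width: number;
  height: number;
}

// Parses a string such as "1280x720" into numeric dimensions.
function parseSize(size: string): { width: number; height: number } {
  const match = /^(\d+)x(\d+)$/.exec(size);
  if (!match) {
    throw new Error(`Expected WIDTHxHEIGHT, got "${size}"`);
  }
  return { width: Number(match[1]), height: Number(match[2]) };
}

function validateVideoArgs(seconds: number, size: string): VideoSettings {
  if (ALLOWED_SECONDS.indexOf(seconds) === -1) {
    throw new Error(`seconds must be one of ${ALLOWED_SECONDS.join(", ")}`);
  }
  const dims = parseSize(size);
  return { seconds: seconds, width: dims.width, height: dims.height };
}
```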

img2txt

Extract text from images (OCR).

| Parameter | Type | Description |
| --- | --- | --- |
| source | string | Image URL, Base64, or Puter path |
| provider | string | OCR provider (aws-textract, mistral) |
| test_mode | boolean | Test mode |

Example:

Extract text from this image: https://example.com/document.png
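
The source parameter accepts three kinds of value in a single string field. One way a caller or server could tell them apart is sketched below. The heuristics (URL scheme check, data-URI prefix, base64 alphabet test) and the name `classifySource` are assumptions for illustration, and the actual server may distinguish these cases differently.

```typescript
type SourceKind = "url" | "base64" | "puter-path";

// Heuristic classifier: a sketch, not the project's implementation.
function classifySource(source: string): SourceKind {
  // http:// or https:// prefix means a remote URL.
  if (/^https?:\/\//i.test(source)) {
    return "url";
  }
  // Data URIs carry base64 payloads with an explicit prefix.
  if (/^data:[^;,]+;base64,/i.test(source)) {
    return "base64";
  }
  // Bare base64: long, length divisible by 4, base64 alphabet only.
  // A long path made solely of these characters would be misclassified.
  if (
    source.length >= 64 &&
    source.length % 4 === 0 &&
    /^[A-Za-z0-9+\/]+={0,2}$/.test(source)
  ) {
    return "base64";
  }
  // Everything else is treated as a path in Puter storage.
  return "puter-path";
}
```

The same distinction would apply to the audio parameter of the speech tools, which accepts the same three kinds of input.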

speech2txt

Convert speech to text.

| Parameter | Type | Description |
| --- | --- | --- |
| audio | string | Audio URL, Base64, or Puter path |
| provider | string | STT provider (openai, xai) |
| model | string | Model name |
| language | string | Language code |
| translate | boolean | Translate to English |
| test_mode | boolean | Test mode |

Example:

Transcribe this audio: https://example.com/speech.mp3

speech2speech

Convert speech from one voice to another using ElevenLabs.

| Parameter | Type | Description |
| --- | --- | --- |
| audio | string | Input audio URL, Base64, or Puter path |
| voice | string | Target ElevenLabs voice ID |
| model | string | Voice model (default: eleven_multilingual_sts_v2) |
| output_format | string | Output format |
| test_mode | boolean | Test mode |

Example:

Convert this voice to a different voice: https://example.com/speech.mp3

Development

Project Structure

puter-mcp/
├── src/
│   ├── index.ts          # Server entry point
│   ├── client.ts         # Puter SDK initialization
│   ├── utils.ts          # Response formatting utilities
│   ├── puter.d.ts        # TypeScript declarations
│   └── tools/
│       ├── index.ts      # Tool registration
│       ├── txt2img.ts
│       ├── txt2speech.ts
│       ├── txt2vid.ts
│       ├── img2txt.ts
│       ├── speech2txt.ts
│       └── speech2speech.ts
├── scripts/
│   └── verify-responses.ts  # SDK response verification
├── dist/                 # Compiled output
├── package.json
└── tsconfig.json

Build

npm run build

Type Check

npm run typecheck

Development Mode

npm run dev

License

MIT License - see LICENSE for details.

Acknowledgments

  • Puter - AI services provider

  • MCP SDK - Model Context Protocol
