Server Configuration

Describes the environment variables required to run the server.

  • REPLICATE_API_TOKEN (required): Your Replicate API token.

  • REPLICATE_DOWNLOAD_DIR (optional, default ~/Downloads/replicate-mcp): Where to save generated files.

  • REPLICATE_WEBHOOK_PORT (optional): Port for the webhook receiver.

  • REPLICATE_API_TOKEN_POOL (optional): Comma-separated list of Replicate API tokens, used round-robin to distribute requests across rate limits.

  • REPLICATE_WEBHOOK_PUBLIC_URL (optional): Public URL for webhook callbacks, so the server can avoid polling.
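A minimal sketch of how a client or wrapper might read these variables. It assumes that a configured pool replaces the single token and that tokens are handed out in simple round-robin order; `loadConfig` and `makeTokenRotator` are hypothetical names, not functions from the server's source.

```typescript
// Hypothetical sketch: read the documented environment variables.
interface ServerConfig {
  tokens: string[]; // tokens used round-robin (the pool if set, else the single token)
  downloadDir: string;
  webhookPort?: number;
  webhookPublicUrl?: string;
}

function loadConfig(env: Record<string, string | undefined>, home: string): ServerConfig {
  const token = env.REPLICATE_API_TOKEN?.trim();
  if (!token) {
    throw new Error("REPLICATE_API_TOKEN is not set; export it before starting the server.");
  }
  // Assumption: when a pool is configured it replaces the single token.
  const pool = (env.REPLICATE_API_TOKEN_POOL ?? "")
    .split(",")
    .map((t) => t.trim())
    .filter((t) => t.length > 0);
  const port = Number(env.REPLICATE_WEBHOOK_PORT); // NaN when unset
  return {
    tokens: pool.length > 0 ? pool : [token],
    downloadDir: env.REPLICATE_DOWNLOAD_DIR ?? `${home}/Downloads/replicate-mcp`,
    webhookPort: Number.isInteger(port) && port > 0 ? port : undefined,
    webhookPublicUrl: env.REPLICATE_WEBHOOK_PUBLIC_URL,
  };
}

// Round-robin: each call hands out the next token, wrapping at the end.
function makeTokenRotator(tokens: string[]): () => string {
  let next = 0;
  return () => {
    const token = tokens[next];
    next = (next + 1) % tokens.length;
    return token;
  };
}
```

In a Node process you would pass `process.env` and the user's home directory (for example from `os.homedir()`).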

Capabilities

Features and capabilities supported by this server

  • tools: { "listChanged": true }

  • prompts: { "listChanged": true }

  • resources: { "listChanged": true }

Tools

Functions exposed to the LLM to take actions

replicate_generate_image

Generate one or more images from a text prompt using a Replicate image model.

Use this for any "draw / create / generate an image of …" request. By default it uses Flux Schnell (fast, ~2 seconds per image).

DISPLAY REQUIREMENT — after this tool returns successfully, you MUST embed the image inline in your reply by pasting, verbatim, ONE of the three embed blocks the tool prints (Option 1 iframe, Option 2 <img> tag, or Option 3 markdown). Try them in that order and use the first one your chat client renders. The iframe variant scales to the chat column width with the image's native aspect ratio; the <img> variant is a responsive fallback; markdown is the universal last resort. Place the chosen embed BEFORE any descriptive prose. Do NOT paraphrase the URL or omit the embed: the user wants the image to appear in the main chat flow, not only inside the collapsed tool widget. URLs expire in ~24h.

Args:

  • prompt (string): Text description of the image to generate.

  • model (string, default "flux-schnell"): Either a curated key (flux-schnell, flux-dev, flux-pro, flux-2-max, sd-3.5-large, recraft-v3, recraft-v4.1, ideogram-v2, imagen-3, seedream) or a full Replicate identifier "owner/name[:version]".

  • aspect_ratio ("1:1" | "16:9" | "9:16" | "4:3" | "3:4" | "21:9" | "3:2" | "2:3", optional): Aspect ratio. Default 1:1.

  • num_outputs (1-4, optional): How many images to generate.

  • seed (integer, optional): Random seed for reproducible output.

  • extra_input (object, optional): Model-specific extra inputs (e.g. {guidance: 3.5, num_inference_steps: 28}). Use replicate_get_model_schema if unsure.

  • download (boolean, default true): Download files locally to ~/Downloads/replicate-mcp/.

  • timeout_ms (5000-1800000, optional): Max wait. Default 300000 (5min).

Returns structuredContent matching PredictionResult:

  {
    "status": "starting" | "processing" | "succeeded" | "failed" | "canceled",
    "prediction_id": string,
    "model": string,
    "urls": string[],               // Replicate URLs (expire ~24h)
    "local_paths": string[],        // Absolute paths on disk when download=true
    "metrics": { "predict_time_seconds": number } | undefined,
    "error": string | undefined,
    "pending": boolean | undefined  // true if timed out; poll via replicate_get_prediction
  }
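The same shape, transcribed as a TypeScript type. This mirrors the documented fields rather than the server's own definition; `text_output` is added as optional because the chat, vision and transcription tools describe it, and `isTerminal` / `nextAction` are hypothetical helpers showing how a caller might branch on `status` and `pending`.

```typescript
type PredictionStatus = "starting" | "processing" | "succeeded" | "failed" | "canceled";

interface PredictionResult {
  status: PredictionStatus;
  prediction_id: string;
  model: string;
  urls: string[];          // Replicate URLs, expire after ~24h
  local_paths: string[];   // absolute paths when download=true
  metrics?: { predict_time_seconds: number };
  error?: string;
  pending?: boolean;       // true if the call timed out before completion
  text_output?: string[];  // documented for text-producing tools
}

// A prediction is finished once it reaches one of the three terminal states.
function isTerminal(r: PredictionResult): boolean {
  return r.status === "succeeded" || r.status === "failed" || r.status === "canceled";
}

// Decide what the caller should do next with a result.
function nextAction(r: PredictionResult): "show" | "report-error" | "poll" {
  if (r.status === "succeeded") return "show";
  if (isTerminal(r)) return "report-error";
  return "poll"; // still starting/processing, typically with pending=true
}
```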

Examples:

  • "An origami fox in a misty forest" → uses flux-schnell, 1:1

  • prompt="logo for a coffee shop called Crema", model="recraft-v3" → for text-in-image

  • prompt="cinematic shot of a lighthouse", model="flux-pro", aspect_ratio="21:9", seed=42

Error handling:

  • If REPLICATE_API_TOKEN is missing, returns an actionable error telling the user how to set it.

  • Invalid model IDs return Replicate's error message verbatim.

replicate_generate_video

Generate a video clip from a text prompt (and optionally a starting image). Video generation is slow — typically 1-5 minutes per clip.

DISPLAY REQUIREMENT — after this tool returns successfully, include the URL(s) printed in the tool's text content so the user can open the video. URLs expire in ~24h.

Args:

  • prompt (string): Text description of the video.

  • model (string, default "kling-pro"): Curated key (kling-pro, minimax-video, hunyuan-video, luma-ray, wan-2.2, grok-video, seedance) or "owner/name[:version]".

  • image_url (string, optional): Starting frame for image-to-video. Not all models support this.

  • duration_seconds (1-60, optional): Desired duration. Model-dependent.

  • aspect_ratio ("16:9" | "9:16" | "1:1", optional): Aspect ratio.

  • extra_input (object, optional): Additional model-specific inputs.

  • download (boolean, default true): Download the MP4 locally.

  • timeout_ms: Max wait. Default 300000 (5min). For very long videos, increase or rely on the pending+poll flow.

Returns: PredictionResult (see replicate_generate_image for shape). The local_paths will contain .mp4 files when downloaded.

Tip: If timeout_ms is exceeded, the result will have pending=true and a prediction_id. Wait a minute, then call replicate_get_prediction.

replicate_generate_audio

Generate music, ambient audio, or full songs from a text prompt.

DISPLAY REQUIREMENT — after this tool returns successfully, include the URL(s) printed in the tool's text content as a markdown link [Audio](URL) in your reply so the user can play it. URLs expire in ~24h.

Models:

  • "musicgen" (default): Meta MusicGen. Instrumental music up to 30s. prompt → "prompt" field.

  • "ace-step": Full songs with lyrics. prompt → "tags" field (style/genre tags). Pass lyrics separately via extra_input.lyrics. ~3-4 minutes runtime.

  • "riffusion": Loop-friendly ambient/electronic. prompt → "prompt_a" field. No duration control.

  • "minimax-music": MiniMax Music 2.6. Full songs up to 6min. prompt=style description; pass lyrics via extra_input.lyrics.

  • "lyria-3-pro": Google Lyria 3 Pro. Full songs up to 3min WITH sung vocals. Put genre, mood, lyrics, and structure ([Verse]/[Chorus]) directly in the prompt. No duration — do NOT pass duration_seconds. Also "lyria-3" (30s clips) and "lyria-2" (48kHz instrumental).
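The prompt-to-field routing listed above fits in a lookup table. `buildAudioInput` is a hypothetical helper; the `prompt` field for minimax-music and the lyria models, and the `duration` input name for musicgen and ace-step, are assumptions to confirm with replicate_get_model_schema.

```typescript
type AudioModel =
  | "musicgen" | "ace-step" | "riffusion" | "minimax-music"
  | "lyria-3-pro" | "lyria-3" | "lyria-2";

// Which input field receives the user's prompt (minimax/lyria entries are assumed).
const PROMPT_FIELD: Record<AudioModel, string> = {
  "musicgen": "prompt",
  "ace-step": "tags",
  "riffusion": "prompt_a",
  "minimax-music": "prompt",
  "lyria-3-pro": "prompt",
  "lyria-3": "prompt",
  "lyria-2": "prompt",
};

// Only these models accept a duration, per the duration_seconds description.
const SUPPORTS_DURATION = new Set<AudioModel>(["musicgen", "ace-step"]);

function buildAudioInput(
  model: AudioModel,
  prompt: string,
  durationSeconds?: number,
  extra: Record<string, unknown> = {},
): Record<string, unknown> {
  const input: Record<string, unknown> = { [PROMPT_FIELD[model]]: prompt };
  // Assumption: the duration input is named "duration"; other models get none.
  if (durationSeconds !== undefined && SUPPORTS_DURATION.has(model)) {
    input.duration = durationSeconds;
  }
  return { ...input, ...extra }; // extra_input (e.g. lyrics) is merged last
}
```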

Args:

  • prompt (string): Description of the music. For ace-step this maps to the "tags" field (style tags like "rock, guitar, upbeat"). For riffusion this maps to "prompt_a". For lyria put genre/mood/lyrics/structure here.

  • model (string, default "musicgen"): Curated key (musicgen, ace-step, riffusion, minimax-music, lyria-3-pro, lyria-3, lyria-2) or "owner/name[:version]".

  • duration_seconds (1-300, optional): Duration in seconds. Supported by musicgen and ace-step. Ignored for riffusion and the lyria models (they have no duration parameter).

  • extra_input (object, optional): Additional inputs. Examples: {temperature: 1.0, top_k: 250} for MusicGen; {lyrics: "verse lyrics here"} for ace-step.

  • download (boolean, default true): Download as MP3/WAV.

  • timeout_ms: Default 300000 (5min).

Returns: PredictionResult. local_paths contain audio files.

Examples:

  • prompt="upbeat synthwave with driving bassline", duration_seconds=15 → musicgen

  • prompt="indie folk, acoustic guitar, female vocals", model="ace-step", extra_input={lyrics: "Leaving home on a rainy day..."}

  • prompt="ambient lo-fi chill", model="riffusion"

replicate_generate_speech

Convert text to natural-sounding speech.

DISPLAY REQUIREMENT — after this tool returns successfully, include the URL printed in the tool's text content as a markdown link [Speech](URL) in your reply so the user can play it. URLs expire in ~24h.

Args:

  • text (string, 1-5000): Text to synthesize.

  • model (string, default "kokoro"): Curated key (kokoro, minimax-speech, chatterbox, gemini-tts, grok-tts) or "owner/name[:version]".

  • voice (string, optional): Voice ID. For Kokoro: af_bella, af_sarah, am_adam, am_michael, bf_emma, bf_isabella, etc. The prefix encodes accent and gender: af_ = American female, bf_ = British female, am_ = American male, bm_ = British male.

  • speed (0.5-2.0, optional): Speech rate.

  • extra_input (object, optional): Model-specific extras (e.g. {audio_prompt: "<reference-audio-url>"} for voice cloning with Chatterbox).

  • download (boolean, default true).

  • timeout_ms: Default 300000.

Returns: PredictionResult. local_paths contain WAV/MP3 files.

replicate_chat

Run a large language model hosted on Replicate. Use this for free-form text generation, Q&A, code writing, summarisation, translation — anything where the input is text and the output is text.

Args:

  • prompt (string): User message.

  • model (string, default "llama-3-70b"): Curated key (llama-3.1-405b, llama-3-70b, llama-3-8b, mistral-7b, mixtral-8x7b, deepseek-r1) or "owner/name".

  • system_prompt (string, optional): Persona / instructions.

  • max_tokens (1-8192, optional): Generation limit.

  • temperature (0-2, optional): Sampling temperature.

  • extra_input (object, optional): Model-specific extras (top_p, top_k, frequency_penalty, etc.).

  • download (boolean, default false): No file outputs; leave false.

  • timeout_ms (5000-1800000, optional): Default 300000.

Returns: PredictionResult with text_output[0] containing the model's reply (later entries are raw streamed segments if applicable).

Examples:

  • prompt="Explain quantum entanglement in two sentences.", model="llama-3-70b"

  • prompt="Write a Python function to compute Levenshtein distance.", model="mixtral-8x7b", system_prompt="You are an expert software engineer."

replicate_vision

Run a vision-language model to describe, caption, or answer questions about an image.

Args:

  • image (string URL): URL of the image to analyse.

  • prompt (string, optional): Question or instruction (e.g. "describe this image", "count the people"). Default is a generic caption.

  • model (string, default "llava-13b"): Curated key (llava-13b, llava-v1.6-34b, blip-2, qwen-vl) or "owner/name".

  • max_tokens (1-4096, optional): Response length.

  • extra_input (object, optional): Model-specific extras.

Returns: PredictionResult with text_output containing the model's textual answer.

Examples:

  • image="https://example.com/photo.jpg", prompt="What objects are visible?"

  • image="<chart-image-url>", prompt="Read the values off this chart and list them.", model="llava-v1.6-34b"

replicate_upscale_image

Upscale an image to higher resolution. Optional face restoration for photos.

DISPLAY REQUIREMENT — after this tool returns successfully, embed the upscaled image inline using one of the three blocks (iframe / <img> / markdown) printed by the tool. Place it BEFORE descriptive prose. URLs expire in ~24h.

Args:

  • image (string URL): URL of the source image.

  • model (string, default "real-esrgan"): Curated key (real-esrgan, clarity-upscaler, swinir, gfpgan) or "owner/name".

  • scale (1-10, optional): Upscale factor. Default 4 for real-esrgan; 2 for gfpgan; 2 for clarity-upscaler.

  • extra_input (object, optional): Model-specific extras (e.g. {face_enhance: true} for real-esrgan).

  • download (boolean, default true): Download upscaled file locally.

Returns: PredictionResult with urls + local_paths to the upscaled image.

Examples:

  • image="<image-url>", scale=4 → real-esrgan

  • image="<face-photo-url>", model="gfpgan", scale=2 → restoration

  • image="<image-url>", model="clarity-upscaler", scale=2

replicate_remove_background

Produce a transparent-background version (PNG) of an image.

DISPLAY REQUIREMENT — after this tool returns successfully, embed the cut-out image inline using one of the three blocks (iframe / <img> / markdown) printed by the tool.

Args:

  • image (string URL): URL of the source image.

  • model (string, default "rembg"): Curated key (rembg, birefnet, briaai-rmbg) or "owner/name".

  • extra_input (object, optional): Model-specific extras.

  • download (boolean, default true): Download the cut-out PNG locally.

Returns: PredictionResult with urls + local_paths to a transparent PNG.

Examples:

  • image="<image-url>" → rembg quick cut

  • image="<portrait-url>", model="birefnet" → sharper edges for hair

replicate_transcribe_audio

Transcribe an audio or video file to text using Whisper-family models on Replicate.

Args:

  • audio (URL): URL of the audio (or video) to transcribe.

  • model (default "incredibly-fast-whisper"): Curated key (whisper, incredibly-fast-whisper, whisperx, scribe) or "owner/name".

  • language (string, optional): ISO-639 hint (e.g. "en", "it"). Default: auto-detect.

  • translate_to_english (bool, optional): Translate the transcript to English instead of preserving source language.

  • extra_input (object, optional): Model-specific extras (e.g. {batch_size: 24} for incredibly-fast-whisper).

Returns: PredictionResult with text_output containing the transcript.

replicate_inpaint

Fill masked regions of an image based on a text prompt. Works for both inpainting (replace inside) and outpainting (extend canvas) when the mask covers the target area.

DISPLAY REQUIREMENT — embed the result inline using one of the three blocks (iframe / <img> / markdown) printed by the tool.

Args:

  • image (URL): Source image.

  • mask (URL): Mask image. White = keep, black/transparent = repaint.

  • prompt: Describes what should appear in the masked region.

  • model (default "flux-fill-pro"): Curated (flux-fill-pro, sd-inpaint, ideogram-v2-edit) or "owner/name".

  • extra_input (object, optional): Model-specific extras (e.g. {guidance: 30} for flux-fill-pro).

replicate_segment

Produce a segmentation mask of an image. Use SAM 2 for point/box-prompt masks (auto-mask everything when no prompt given) or Grounded-SAM for text-prompt masking like "the red car".

DISPLAY REQUIREMENT — embed the mask result inline using one of the three blocks printed by the tool.

Args:

  • image (URL): Source image.

  • prompt (string, optional): Text prompt for grounded segmentation. Required for grounded-sam.

  • model (default "sam-2"): Curated (sam-2, grounded-sam) or "owner/name".

  • extra_input (object, optional): SAM-specific tuning (e.g. {points_per_side: 32}).

replicate_embed_text

Convert text(s) into numeric embedding vectors. Useful for RAG, semantic search, clustering, similarity scoring.

Args:

  • texts: A single string or an array of strings (max 256). Each text is embedded independently.

  • model (default "bge-large"): Curated (bge-large, jina-embeddings-v3, all-minilm) or "owner/name".

  • extra_input (object, optional): Model-specific extras (e.g. {task: "retrieval.query"} for jina v3).

Returns: PredictionResult — the embedding vectors are in structuredContent.output (model-specific shape).
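For semantic search or similarity scoring, the returned vectors are typically compared with cosine similarity. A minimal sketch, assuming the vectors have already been pulled out of structuredContent.output into plain `number[]` arrays (that output shape is model-specific, so the extraction step is omitted); `cosine` and `rank` are illustrative names.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return document indices ordered from most to least similar to the query.
function rank(query: number[], docs: number[][]): number[] {
  return docs
    .map((d, i) => ({ i, score: cosine(query, d) }))
    .sort((x, y) => y.score - x.score)
    .map((x) => x.i);
}
```

Embed the query and the documents with the same model; vectors from different embedding models are not comparable.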

replicate_clone_voice

Synthesize speech in a cloned voice. Provide a short reference audio sample (~5-30 s) and the text to speak; the model reproduces the voice characteristics.

DISPLAY REQUIREMENT — after this tool returns successfully, include the URL printed in the tool's text content as a markdown link [Audio](URL) so the user can play it. URLs expire in ~24h.

Args:

  • text (string, 1-5000): Text to synthesize in the cloned voice.

  • reference_audio_url (URL): URL of the voice sample to clone from. Use replicate_upload_file to upload a local file first.

  • language (string, optional): ISO-639 code (e.g. "en", "es", "it"). Default "en".

  • model (string, default "xtts-v2"): Curated key (xtts-v2, openvoice-v2) or "owner/name[:version]".

  • extra_input (object, optional): Model-specific extras.

  • download (boolean, default true).

  • timeout_ms: Default 300000.

Returns: PredictionResult. local_paths contain WAV/MP3 files.

Examples:

  • text="Hello world, this is my cloned voice.", reference_audio_url="<url-to-your-voice-sample.wav>"

  • text="Buongiorno a tutti!", reference_audio_url="<url-to-your-voice-sample.wav>", language="it"

replicate_generate_3d

Generate a 3D mesh (GLB/OBJ) from a text prompt or a reference image. 3D generation is slow — typically 1-5 minutes.

DISPLAY REQUIREMENT — after this tool returns successfully, include the download URL(s) so the user can open the 3D file. URLs expire in ~24h.

Args:

  • prompt (string, optional): Text description of the 3D object. Provide at least one of prompt or image_url.

  • image_url (URL, optional): Reference image to convert to 3D. Provide at least one of prompt or image_url. Use replicate_upload_file for local files.

  • model (string, default "hunyuan-3d"): Curated key (hunyuan-3d, rodin, triposr) or "owner/name[:version]".

  • extra_input (object, optional): Model-specific extras (e.g. {num_inference_steps: 50}).

  • download (boolean, default true): Download the GLB/OBJ locally.

  • timeout_ms: Default 300000. For complex objects, increase or use the pending+poll flow.

Returns: PredictionResult. local_paths will contain .glb or .obj files.

Examples:

  • prompt="A red ceramic teapot" → hunyuan-3d

  • image_url="<image-url>", model="triposr" → fast single-image 3D

  • image_url="<image-url>", model="rodin" → high-quality 3D

replicate_lipsync

Animate a portrait image to speak — either from a text script (model does TTS + lipsync) or from a driving audio file. Produces an MP4 video.

DISPLAY REQUIREMENT — after this tool returns successfully, include the URL(s) so the user can open the video. URLs expire in ~24h.

Args:

  • image_url (URL): Portrait or face image to animate. Use replicate_upload_file for local files.

  • text (string, optional): Script for the avatar to speak. Used by video-avatar (maps to voice_script). At least one of text or audio_url is required.

  • audio_url (URL, optional): Driving audio for lipsync. Required for sadtalker; optional override for video-avatar. At least one of text or audio_url is required.

  • model (string, default "video-avatar"): Curated key (video-avatar, sadtalker) or "owner/name[:version]".

  • extra_input (object, optional): Model-specific extras (e.g. {voice_prompt: "speak slowly"} for video-avatar).

  • download (boolean, default true): Download the MP4 locally.

  • timeout_ms: Default 300000.

Returns: PredictionResult. local_paths contain .mp4 files.

Examples:

  • image_url="<portrait.jpg>", text="Hello! Welcome to our product demo." → video-avatar (TTS + lipsync)

  • image_url="<face.jpg>", audio_url="<speech.wav>", model="sadtalker" → audio-driven lipsync
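The argument rules above can be checked before any prediction is created. A sketch with a hypothetical `validateLipsync` helper that encodes only what the Args list states: at least one of text or audio_url, and audio_url mandatory for sadtalker.

```typescript
interface LipsyncArgs {
  image_url: string;
  text?: string;
  audio_url?: string;
  model?: string; // defaults to "video-avatar"
}

// Returns an error message, or null if the arguments satisfy the documented rules.
function validateLipsync(args: LipsyncArgs): string | null {
  const model = args.model ?? "video-avatar";
  const hasText = !!args.text && args.text.trim().length > 0;
  const hasAudio = !!args.audio_url;
  if (!hasText && !hasAudio) return "Provide at least one of text or audio_url.";
  if (model === "sadtalker" && !hasAudio) return "sadtalker requires audio_url.";
  return null;
}
```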

replicate_run_model

Generic escape hatch: run ANY model in the Replicate catalog by its "owner/name" identifier. This tool gives Claude access to the entire Replicate model catalog; anything not covered by the curated specialised tools (image, video, audio, speech, chat, vision, upscale, background removal, transcription, inpainting, segmentation, embeddings, voice cloning, 3D, lipsync) can be reached from here.

DISPLAY REQUIREMENT — if the result includes image URLs, paste ONE of the embed blocks the tool prints (iframe / <img> / markdown, tried in that order) verbatim in your reply so the image renders inline in the chat.

Use this for any category WITHOUT a curated specialised tool, or for a specific model that the curated tools don't offer, including but not limited to:

  • Embeddings (sentence-transformers, BGE, Jina)

  • Segmentation (SAM, Segment Anything)

  • Depth estimation (MiDaS, ZoeDepth, Marigold)

  • Inpainting / outpainting (LaMa, Stable Diffusion Inpaint, controlnet-inpaint)

  • ControlNet variants (canny, depth, openpose, normal-map)

  • Face / pose / hand detection (insightface, mediapipe, etc.)

  • 3D generation (TripoSR, Wonder3D, InstantMesh)

  • Audio-to-text / speech recognition (whisper, Distil-Whisper)

  • Audio separation / stem splitting (Demucs, MDX)

  • Style transfer, colourisation, deblurring, denoising

  • Code completion / instruction-tuned code models (CodeLlama, DeepSeek-Coder)

  • Music continuation / source separation

  • ANY newly released model not yet in the curated registries

Workflow:

  1. (Optional) Call replicate_search_models to discover models by keyword (e.g. "image segmentation", "speech to text").

  2. (Recommended) Call replicate_get_model_schema with "owner/name" to inspect required inputs.

  3. Call this tool with the model id and an input object matching that schema.

Args:

  • model (string): "owner/name" (latest official version) or "owner/name:version_hash" (pinned).

  • input (object): Model-specific input parameters.

  • download (boolean, default true): Download outputs locally.

  • timeout_ms: Default 300000.

Returns: PredictionResult.

Examples:

  • Upscale an image: model="nightmareai/real-esrgan", input={"image": "https://example.com/in.png", "scale": 4}

  • Remove background: model="lucataco/remove-bg", input={"image": "<image-url>"}

  • Run an LLM (output is text, not a file, so local_paths will be empty): model="meta/meta-llama-3-70b-instruct", input={"prompt": "Explain quantum entanglement in two sentences."}
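The two accepted identifier forms can be told apart with a small parser, which is also a quick way to catch curated short keys before calling replicate_batch_start or replicate_pipeline_start (both require full identifiers). `parseModelId` is hypothetical, and it assumes a version is an alphanumeric hash following a single colon.

```typescript
interface ModelRef {
  owner: string;
  name: string;
  version?: string; // present only when pinned with ":version"
}

// Parse "owner/name" or "owner/name:version". Returns null for anything else,
// including curated short keys such as "flux-schnell" (no slash).
function parseModelId(id: string): ModelRef | null {
  const m = /^([^/\s:]+)\/([^/\s:]+)(?::([A-Za-z0-9]+))?$/.exec(id.trim());
  if (!m) return null;
  return m[3] ? { owner: m[1], name: m[2], version: m[3] } : { owner: m[1], name: m[2] };
}
```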

replicate_search_models

Search the Replicate catalog by free-text query. Returns up to 25 matching models with names, descriptions, and URLs.

Args:

  • query (string, 1-200 chars): Free-text search. Examples: "image upscaler", "voice cloning", "depth estimation", "code generation".

Returns structuredContent:

  {
    "count": number,
    "models": [
      {
        "owner": string,
        "name": string,
        "description": string | undefined,
        "url": string,
        "run_count": number | undefined,
        "cover_image_url": string | undefined
      }
    ]
  }

Tip: Once you find a promising model, call replicate_get_model_schema with "owner/name" to see its inputs before calling replicate_run_model.

replicate_get_model_schema

Retrieve metadata and the OpenAPI input/output schema for a specific Replicate model. Use this before replicate_run_model to know which fields the model accepts and what they mean.

Args:

  • model (string): "owner/name" or "owner/name:version".

Returns structuredContent:

  {
    "model": string,
    "description": string | undefined,
    "visibility": string | undefined,
    "latest_version_id": string | undefined,
    "input_schema": object | undefined,   // OpenAPI schema for inputs
    "output_schema": object | undefined,  // OpenAPI schema for outputs
    "example_url": string | undefined     // Replicate page with examples
  }

replicate_get_prediction

Retrieve the current status and (if available) outputs of a Replicate prediction by its ID. Use this when a previous generate_* or run_model call returned pending=true (timed out before completion).

Args:

  • prediction_id (string): The ID returned by a previous call.

  • download (boolean, default true): If the prediction has succeeded, download its outputs locally.

Returns: PredictionResult — same shape as replicate_generate_image. If still running, status will be "processing" or "starting" and pending will be true.

Typical flow:

  1. Call replicate_generate_video → returns pending=true with prediction_id=abc123.

  2. Wait ~1 minute.

  3. Call replicate_get_prediction with prediction_id=abc123 → returns succeeded + URLs + local_paths.
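The wait-and-poll loop can be automated on the client side. In this sketch `getPrediction` stands in for whatever your client uses to call replicate_get_prediction, and `sleep` is injected so the loop can be tested; both are hypothetical parameters, and the 60-second interval and 30-attempt cap are arbitrary choices rather than server defaults.

```typescript
type Status = "starting" | "processing" | "succeeded" | "failed" | "canceled";

interface PollResult {
  status: Status;
  prediction_id: string;
  urls: string[];
  pending?: boolean;
}

// Fetch a prediction repeatedly until it reaches a terminal state or attempts run out.
async function pollPrediction(
  id: string,
  getPrediction: (id: string) => Promise<PollResult>,
  sleep: (ms: number) => Promise<void>,
  intervalMs = 60000,
  maxAttempts = 30,
): Promise<PollResult> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const r = await getPrediction(id);
    if (r.status === "succeeded" || r.status === "failed" || r.status === "canceled") {
      return r;
    }
    await sleep(intervalMs); // still starting/processing: wait, then retry
  }
  throw new Error(`Prediction ${id} still running after ${maxAttempts} attempts`);
}
```

For real use, pass `(ms) => new Promise((resolve) => setTimeout(resolve, ms))` as `sleep`.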

replicate_upload_file

Upload a file to Replicate's file storage and get back a URL valid for ~24 hours. Pass the returned URL as a model input (e.g. image for upscale/inpaint/vision, image_url for video, reference_audio_url for voice clone).

Two input modes — provide EXACTLY ONE:

  • file_path: absolute local path of a file on the machine running the server.

  • base64_data: the file's bytes as base64 (a bare base64 string OR a full "data:<mime>;base64,..." URI). Use this when you hold bytes in memory but have no local path, e.g. an image a user dropped into the chat that a code container can read and base64-encode. NOTE: an MCP client such as Claude Desktop generally cannot reproduce a large dragged-in image's exact bytes as a tool argument, so base64 mode is for callers that genuinely have the bytes (web container, programmatic clients).

Args:

  • file_path (string, optional): Absolute local path. Provide this OR base64_data.

  • base64_data (string, optional): base64 contents or data: URI. Provide this OR file_path.

  • mime_type (string, optional): MIME override (e.g. 'image/png'). Auto-detected from the path extension or a data: URI; defaults to application/octet-stream for raw base64.

  • file_name (string, optional): Name for a base64 upload.

Returns structuredContent: { url, file_id, name }

  • url: Replicate-hosted URL (~24h expiry) — pass this as a model input.

Examples:

  • file_path="C:/Users/me/photo.png"

  • base64_data="data:image/png;base64,iVBORw0KG..." → uploads, returns URL
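The two base64 forms differ only in the optional `data:<mime>;base64,` prefix. A sketch of how a caller might normalise them and enforce the exactly-one rule, assuming the documented fallback to application/octet-stream; `checkExactlyOne` and `parseBase64Input` are hypothetical names, and the decoded size is computed from the base64 length rather than by decoding the bytes.

```typescript
interface Base64Upload {
  mimeType: string;
  base64: string;     // payload with any data: prefix and whitespace removed
  byteLength: number; // decoded size, derived from the base64 length
}

// file_path and base64_data are mutually exclusive: exactly one must be set.
function checkExactlyOne(filePath?: string, base64Data?: string): void {
  if (Boolean(filePath) === Boolean(base64Data)) {
    throw new Error("Provide exactly one of file_path or base64_data.");
  }
}

// Accept a bare base64 string or a "data:<mime>;base64,<payload>" URI.
function parseBase64Input(data: string, mimeOverride?: string): Base64Upload {
  const m = /^data:([^;,]*);base64,([\s\S]*)$/.exec(data.trim());
  const payload = (m ? m[2] : data).replace(/\s+/g, "");
  if (payload.length % 4 !== 0 || !/^[A-Za-z0-9+\/]*={0,2}$/.test(payload)) {
    throw new Error("base64_data is not valid base64");
  }
  // Precedence: explicit mime_type, then the data: URI, then the documented default.
  const mimeType = mimeOverride || (m ? m[1] : "") || "application/octet-stream";
  const padding = payload.endsWith("==") ? 2 : payload.endsWith("=") ? 1 : 0;
  return { mimeType, base64: payload, byteLength: (payload.length / 4) * 3 - padding };
}
```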

replicate_recommend_model

Rank the curated models in a category by a priority (speed, cost, quality, or balanced) and return recommendations with cost estimates and reasoning. This does NOT run anything — it advises which model to use.

Workflow: call this to pick a model, then call the matching generate tool (e.g. replicate_generate_image) with model set to the recommended key.

Args:

  • category (required): One of image, video, audio, tts, llm, vision, upscale, bg, stt, inpaint, segment, embed, voiceclone, threed, lipsync.

  • priority (default "balanced"): "speed" (fastest), "cost" (cheapest), "quality" (best), or "balanced" (weighted).

  • task_description (optional): Free text. Keyword hints like "quick draft" or "professional logo" nudge balanced ranking.

  • max_cost_usd (optional): Exclude models estimated above this cost.

  • duration_seconds (optional, 1–600): For per-second-priced categories (video, audio), used in cost estimation.

Returns structuredContent:

  {
    category,
    priority,
    recommendations: [{ key, model_id, speed, est_cost_usd, score, reason }],  // top 5
    count
  }

Examples:

  • category="image", priority="speed" → flux-schnell first

  • category="image", priority="quality" → highest-fidelity model first

  • category="video", priority="cost", duration_seconds=5 → cheapest per-5s clip

replicate_pipeline_start

Run a directed acyclic graph (DAG) of Replicate predictions as a background job. Returns a pipeline_id immediately. Poll replicate_pipeline_status for per-step progress and results.

Independent steps run concurrently. Downstream steps auto-start when their dependencies complete. Use "$stepId.field[n]" template strings to pass one step's output as another step's input.

IMPORTANT: model must be a full Replicate identifier ("owner/name" or "owner/name:version"). Curated shortcuts (e.g. "flux-schnell") are not supported — look up the full id via replicate_get_model_schema.

Template reference syntax:

  • "$gen.urls[0]" → first URL output of step "gen"

  • "$gen.urls" → full URLs array

  • "$gen.local_paths[0]" → first downloaded local path

  • "$gen.text_output[0]" → first text output (for LLMs)
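Resolving a reference means parsing `$stepId.field[n]` and reading that field from the referenced step's PredictionResult. The resolver below is a hypothetical sketch, not the server's implementation: it assumes a reference must be the entire string value of a top-level input field, and it shows how depends_on could be inferred from those references when omitted.

```typescript
interface StepOutput {
  urls: string[];
  local_paths: string[];
  text_output?: string[];
}

// "$step.field" or "$step.field[n]" for the three documented fields.
const REF = /^\$([A-Za-z0-9_-]+)\.(urls|local_paths|text_output)(?:\[(\d+)\])?$/;

// Resolve one value; non-reference values are returned unchanged.
function resolveRef(value: unknown, outputs: Record<string, StepOutput>): unknown {
  if (typeof value !== "string") return value;
  const m = REF.exec(value);
  if (!m) return value;
  const [, stepId, field, index] = m;
  const out = outputs[stepId];
  if (!out) throw new Error(`Step "${stepId}" has no output yet`);
  const arr: string[] = out[field as keyof StepOutput] ?? [];
  if (index === undefined) return arr; // "$gen.urls" → whole array
  const i = Number(index);
  if (i >= arr.length) throw new Error(`${value} is out of range`);
  return arr[i];
}

// Resolve every top-level input field of a step.
function resolveInput(input: Record<string, unknown>, outputs: Record<string, StepOutput>) {
  const resolved: Record<string, unknown> = {};
  for (const [key, v] of Object.entries(input)) resolved[key] = resolveRef(v, outputs);
  return resolved;
}

// Infer depends_on from the step ids referenced in an input object.
function inferDependsOn(input: Record<string, unknown>): string[] {
  const deps = new Set<string>();
  for (const v of Object.values(input)) {
    const m = typeof v === "string" ? REF.exec(v) : null;
    if (m) deps.add(m[1]);
  }
  return Array.from(deps);
}
```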

Args:

  • steps (array, 1–20): Pipeline steps. Each: { id, model, input, depends_on? }. depends_on is inferred from $ref patterns in input when omitted.

  • concurrency (1–5, default 3): Max simultaneous steps.

  • download (boolean, default true): Download step outputs locally.

  • timeout_ms_per_step (default 300000): Per-step timeout.

  • ttl_hours (1–72, default 1): How long to keep results in memory. Lost on server restart.

Returns: { pipeline_id, total, message }

Example: generate an image, then upscale it and remove its background in parallel:

  steps=[
    { "id": "gen", "model": "black-forest-labs/flux-schnell", "input": { "prompt": "a fox" } },
    { "id": "upscale", "model": "nightmareai/real-esrgan", "input": { "image": "$gen.urls[0]", "scale": 4 } },
    { "id": "no_bg", "model": "lucataco/remove-bg", "input": { "image": "$gen.urls[0]" } }
  ]

upscale and no_bg both depend on gen, so they run in parallel once gen completes.
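Why upscale and no_bg run together can be seen by grouping steps into waves with Kahn's topological sort, where each wave holds the steps whose dependencies all sit in earlier waves. This is an illustration only: the documented scheduler starts each step as soon as its own dependencies finish and also honours `concurrency`. The sketch does include the cycle and unknown-dependency checks that replicate_pipeline_start is documented to reject; `planWaves` is a hypothetical name.

```typescript
interface StepSpec {
  id: string;
  depends_on: string[];
}

// Group steps into waves with Kahn's algorithm. Throws on unknown deps or cycles.
function planWaves(steps: StepSpec[]): string[][] {
  const indegree = new Map<string, number>();
  const children = new Map<string, string[]>();
  for (const s of steps) {
    indegree.set(s.id, s.depends_on.length);
    children.set(s.id, []);
  }
  for (const s of steps) {
    for (const dep of s.depends_on) {
      const list = children.get(dep);
      if (!list) throw new Error(`Step "${s.id}" depends on unknown step "${dep}"`);
      list.push(s.id);
    }
  }
  const waves: string[][] = [];
  let ready = steps.filter((s) => s.depends_on.length === 0).map((s) => s.id);
  let placed = 0;
  while (ready.length > 0) {
    waves.push(ready);
    placed += ready.length;
    const next: string[] = [];
    for (const id of ready) {
      for (const child of children.get(id)!) {
        const remaining = indegree.get(child)! - 1;
        indegree.set(child, remaining);
        if (remaining === 0) next.push(child); // all dependencies satisfied
      }
    }
    ready = next;
  }
  // Any step never placed is part of (or downstream of) a cycle.
  if (placed !== steps.length) throw new Error("Cycle detected in pipeline steps");
  return waves;
}
```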

replicate_pipeline_status

Poll the status of a pipeline started with replicate_pipeline_start.

Args:

  • pipeline_id (string): Pipeline ID returned by replicate_pipeline_start.

  • include_outputs (boolean, default true): Include full PredictionResult per step. Set false for a counts-only summary while the pipeline is running.

Returns structuredContent: { pipeline_id, overall_status, total, succeeded, failed, skipped, running, pending, created_at, expires_at, steps: [{ id, model, status, prediction_id, result?, error?, skip_reason?, started_at, completed_at }] }

overall_status:

  • "running": steps still executing

  • "completed": all steps succeeded

  • "partial": all done, at least one failed or was skipped (failed dependency or budget error)

Note: pipeline-level errors (cycle detected, unknown depends_on) are rejected at replicate_pipeline_start with an error response — they never produce a pollable pipeline.

Tip: Poll every 10–30 seconds until overall_status is "completed" or "partial".
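The overall_status rules reduce to a check over step statuses. A sketch, assuming step statuses are drawn from the strings in the union below (the exact values are an assumption) and treating a canceled step like a failed one; the same function fits replicate_batch_status, whose items simply never report skipped.

```typescript
type StepStatus = "pending" | "running" | "succeeded" | "failed" | "skipped" | "canceled";

// Derive the pipeline-level status from per-step statuses.
function overallStatus(statuses: StepStatus[]): "running" | "completed" | "partial" {
  const isDone = (s: StepStatus) =>
    s === "succeeded" || s === "failed" || s === "skipped" || s === "canceled";
  if (!statuses.every(isDone)) return "running"; // something still pending or executing
  return statuses.every((s) => s === "succeeded") ? "completed" : "partial";
}
```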

replicate_batch_start

Run multiple Replicate predictions in parallel as a background job. Returns a job_id immediately — the predictions run in the background. Poll replicate_batch_status for progress and results.

Use this when you have 2–50 predictions to run and don't want to block. Each item specifies its own model and input, so you can mix models in one batch.

IMPORTANT: model must be a full Replicate identifier ("owner/name" or "owner/name:version"), not a curated shortcut like "flux-schnell". Use replicate_get_model_schema to look up the correct identifier.

Args:

  • items (array, 1–50): Predictions to run. Each: { model: "owner/name[:version]", input: {...} }.

  • concurrency (1–10, default 3): Max simultaneous predictions. Raise with caution — Replicate rate-limits free accounts.

  • download (boolean, default true): Download output files locally.

  • timeout_ms_per_item (default 300000): Per-prediction timeout. Timed-out items have pending=true in their result.

  • ttl_hours (1–72, default 1): How long to keep results in memory. Job state is lost if the MCP server restarts.

Returns: { job_id, total, message }

Example:

  items=[
    { model: "black-forest-labs/flux-schnell", input: { prompt: "a red fox" } },
    { model: "black-forest-labs/flux-schnell", input: { prompt: "a blue whale" } }
  ]

  → returns { job_id: "abc-123", total: 2, message: "..." }
  → then poll: replicate_batch_status({ job_id: "abc-123" })
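The concurrency argument caps how many predictions are in flight at once. A minimal worker-pool sketch of that behaviour, assuming a hypothetical async `run` function that executes one item; results are kept in input order, and a failing item is recorded rather than aborting the batch, matching the per-item failed status in replicate_batch_status.

```typescript
interface ItemResult<T> {
  index: number;
  status: "succeeded" | "failed";
  value?: T;
  error?: string;
}

// Run items with at most `concurrency` in flight; results keep input order.
async function runBatch<I, T>(
  items: I[],
  run: (item: I) => Promise<T>,
  concurrency = 3,
): Promise<ItemResult<T>[]> {
  const results: ItemResult<T>[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so no two workers share an item
      try {
        results[index] = { index, status: "succeeded", value: await run(items[index]) };
      } catch (e) {
        results[index] = { index, status: "failed", error: String(e) };
      }
    }
  }
  const workerCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```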

replicate_batch_status

Poll the status of an async batch job started with replicate_batch_start.

Args:

  • job_id (string): Job ID returned by replicate_batch_start.

  • include_results (boolean, default true): Include full PredictionResult per item. Set false for a counts-only summary while the job is still running.

Returns structuredContent: { job_id, overall_status, total, succeeded, failed, running, pending, created_at, expires_at, items: [{ index, model, status, prediction_id, result?, error?, started_at, completed_at }] }

overall_status:

  • "running": predictions still in progress

  • "completed": all items succeeded

  • "partial": all done, at least one failed

Tip: Poll every 10–30 seconds until overall_status is "completed" or "partial".

replicate_list_predictions

Return the most recent predictions on the authenticated Replicate account. Useful to recover a prediction ID, audit recent calls, or check what's still running.

Args:

  • limit (1-100, default 10): How many predictions to return.

Returns structuredContent: { count: number, predictions: PredictionSummary[] } Each PredictionSummary has id, model, status, created_at, completed_at, url.

replicate_cancel_prediction

Cancel an in-progress prediction by its ID. Useful for long-running async jobs (video, large LLM) when the user no longer needs the result.

Args:

  • prediction_id (string): ID of the prediction to cancel (returned by an earlier generate_* call).

Returns: PredictionSummary with updated status (typically "canceled").

replicate_estimate_cost

Return an approximate dollar-cost estimate for a planned prediction BEFORE running it. Prices are a hand-curated snapshot — actual billing comes from Replicate. Call this when the user asks "how much would X cost" or before launching a costly model.

Args:

  • model: Replicate "owner/name" id or a curated short key (e.g. "flux-schnell", "kling-pro").

  • num_outputs (1-20, optional): How many outputs to estimate. Default 1.

  • duration_seconds (1-600, optional): Required for per-second models (video, music, transcription, LLM).

Returns structuredContent: { resolved_model_id, num_outputs, duration_seconds, estimated_usd, pricing_basis, note }.

Examples:

  • model="flux-schnell", num_outputs=4 → ~$0.012 (4 × $0.003 per_run)

  • model="kling-pro", duration_seconds=5 → ~$0.45 (5 × $0.09 per_second)

  • model="meta/meta-llama-3-70b-instruct", duration_seconds=10 → ~$0.024 (10 × $0.0024 per_second)
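Each example multiplies a unit price by num_outputs (per-run pricing) or by duration_seconds (per-second pricing). The sketch below uses only the three prices quoted above as a hypothetical table; also multiplying per-second prices by num_outputs is an assumption, and real billing comes from Replicate.

```typescript
type Pricing = { basis: "per_run" | "per_second"; usd: number };

// Illustrative prices copied from the examples above; not a live price list.
const PRICES: Record<string, Pricing> = {
  "flux-schnell": { basis: "per_run", usd: 0.003 },
  "kling-pro": { basis: "per_second", usd: 0.09 },
  "meta/meta-llama-3-70b-instruct": { basis: "per_second", usd: 0.0024 },
};

// Round to 6 decimal places to hide floating-point noise (e.g. 0.09 * 5).
function roundUsd(x: number): number {
  return Math.round(x * 1e6) / 1e6;
}

function estimateCost(model: string, numOutputs = 1, durationSeconds?: number): number {
  const p = PRICES[model];
  if (!p) throw new Error(`No price on file for ${model}`);
  if (p.basis === "per_run") return roundUsd(p.usd * numOutputs);
  if (durationSeconds === undefined) {
    throw new Error("duration_seconds is required for per-second models");
  }
  // Assumption: each output is billed for the full duration.
  return roundUsd(p.usd * durationSeconds * numOutputs);
}
```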

replicate_refresh_models

Search Replicate for popular models NOT yet in the curated registry. Returns suggestions only — does not modify code.

Use this to find new models worth adding, then ask Claude to add the ones you want to src/models.ts.

Args:

  • categories (string[], optional): Which categories to check. Default: all 15 (image, video, audio, tts, llm, vision, upscale, bg, stt, inpaint, segment, embed, voiceclone, threed, lipsync).

  • min_run_count (integer, optional): Minimum run_count threshold. Default: 1000.

  • limit_per_category (integer, optional): Max suggestions per category (1-20). Default: 5.

Returns structuredContent: { "checked_at": string, "categories_checked": string[], "suggestions": [{ category, owner, name, model_id, run_count, description, replicate_url }], "already_curated": number, "total_suggestions": number }

Examples:

  • "Check for new popular models" → all categories, min 1000 runs

  • categories=["image","video"], min_run_count=10000 → only top-tier image/video models
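Here is one plausible reading of the filter behind these arguments, sketched in TypeScript. It drops already-curated models and those below min_run_count, ranks the rest by run count, and keeps the top limit_per_category. Several details are assumptions: the inclusive threshold, the ranking by run_count, and the replicate_url format. The search step that produces the candidates is left out.

```typescript
// Hypothetical candidate shape, reduced to what the filter needs.
interface CandidateModel {
  owner: string;
  name: string;
  run_count: number;
  description: string;
}

interface Suggestion extends CandidateModel {
  category: string;
  model_id: string;
  replicate_url: string;
}

function suggestNew(
  category: string,
  candidates: readonly CandidateModel[],
  curatedIds: ReadonlySet<string>,
  minRunCount = 1000,
  limitPerCategory = 5,
): Suggestion[] {
  // Clamp to the documented 1-20 range.
  const limit = Math.min(20, Math.max(1, limitPerCategory));
  return candidates
    .map((m) => ({
      ...m,
      category,
      model_id: `${m.owner}/${m.name}`,
      replicate_url: `https://replicate.com/${m.owner}/${m.name}`,
    }))
    // Keep popular models the registry does not know yet.
    .filter((s) => s.run_count >= minRunCount && !curatedIds.has(s.model_id))
    // Most-run first, then cut to the per-category limit.
    .sort((a, b) => b.run_count - a.run_count)
    .slice(0, limit);
}
```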

replicate_create_training

Kick off a fine-tuning (training) run on a trainable base model — e.g. a Flux LoRA trainer — with your dataset and hyperparameters. Returns immediately with a training ID; poll it with replicate_get_training.

Args:

  • model: BASE trainer "owner/name" (or "owner/name:version" to pin the trainer version inline). e.g. "ostris/flux-dev-lora-trainer".

  • version (optional): trainer version id. Required unless pinned inline on model.

  • destination: "owner/name" the trained weights are pushed to. The destination model must already exist on your account.

  • input: training inputs as a JSON object (dataset URL + hyperparameters). Call replicate_get_model_schema on the trainer to see its exact inputs.

Returns structuredContent: TrainingSummary { id, status, model, version, destination, created_at, completed_at, output_version, error }.
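The model/version rule above fits in a small resolver. This is a hypothetical helper, not the server's code. Rejecting an inline pin that disagrees with an explicit version is an assumption, since the tool text does not say which one wins.

```typescript
interface TrainerRef {
  owner: string;
  name: string;
  version: string;
}

// Split and validate an "owner/name" identifier.
function splitOwnerName(id: string): [string, string] {
  const [owner, name, ...rest] = id.split("/");
  if (!owner || !name || rest.length > 0) {
    throw new Error(`Expected "owner/name", got "${id}"`);
  }
  return [owner, name];
}

// Resolve the trainer version from an inline pin ("owner/name:version") or the separate version arg.
function resolveTrainer(model: string, version?: string): TrainerRef {
  const colon = model.indexOf(":");
  const path = colon === -1 ? model : model.slice(0, colon);
  const inlineVersion = colon === -1 ? undefined : model.slice(colon + 1);
  // Assumption: an inline pin that disagrees with an explicit version is an error.
  if (inlineVersion && version && inlineVersion !== version) {
    throw new Error(`Conflicting trainer versions: "${inlineVersion}" vs "${version}"`);
  }
  const resolved = inlineVersion || version;
  if (!resolved) throw new Error("version is required unless pinned inline on model");
  const [owner, name] = splitOwnerName(path);
  return { owner, name, version: resolved };
}
```

The same splitOwnerName check could also be applied to destination before submitting.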

replicate_get_training

Retrieve the current state of a training run: status, the resulting trained model version (once it succeeds), and any error.

Args:

  • training_id: ID returned by replicate_create_training.

Returns structuredContent: TrainingSummary.
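A training can run for many minutes, so a client typically polls replicate_get_training until the run reaches a terminal state. The sketch below shows such a loop. It assumes TrainingSummary uses Replicate's usual status values and that output_version is set once the run succeeds. getTraining stands in for the tool call, and the 30 s interval and 2 h cap are arbitrary choices.

```typescript
// Subset of the TrainingSummary fields listed above; the types are assumptions.
type TrainingStatus = "starting" | "processing" | "succeeded" | "failed" | "canceled";
interface TrainingSummary {
  id: string;
  status: TrainingStatus;
  output_version: string | null;
  error: string | null;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the training reaches a terminal state; resolve with the trained version.
// `getTraining` is a stand-in for a replicate_get_training call.
async function waitForTraining(
  getTraining: (id: string) => Promise<TrainingSummary>,
  trainingId: string,
  intervalMs = 30_000,
  maxWaitMs = 2 * 60 * 60 * 1000, // arbitrary 2-hour cap
): Promise<string> {
  const deadline = Date.now() + maxWaitMs;
  while (Date.now() < deadline) {
    const t = await getTraining(trainingId);
    if (t.status === "succeeded") {
      if (!t.output_version) throw new Error(`Training ${t.id} succeeded without an output_version`);
      return t.output_version;
    }
    if (t.status === "failed" || t.status === "canceled") {
      throw new Error(`Training ${t.id} ${t.status}: ${t.error ?? "no error message"}`);
    }
    await sleep(intervalMs);
  }
  throw new Error(`Training ${trainingId} still not finished after ${maxWaitMs} ms`);
}
```

If the run were cancelled with replicate_cancel_training while this loop waits, it would surface through the "canceled" error branch.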

replicate_list_trainings

Return the most recent training runs on the authenticated account.

Args:

  • limit (1-100, default 10): How many trainings to return.

Returns structuredContent: { count: number, trainings: TrainingSummary[] }.

replicate_cancel_training

Cancel an in-progress training run by its ID. Trainings can run for many minutes and cost real money — cancel when no longer needed.

Args:

  • training_id: ID of the training to cancel.

Returns structuredContent: TrainingSummary with the updated status (typically "canceled").

replicate_list_deployments

List the deployments on the authenticated Replicate account. A deployment is a private, autoscaled endpoint pinned to a specific model + hardware.

Args:

  • limit (1-100, default 20): How many deployments to return.

Returns structuredContent: { count: number, deployments: DeploymentSummary[] }. Each DeploymentSummary has owner, name, and current_release { model, version, hardware, min_instances, max_instances }.

replicate_get_deployment

Get the configuration of one deployment: its current model + version, hardware, and autoscaling min/max instances.

Args:

  • deployment: "owner/name" of the deployment.

Returns structuredContent: DeploymentSummary.

replicate_run_deployment

Run a prediction against a deployment's current release. WAITS for the prediction to finish and (by default) auto-downloads the outputs locally — same UX as the curated generate_* tools.

Args:

  • deployment: "owner/name" of the deployment to run.

  • input: model input parameters as a JSON object (same shape the deployment's underlying model expects).

  • download (default true): download output files locally.

  • timeout_ms (optional): max ms to wait before returning a pending result you can poll with replicate_get_prediction.

Returns the standard prediction result (inline image preview / text output, URLs, local_paths, prediction_id).
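One way to implement the timeout_ms behaviour is to race the prediction against a timer. Below is a minimal sketch under that assumption. The RunResult shape and the waitOrPending name are hypothetical, and the real tool returns the fuller result described above.

```typescript
// Hypothetical result shapes, not the server's actual types.
type RunResult<T> =
  | { state: "done"; prediction_id: string; output: T }
  | { state: "pending"; prediction_id: string };

// Wait for `completion` up to `timeoutMs`; on timeout, return the ID for later polling.
async function waitOrPending<T>(
  predictionId: string,
  completion: Promise<T>,
  timeoutMs?: number,
): Promise<RunResult<T>> {
  const done = completion.then((output) => ({ state: "done" as const, prediction_id: predictionId, output }));
  // No timeout: simply wait for the prediction to finish.
  if (timeoutMs === undefined) return done;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const pending = new Promise<RunResult<T>>((resolve) => {
    timer = setTimeout(() => resolve({ state: "pending", prediction_id: predictionId }), timeoutMs);
  });
  try {
    // Whichever settles first wins; a pending result can be polled with replicate_get_prediction.
    return await Promise.race([done, pending]);
  } finally {
    clearTimeout(timer);
  }
}
```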

Prompts

Interactive templates invoked by user choice

generate_media

Generate an image, video, audio clip, or speech from a plain-language description. Picks the matching replicate_generate_* tool and a sensible default model.

recommend_then_generate

Pick the best Replicate model for a task given a priority (speed, cost, or quality), then run it. Calls replicate_recommend_model and feeds the winner into the right generate tool.

batch_generate

Run the same generation over many inputs concurrently. Uses replicate_batch_start with a list of prompts, then polls replicate_batch_status until done.

image_to_video_pipeline

Compose a multi-step pipeline that first generates an image, then animates it into a video. Uses replicate_pipeline_start with a two-step DAG and $stepId.field references.

transcribe_and_summarize

Transcribe an audio/video file with speech-to-text, then summarise the transcript. Uses replicate_transcribe_audio and returns key points.
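The image_to_video_pipeline prompt relies on steps that refer to earlier outputs as $stepId.field. The TypeScript sketch below shows one way to evaluate such a DAG. A step runs once every step it references has produced output, and independent steps in the same layer run concurrently. The Step shape, the reference regex, and the execute callback are assumptions, not the schema replicate_pipeline_start actually accepts.

```typescript
// Hypothetical step shape; the schema replicate_pipeline_start accepts may differ.
interface Step {
  id: string;
  tool: string;
  input: Record<string, unknown>;
}
type Output = Record<string, unknown>;

// Matches a whole-string reference such as "$img.url".
const REF = /^\$([\w-]+)\.(\w+)$/;

// IDs of the steps this step's inputs refer to.
function deps(step: Step): string[] {
  return Object.values(step.input).flatMap((v): string[] => {
    const m = typeof v === "string" ? REF.exec(v) : null;
    return m ? [m[1]] : [];
  });
}

// Substitute each "$stepId.field" input with that field of the referenced step's output.
function resolveRefs(input: Record<string, unknown>, outputs: ReadonlyMap<string, Output>): Record<string, unknown> {
  const resolved: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(input)) {
    const m = typeof value === "string" ? REF.exec(value) : null;
    resolved[key] = m ? outputs.get(m[1])?.[m[2]] : value;
  }
  return resolved;
}

async function runPipeline(
  steps: readonly Step[],
  execute: (tool: string, input: Record<string, unknown>) => Promise<Output>,
): Promise<Map<string, Output>> {
  const outputs = new Map<string, Output>();
  let remaining = [...steps];
  while (remaining.length > 0) {
    // A step is ready once every step it references has finished.
    const ready = remaining.filter((s) => deps(s).every((d) => outputs.has(d)));
    if (ready.length === 0) throw new Error("Cycle or reference to an unknown step");
    // Independent ready steps run concurrently.
    const results = await Promise.all(ready.map((s) => execute(s.tool, resolveRefs(s.input, outputs))));
    ready.forEach((s, i) => outputs.set(s.id, results[i]));
    remaining = remaining.filter((s) => !ready.includes(s));
  }
  return outputs;
}
```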

Resources

Contextual data attached and managed by the client

model-catalog

Every curated model this server knows, grouped by category, with the Replicate id, a one-line description, and a speed tier. Pass any key (or an 'owner/name[:version]') as a tool's `model` argument.

capabilities

A summary of this server: categories covered, model count, transports, and the orchestration features (async batch, DAG pipelines, model recommender, cost estimator).


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/sena-labs/replicate-mcp-server'
