Replicate-MCP-Server
Server Configuration
Describes the environment variables required to run the server.
| Name | Required | Description | Default |
|---|---|---|---|
| REPLICATE_API_TOKEN | Yes | Your Replicate API token. | |
| REPLICATE_DOWNLOAD_DIR | No | Where to save generated files. | ~/Downloads/replicate-mcp |
| REPLICATE_WEBHOOK_PORT | No | Port for webhook receiver. | |
| REPLICATE_API_TOKEN_POOL | No | Comma-separated list of Replicate API tokens for round-robin rate limit distribution. | |
| REPLICATE_WEBHOOK_PUBLIC_URL | No | Public URL for webhook callbacks to avoid polling. | |
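The variables above could be read at startup roughly as follows. This is a minimal sketch, not the server's actual code: `ServerConfig`, `loadConfig`, and `makeTokenRotation` are hypothetical names, and placing the primary token first in the rotation and dropping duplicate tokens are both assumptions.

```typescript
// Hypothetical config loader; names and merge rules are assumptions.
interface ServerConfig {
  tokens: string[]; // primary token followed by any pool tokens
  downloadDir: string;
  webhookPort?: number;
  webhookPublicUrl?: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): ServerConfig {
  const primary = (env.REPLICATE_API_TOKEN ?? "").trim();
  if (primary.length === 0) {
    throw new Error("REPLICATE_API_TOKEN is required");
  }

  // Split the optional comma-separated pool, dropping blanks and duplicates.
  const pool = (env.REPLICATE_API_TOKEN_POOL ?? "")
    .split(",")
    .map((t) => t.trim())
    .filter((t) => t.length > 0);
  const tokens = Array.from(new Set([primary, ...pool]));

  // The port is optional, but must be a valid TCP port when present.
  const portText = env.REPLICATE_WEBHOOK_PORT;
  const port = portText === undefined ? undefined : Number(portText);
  if (port !== undefined && (!Number.isInteger(port) || port < 1 || port > 65535)) {
    throw new Error(`Invalid REPLICATE_WEBHOOK_PORT: ${portText}`);
  }

  return {
    tokens,
    // Documented default; "~" is left unexpanded in this sketch.
    downloadDir: env.REPLICATE_DOWNLOAD_DIR ?? "~/Downloads/replicate-mcp",
    webhookPort: port,
    webhookPublicUrl: env.REPLICATE_WEBHOOK_PUBLIC_URL,
  };
}

// Round-robin selector: each call returns the next token in the list.
function makeTokenRotation(tokens: string[]): () => string {
  let i = 0;
  return () => {
    const token = tokens[i % tokens.length];
    i += 1;
    return token;
  };
}
```

In a Node.js process, `loadConfig(process.env)` would supply the real environment; passing a plain object keeps the sketch easy to test.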
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | { "listChanged": true } |
| prompts | { "listChanged": true } |
| resources | { "listChanged": true } |
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| replicate_generate_image | Generate one or more images from a text prompt using a Replicate image model. Use this for any "draw / create / generate an image of …" request. By default it uses Flux Schnell (fast, ~2 seconds per image). DISPLAY REQUIREMENT: after this tool returns successfully, you MUST embed the image inline in your reply by pasting ONE of the three embed blocks the tool prints verbatim (Option 1 iframe, Option 2 `<img>`, or Option 3 markdown; try them in that order and pick the first one your chat client renders). The iframe variant scales to the chat column width with the image's native aspect ratio; the `<img>` variant is a responsive fallback; markdown is the universal last resort. Place the chosen embed BEFORE any descriptive prose. Do NOT paraphrase the URL or omit the embed: the user wants the image to appear in the main chat flow, not only inside the collapsed tool widget. URLs expire in ~24h. Args:
Returns structuredContent matching PredictionResult: { "status": "starting" | "processing" | "succeeded" | "failed" | "canceled", "prediction_id": string, "model": string, "urls": string[], // Replicate URLs (expire ~24h) "local_paths": string[], // Absolute paths on disk when download=true "metrics": { "predict_time_seconds": number } | undefined, "error": string | undefined, "pending": boolean | undefined // true if timed out — poll via replicate_get_prediction } Examples:
Error handling:
|
| replicate_generate_video | Generate a video clip from a text prompt (and optionally a starting image). Video generation is slow, typically 1-5 minutes per clip. DISPLAY REQUIREMENT: after this tool returns successfully, include the URL(s) printed in the tool's text content so the user can open the video. URLs expire in ~24h. Args:
Returns: PredictionResult (see replicate_generate_image for shape). The local_paths will contain .mp4 files when downloaded. Tip: If timeout_ms is exceeded, the result will have pending=true and a prediction_id. Wait a minute, then call replicate_get_prediction. |
| replicate_generate_audio | Generate music, ambient audio, or full songs from a text prompt. DISPLAY REQUIREMENT: after this tool returns successfully, include the URL(s) printed in the tool's text content as a markdown link. Models:
Args:
Returns: PredictionResult. local_paths contain audio files. Examples:
|
| replicate_generate_speech | Convert text to natural-sounding speech. DISPLAY REQUIREMENT: after this tool returns successfully, include the URL printed in the tool's text content as a markdown link. Args:
Returns: PredictionResult. local_paths contain WAV/MP3 files. |
| replicate_chat | Run a large language model hosted on Replicate. Use this for free-form text generation, Q&A, code writing, summarisation, translation, or anything else where the input is text and the output is text. Args:
Returns: PredictionResult with text_output[0] containing the model's reply (later entries are raw streamed segments if applicable). Examples:
|
| replicate_vision | Run a vision-language model to describe, caption, or answer questions about an image. Args:
Returns: PredictionResult with text_output containing the model's textual answer. Examples:
|
| replicate_upscale_image | Upscale an image to higher resolution. Optional face restoration for photos. DISPLAY REQUIREMENT: after this tool returns successfully, embed the upscaled image inline using one of the three blocks (iframe / `<img>` / markdown) printed by the tool. Place it BEFORE descriptive prose. URLs expire in ~24h. Args:
Returns: PredictionResult with urls + local_paths to the upscaled image. Examples:
|
| replicate_remove_background | Produce a transparent-background version (PNG) of an image. DISPLAY REQUIREMENT: after this tool returns successfully, embed the cut-out image inline using one of the three blocks (iframe / `<img>` / markdown) printed by the tool. Args:
Returns: PredictionResult with urls + local_paths to a transparent PNG. Examples:
|
| replicate_transcribe_audio | Transcribe an audio or video file to text using Whisper-family models on Replicate. Args:
Returns: PredictionResult with text_output containing the transcript. |
| replicate_inpaint | Fill masked regions of an image based on a text prompt. Works for both inpainting (replace inside) and outpainting (extend canvas) when the mask covers the target area. DISPLAY REQUIREMENT: embed the result inline using one of the three blocks (iframe / `<img>` / markdown) printed by the tool. Args:
|
| replicate_segment | Produce a segmentation mask of an image. Use SAM 2 for point/box-prompt masks (auto-mask everything when no prompt is given) or Grounded-SAM for text-prompt masking like "the red car". DISPLAY REQUIREMENT: embed the mask result inline using one of the three blocks printed by the tool. Args:
|
| replicate_embed_text | Convert text(s) into numeric embedding vectors. Useful for RAG, semantic search, clustering, and similarity scoring. Args:
Returns: PredictionResult — the embedding vectors are in structuredContent.output (model-specific shape). |
| replicate_clone_voice | Synthesize speech in a cloned voice. Provide a short reference audio sample (~5-30 s) and the text to speak; the model reproduces the voice characteristics. DISPLAY REQUIREMENT: after this tool returns successfully, include the URL printed in the tool's text content as a markdown link. Args:
Returns: PredictionResult. local_paths contain WAV/MP3 files. Examples:
|
| replicate_generate_3d | Generate a 3D mesh (GLB/OBJ) from a text prompt or a reference image. 3D generation is slow, typically 1-5 minutes. DISPLAY REQUIREMENT: after this tool returns successfully, include the download URL(s) so the user can open the 3D file. URLs expire in ~24h. Args:
Returns: PredictionResult. local_paths will contain .glb or .obj files. Examples:
|
| replicate_lipsync | Animate a portrait image to speak, either from a text script (the model does TTS + lipsync) or from a driving audio file. Produces an MP4 video. DISPLAY REQUIREMENT: after this tool returns successfully, include the URL(s) so the user can open the video. URLs expire in ~24h. Args:
Returns: PredictionResult. local_paths contain .mp4 files. Examples:
|
| replicate_run_model | Generic escape hatch: run ANY model in the Replicate catalog by its "owner/name" identifier. This tool gives Claude access to the entire Replicate model catalog: anything not covered by the curated specialised tools (image, video, audio, speech, chat, vision, upscale, remove-bg) can be reached from here. DISPLAY REQUIREMENT: if the result includes image URLs, paste ONE of the embed blocks the tool prints (iframe / `<img>` / markdown, tried in that order) verbatim in your reply so the image renders inline in the chat. Use this for any category WITHOUT a curated specialised tool, including but not limited to:
Workflow:
Args:
Returns: PredictionResult. Examples:
|
| replicate_search_models | Search the Replicate catalog by free-text query. Returns up to 25 matching models with names, descriptions, and URLs. Args:
Returns structuredContent: { "count": number, "models": [ { "owner": string, "name": string, "description": string | undefined, "url": string, "run_count": number | undefined, "cover_image_url": string | undefined } ] } Tip: Once you find a promising model, call replicate_get_model_schema with "owner/name" to see its inputs before calling replicate_run_model. |
| replicate_get_model_schema | Retrieve metadata and the OpenAPI input/output schema for a specific Replicate model. Use this before replicate_run_model to know which fields the model accepts and what they mean. Args:
Returns structuredContent: { "model": string, "description": string | undefined, "visibility": string | undefined, "latest_version_id": string | undefined, "input_schema": object | undefined, // OpenAPI schema for inputs "output_schema": object | undefined, // OpenAPI schema for outputs "example_url": string | undefined // Replicate page with examples } |
| replicate_get_prediction | Retrieve the current status and (if available) outputs of a Replicate prediction by its ID. Use this when a previous generate_* or run_model call returned pending=true (timed out before completion). Args:
Returns: PredictionResult — same shape as replicate_generate_image. If still running, status will be "processing" or "starting" and pending will be true. Typical flow:
|
| replicate_upload_file | Upload a file to Replicate's file storage and get back a URL valid for ~24 hours. Pass the returned URL as a model input (e.g. image for upscale/inpaint/vision, image_url for video, reference_audio_url for voice clone). Two input modes; provide EXACTLY ONE:
Args:
Returns structuredContent: { url, file_id, name }
Examples:
|
| replicate_recommend_model | Rank the curated models in a category by a priority (speed, cost, quality, or balanced) and return recommendations with cost estimates and reasoning. This does NOT run anything; it only advises which model to use. Workflow: call this to pick a model, then call the matching generate tool (e.g. replicate_generate_image) with model set to the recommended key. Args:
Returns structuredContent: { category, priority, recommendations: [{ key, model_id, speed, est_cost_usd, score, reason }], // top 5 count } Examples:
|
| replicate_pipeline_start | Run a directed acyclic graph (DAG) of Replicate predictions as a background job. Returns a pipeline_id immediately. Poll replicate_pipeline_status for per-step progress and results. Independent steps run concurrently. Downstream steps auto-start when their dependencies complete. Use "$stepId.field[n]" template strings to pass one step's output as another step's input. IMPORTANT: model must be a full Replicate identifier ("owner/name" or "owner/name:version"). Curated shortcuts (e.g. "flux-schnell") are not supported; look up the full id via replicate_get_model_schema. Template reference syntax: "$gen.urls[0]" → first URL output of step "gen"; "$gen.urls" → full URLs array; "$gen.local_paths[0]" → first downloaded local path; "$gen.text_output[0]" → first text output (for LLMs). Args:
Returns: { pipeline_id, total, message } Example (generate, then upscale and remove background in parallel): steps=[ { "id": "gen", "model": "black-forest-labs/flux-schnell", "input": { "prompt": "a fox" } }, { "id": "upscale", "model": "nightmareai/real-esrgan", "input": { "image": "$gen.urls[0]", "scale": 4 } }, { "id": "no_bg", "model": "lucataco/remove-bg", "input": { "image": "$gen.urls[0]" } } ] Both upscale and no_bg depend on gen, so they run in parallel after gen completes. |
| replicate_pipeline_status | Poll the status of a pipeline started with replicate_pipeline_start. Args:
Returns structuredContent: { pipeline_id, overall_status, total, succeeded, failed, skipped, running, pending, created_at, expires_at, steps: [{ id, model, status, prediction_id, result?, error?, skip_reason?, started_at, completed_at }] } overall_status values: "running" (steps still executing); "completed" (all steps succeeded); "partial" (all done, at least one failed or was skipped because of a failed dependency or budget error). Note: pipeline-level errors (cycle detected, unknown depends_on) are rejected at replicate_pipeline_start with an error response, so they never produce a pollable pipeline. Tip: Poll every 10–30 seconds until overall_status is "completed" or "partial". |
| replicate_batch_start | Run multiple Replicate predictions in parallel as a background job. Returns a job_id immediately; the predictions run in the background. Poll replicate_batch_status for progress and results. Use this when you have 2–50 predictions to run and don't want to block. Each item specifies its own model and input, so you can mix models in one batch. IMPORTANT: model must be a full Replicate identifier ("owner/name" or "owner/name:version"), not a curated shortcut like "flux-schnell". Use replicate_get_model_schema to look up the correct identifier. Args:
Returns: { job_id, total, message } Example: items=[ { model: "black-forest-labs/flux-schnell", input: { prompt: "a red fox" } }, { model: "black-forest-labs/flux-schnell", input: { prompt: "a blue whale" } }, ] → Returns { job_id: "abc-123", total: 2, message: "..." } → Then poll: replicate_batch_status({ job_id: "abc-123" }) |
| replicate_batch_status | Poll the status of an async batch job started with replicate_batch_start. Args:
Returns structuredContent: { job_id, overall_status, total, succeeded, failed, running, pending, created_at, expires_at, items: [{ index, model, status, prediction_id, result?, error?, started_at, completed_at }] } overall_status values: "running" (predictions still in progress); "completed" (all items succeeded); "partial" (all done, at least one failed). Tip: Poll every 10–30 seconds until overall_status is "completed" or "partial". |
| replicate_list_predictions | Return the most recent predictions on the authenticated Replicate account. Useful to recover a prediction ID, audit recent calls, or check what's still running. Args:
Returns structuredContent: { count: number, predictions: PredictionSummary[] } Each PredictionSummary has id, model, status, created_at, completed_at, url. |
| replicate_cancel_prediction | Cancel an in-progress prediction by its ID. Useful for long-running async jobs (video, large LLM) when the user no longer needs the result. Args:
Returns: PredictionSummary with updated status (typically "canceled"). |
| replicate_estimate_cost | Return an approximate dollar-cost estimate for a planned prediction BEFORE running it. Prices are a hand-curated snapshot; actual billing comes from Replicate. Call this when the user asks "how much would X cost" or before launching a costly model. Args:
Returns structuredContent: { resolved_model_id, num_outputs, duration_seconds, estimated_usd, pricing_basis, note }. Examples:
|
| replicate_refresh_models | Search Replicate for popular models NOT yet in the curated registry. Returns suggestions only and does not modify code. Use this to find new models worth adding, then ask Claude to edit src/models.ts with the ones you want. Args:
Returns structuredContent: { "checked_at": string, "categories_checked": string[], "suggestions": [{ category, owner, name, model_id, run_count, description, replicate_url }], "already_curated": number, "total_suggestions": number } Examples:
|
| replicate_create_training | Kick off a fine-tuning (training) run on a trainable base model (e.g. a Flux LoRA trainer) with your dataset and hyperparameters. Returns immediately with a training ID; poll it with replicate_get_training. Args:
Returns structuredContent: TrainingSummary { id, status, model, version, destination, created_at, completed_at, output_version, error }. |
| replicate_get_training | Retrieve the current state of a training run: status, the resulting trained model version (once it succeeds), and any error. Args:
Returns structuredContent: TrainingSummary. |
| replicate_list_trainings | Return the most recent training runs on the authenticated account. Args:
Returns structuredContent: { count: number, trainings: TrainingSummary[] }. |
| replicate_cancel_training | Cancel an in-progress training run by its ID. Trainings can run for many minutes and cost real money, so cancel them when they are no longer needed. Args:
Returns structuredContent: TrainingSummary with the updated status (typically "canceled"). |
| replicate_list_deployments | List the deployments on the authenticated Replicate account. A deployment is a private, autoscaled endpoint pinned to a specific model + hardware. Args:
Returns structuredContent: { count: number, deployments: DeploymentSummary[] }. Each DeploymentSummary has owner, name, and current_release { model, version, hardware, min_instances, max_instances }. |
| replicate_get_deployment | Get the configuration of one deployment: its current model + version, hardware, and autoscaling min/max instances. Args:
Returns structuredContent: DeploymentSummary. |
| replicate_run_deployment | Run a prediction against a deployment's current release. WAITS for the prediction to finish and (by default) auto-downloads the outputs locally, the same UX as the curated generate_* tools. Args:
Returns the standard prediction result (inline image preview / text output, URLs, local_paths, prediction_id). |
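The "$stepId.field[n]" template references described for replicate_pipeline_start could be resolved along these lines. This is a sketch under stated assumptions, not the server's implementation: `resolveValue` and `resolveInput` are hypothetical names, only top-level input values are inspected, and the pattern for step ids and field names is a guess.

```typescript
// Hypothetical resolver for "$stepId.field[n]" references; not the server's code.
type StepResult = Record<string, unknown>;

// Assumed shape: "$" + step id + "." + field name + optional "[index]".
const REF = /^\$([A-Za-z_][\w-]*)\.([A-Za-z_]\w*)(?:\[(\d+)\])?$/;

// Resolve one input value against the results of completed steps.
function resolveValue(value: unknown, results: Record<string, StepResult>): unknown {
  if (typeof value !== "string") return value;
  const match = REF.exec(value);
  if (!match) return value; // plain string, not a reference

  const [, stepId, field, index] = match;
  const result = results[stepId];
  if (result === undefined) {
    throw new Error(`Step "${stepId}" has no result yet`);
  }

  const fieldValue = result[field];
  if (index === undefined) return fieldValue; // e.g. "$gen.urls" -> whole array
  if (!Array.isArray(fieldValue)) {
    throw new Error(`"${stepId}.${field}" is not an array`);
  }
  return fieldValue[Number(index)]; // e.g. "$gen.urls[0]" -> first element
}

// Resolve every top-level value of a step's input object.
function resolveInput(
  input: Record<string, unknown>,
  results: Record<string, StepResult>,
): Record<string, unknown> {
  const resolved: Record<string, unknown> = {};
  for (const key of Object.keys(input)) {
    resolved[key] = resolveValue(input[key], results);
  }
  return resolved;
}
```

Applied to the upscale step from the example, `resolveInput({ image: "$gen.urls[0]", scale: 4 }, results)` would swap in the first URL produced by step "gen" and leave `scale` untouched.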
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| generate_media | Generate an image, video, audio clip, or speech from a plain-language description. Picks the matching replicate_generate_* tool and a sensible default model. |
| recommend_then_generate | Pick the best Replicate model for a task given a priority (speed, cost, or quality), then run it. Calls replicate_recommend_model and feeds the winner into the right generate tool. |
| batch_generate | Run the same generation over many inputs concurrently. Uses replicate_batch_start with a list of prompts, then polls replicate_batch_status until done. |
| image_to_video_pipeline | Compose a multi-step pipeline that first generates an image, then animates it into a video. Uses replicate_pipeline_start with a two-step DAG and $stepId.field references. |
| transcribe_and_summarize | Transcribe an audio/video file with speech-to-text, then summarise the transcript. Uses replicate_transcribe_audio and returns key points. |
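The start-then-poll flow used by batch_generate (and by pipelines) might look like the sketch below. It assumes a caller-supplied `fetchStatus` function that wraps a replicate_batch_status or replicate_pipeline_status call through your MCP client; `pollUntilDone`, `JobStatus`, and the default interval and attempt limit are hypothetical choices, with the interval picked from the 10–30 second range the tool descriptions suggest.

```typescript
// Hypothetical polling helper; `fetchStatus` stands in for a status tool call.
type OverallStatus = "running" | "completed" | "partial";

interface JobStatus {
  overall_status: OverallStatus;
  total: number;
  succeeded: number;
  failed: number;
}

async function pollUntilDone(
  fetchStatus: () => Promise<JobStatus>,
  intervalMs = 15_000, // assumed default, within the suggested 10-30 s range
  maxAttempts = 40,    // assumed cap so the loop cannot run forever
): Promise<JobStatus> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus();
    // "completed" and "partial" are both terminal states.
    if (status.overall_status !== "running") return status;
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Job still running after ${maxAttempts} polls`);
}
```

A "partial" result still needs inspection by the caller, since at least one item failed or was skipped.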
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| model-catalog | Every curated model this server knows, grouped by category, with the Replicate id, a one-line description, and a speed tier. Pass any key (or an 'owner/name[:version]') as a tool's `model` argument. |
| capabilities | A summary of this server: categories covered, model count, transports, and the orchestration features (async batch, DAG pipelines, model recommender, cost estimator). |
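A `model` argument can therefore be either a curated key or a full 'owner/name[:version]' identifier. One way to tell them apart is sketched below. The function name `parseModelRef`, the `ModelRef` type, and the accepted character set are assumptions for illustration, not Replicate's or this server's validation rules.

```typescript
// Hypothetical classifier for a tool's `model` argument.
type ModelRef =
  | { kind: "curated"; key: string }
  | { kind: "full"; owner: string; name: string; version?: string };

// Assumed pattern: owner/name with an optional ":version" suffix.
const FULL_ID = /^([\w.-]+)\/([\w.-]+)(?::([\w.-]+))?$/;

function parseModelRef(model: string): ModelRef {
  const text = model.trim();
  const match = FULL_ID.exec(text);
  if (match) {
    const [, owner, name, version] = match;
    return version === undefined
      ? { kind: "full", owner, name }
      : { kind: "full", owner, name, version };
  }
  // Anything else is treated as a curated key, unless it is clearly malformed.
  if (text.length === 0 || text.indexOf("/") !== -1 || text.indexOf(":") !== -1) {
    throw new Error(`Not a valid model reference: "${model}"`);
  }
  return { kind: "curated", key: text };
}
```

A check like this is most useful before calling replicate_batch_start or replicate_pipeline_start, which accept only full identifiers: a `curated` result there signals that the key must first be looked up.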
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/sena-labs/replicate-mcp-server'