Chat with an LLM via Replicate

replicate_chat

Run large language models from Replicate for text generation, Q&A, code writing, summarization, translation, and more. Tune output with customizable model, temperature, system prompt, and generation limits.

Instructions

Run a large language model hosted on Replicate. Use this for free-form text generation, Q&A, code writing, summarisation, translation — anything where the input is text and the output is text.

Args:

  • prompt (string): User message.

  • model (string, default "llama-3-70b"): Curated key (llama-3.1-405b, llama-3-70b, llama-3-8b, mistral-7b, mixtral-8x7b, deepseek-r1) or "owner/name".

  • system_prompt (string, optional): Persona / instructions.

  • max_tokens (1-8192, optional): Generation limit.

  • temperature (0-2, optional): Sampling temperature.

  • extra_input (object, optional): Model-specific extras (top_p, top_k, frequency_penalty, etc.).

  • download (boolean, default false): No file outputs; leave false.

  • timeout_ms (5000-1800000, optional): Default 300000.

Returns: PredictionResult with text_output[0] containing the model's reply (later entries are raw streamed segments if applicable).
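
A minimal sketch of reading the reply, assuming the PredictionResult arrives as a Python dict with a "text_output" list. The dict shape and the helper name extract_reply are illustrative assumptions; only the text_output[0] convention comes from the description above.

```python
from typing import Any, Dict


def extract_reply(result: Dict[str, Any]) -> str:
    """Return the model's reply from a PredictionResult-like dict.

    Assumption: the result is a dict whose "text_output" key holds a list of
    strings. Entry 0 is the full reply; later entries, if present, are raw
    streamed segments and are ignored here.
    """
    outputs = result.get("text_output") or []
    if not outputs:
        raise ValueError("result contains no text_output entries")
    return outputs[0]


# Hypothetical result used only for illustration.
sample = {"text_output": ["Hello there.", "Hello", " there."]}
print(extract_reply(sample))
```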

Examples:

  • prompt="Explain quantum entanglement in two sentences.", model="llama-3-70b"

  • prompt="Write a Python function to compute Levenshtein distance.", model="mistral-7b", system_prompt="You are an expert software engineer."
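
As a rough sketch, arguments like these could be checked against the documented ranges and wrapped in an MCP tools/call request before sending. The helper names build_chat_arguments and build_tools_call are hypothetical, and the client-side validation is an assumption about what a caller might want; the server may apply its own, different checks.

```python
from typing import Any, Dict, Optional

# Curated keys as listed in the Args section.
CURATED_MODELS = {
    "llama-3.1-405b", "llama-3-70b", "llama-3-8b",
    "mistral-7b", "mixtral-8x7b", "deepseek-r1",
}


def build_chat_arguments(
    prompt: str,
    model: str = "llama-3-70b",
    system_prompt: Optional[str] = None,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    timeout_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate inputs against the documented ranges and drop unset options."""
    if not prompt:
        raise ValueError("prompt is required")
    if model not in CURATED_MODELS and "/" not in model:
        raise ValueError(f"model must be a curated key or 'owner/name': {model!r}")
    if max_tokens is not None and not 1 <= max_tokens <= 8192:
        raise ValueError("max_tokens must be in 1-8192")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be in 0-2")
    if timeout_ms is not None and not 5000 <= timeout_ms <= 1800000:
        raise ValueError("timeout_ms must be in 5000-1800000")
    optional = {
        "system_prompt": system_prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "timeout_ms": timeout_ms,
    }
    args: Dict[str, Any] = {"prompt": prompt, "model": model}
    args.update({k: v for k, v in optional.items() if v is not None})
    return args


def build_tools_call(arguments: Dict[str, Any], request_id: int = 1) -> Dict[str, Any]:
    """Wrap the arguments in a JSON-RPC 2.0 tools/call request for replicate_chat."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "replicate_chat", "arguments": arguments},
    }


request = build_tools_call(
    build_chat_arguments(
        "Explain quantum entanglement in two sentences.",
        model="llama-3-70b",
    )
)
```

Leaving unset options out of the payload lets the server apply its own defaults, such as the model-dependent max_tokens.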

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| model | No | LLM identifier. Curated keys: llama-3.1-405b, llama-3-70b, llama-3-8b, mistral-7b, mixtral-8x7b, deepseek-r1. Or full Replicate "owner/name[:version]". | llama-3-70b |
| prompt | Yes | User message / prompt for the LLM. | |
| download | No | LLM output is text; default false (no file to download). | |
| max_tokens | No | Max tokens to generate. Default model-dependent. | |
| timeout_ms | No | Max ms to wait for the prediction. If exceeded, returns the prediction ID so you can poll via replicate_get_prediction. Default: 300000 (5 min). | |
| extra_input | No | Additional model-specific inputs. | |
| temperature | No | Sampling temperature 0.0–2.0. Lower = more deterministic. | |
| system_prompt | No | Optional system prompt to set persona / instructions. | |
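
The timeout_ms fallback described above (a prediction ID is returned and the caller polls replicate_get_prediction) might be handled along these lines. This is a sketch under loud assumptions: call_tool stands in for whatever MCP client function you use, and the field names "prediction_id" and "text_output", as well as the argument name passed to replicate_get_prediction, are hypothetical and not confirmed by this page.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical client hook: takes (tool_name, arguments) and returns a result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def chat_with_polling(
    call_tool: ToolCaller,
    arguments: Dict[str, Any],
    poll_interval_s: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call replicate_chat, then poll replicate_get_prediction if no text came back."""
    result = call_tool("replicate_chat", arguments)
    polls = 0
    # Assumption: a finished prediction carries a non-empty "text_output" list.
    while not result.get("text_output"):
        # Assumption: a timed-out call exposes the ID under "prediction_id".
        prediction_id = result.get("prediction_id")
        if prediction_id is None or polls >= max_polls:
            raise TimeoutError("prediction did not finish and cannot be polled further")
        sleep(poll_interval_s)
        result = call_tool("replicate_get_prediction", {"prediction_id": prediction_id})
        polls += 1
    return result["text_output"][0]
```

Passing sleep as a parameter keeps the loop testable without real delays; a caller would normally leave it at the default.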
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate non-readonly, non-idempotent, non-destructive. The description adds the crucial timeout behavior (polling via replicate_get_prediction), explains the download parameter is irrelevant for text, and describes the return format. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured: a short purpose sentence, a list of use cases, a clear Args section with inline notes, a Returns section, and concrete examples. Every sentence adds distinct value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema, the description sufficiently explains the return value (text_output[0]) and polling behavior when timeout is exceeded. All 8 parameters are covered with examples, making the tool fully understandable for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description enriches every parameter: explains model curated keys vs custom format, clarifies download default and reason, details timeout default and polling fallback, provides examples for prompt and system_prompt usage. This goes well beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'Run' with a clear resource 'large language model', and enumerates diverse text-generation use cases (Q&A, code, summarisation, translation). It inherently distinguishes from sibling tools that generate images, audio, or video.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for text-in/text-out tasks and contrasts with multimodal siblings. However, it does not explicitly state when not to use this tool or mention alternative tools for specific sub-tasks like chat or code generation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/sena-labs/replicate-mcp-server'
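
For callers working in Python, the same GET request could be sketched with the standard library. The helper names are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page.

```python
import json
import urllib.request

BASE_URL = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory URL for one MCP server."""
    return f"{BASE_URL}/{owner}/{name}"


def fetch_server(owner: str, name: str, timeout: float = 10.0):
    """GET the server record. Assumes the response body is JSON."""
    request = urllib.request.Request(
        server_url(owner, name),
        method="GET",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Same target as the curl command above; call fetch_server(...) to send it.
url = server_url("sena-labs", "replicate-mcp-server")
```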

If you have feedback or need assistance with the MCP directory API, please join our Discord server.