Chat with an LLM via Replicate
replicate_chat
Run large language models from Replicate for text generation, Q&A, code writing, summarization, translation, and more. Tune output with customizable model, temperature, system prompt, and generation limits.
Instructions
Run a large language model hosted on Replicate. Use this for free-form text generation, Q&A, code writing, summarization, and translation: any task where the input is text and the output is text.
Args:
prompt (string): User message.
model (string, default "llama-3-70b"): Curated key (llama-3.1-405b, llama-3-70b, llama-3-8b, mistral-7b, mixtral-8x7b, deepseek-r1) or a full Replicate "owner/name[:version]" identifier.
system_prompt (string, optional): Persona / instructions.
max_tokens (1-8192, optional): Generation limit.
temperature (0-2, optional): Sampling temperature.
extra_input (object, optional): Model-specific extras (top_p, top_k, frequency_penalty, etc.).
download (boolean, default false): No file outputs; leave false.
timeout_ms (5000-1800000, optional): Default 300000.
Returns: PredictionResult with text_output[0] containing the model's reply (later entries are raw streamed segments if applicable).
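A minimal sketch of reading the reply, assuming the PredictionResult arrives as a dictionary-like object with a `text_output` list. The exact result shape is not specified here, and `extract_reply` is a hypothetical helper, not part of the tool.

```python
from typing import Any, Mapping


def extract_reply(result: Mapping[str, Any]) -> str:
    """Return the model's reply from a PredictionResult-like mapping.

    Assumption: `text_output` is a list of strings with the full reply at
    index 0 and any raw streamed segments in the entries after it.
    """
    text_output = result.get("text_output") or []
    if not text_output:
        raise ValueError("PredictionResult has no text_output entries")
    return text_output[0]


# Hypothetical result, shaped after the Returns description above.
sample = {"text_output": ["Hello, world.", "Hello,", " world."]}
print(extract_reply(sample))
```

Only index 0 is read because the later entries, when present, are described as raw streamed segments rather than additional replies.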
Examples:
prompt="Explain quantum entanglement in two sentences.", model="llama-3-70b"
prompt="Write a Python function to compute Levenshtein distance.", model="mistral-7b", system_prompt="You are an expert software engineer."
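The argument ranges listed under Args can be checked before a call is made. The sketch below assembles an argument dictionary and rejects out-of-range values. `build_chat_args` is a hypothetical helper, and how the dictionary is actually sent to the tool depends on your client, which is not covered here.

```python
from typing import Any, Dict, Optional


def build_chat_args(
    prompt: str,
    model: str = "llama-3-70b",
    system_prompt: Optional[str] = None,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    timeout_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble replicate_chat arguments, enforcing the documented ranges."""
    if not prompt:
        raise ValueError("prompt is required")
    args: Dict[str, Any] = {"prompt": prompt, "model": model}
    if system_prompt is not None:
        args["system_prompt"] = system_prompt
    if max_tokens is not None:
        if not 1 <= max_tokens <= 8192:
            raise ValueError("max_tokens must be between 1 and 8192")
        args["max_tokens"] = max_tokens
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        args["temperature"] = temperature
    if timeout_ms is not None:
        if not 5000 <= timeout_ms <= 1_800_000:
            raise ValueError("timeout_ms must be between 5000 and 1800000")
        args["timeout_ms"] = timeout_ms
    return args


# Optional fields that are left unset are omitted, so the tool's own
# defaults apply.
args = build_chat_args(
    "Explain quantum entanglement in two sentences.",
    temperature=0.2,
)
print(args)
```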
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | LLM identifier. Curated keys: llama-3.1-405b, llama-3-70b, llama-3-8b, mistral-7b, mixtral-8x7b, deepseek-r1. Or full Replicate "owner/name[:version]". | llama-3-70b |
| prompt | Yes | User message / prompt for the LLM. | |
| download | No | LLM output is text, so there is no file to download; leave false. | false |
| max_tokens | No | Max tokens to generate (1–8192). Default is model-dependent. | |
| timeout_ms | No | Max ms to wait for the prediction (5000–1800000). If exceeded, returns the prediction ID so you can poll via replicate_get_prediction. | 300000 (5 min) |
| extra_input | No | Additional model-specific inputs. | |
| temperature | No | Sampling temperature 0.0–2.0. Lower = more deterministic. | |
| system_prompt | No | Optional system prompt to set persona / instructions. | |
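The `model` field accepts two shapes: a curated key or a full "owner/name[:version]" identifier. The sketch below tells them apart on the client side. `classify_model` is a hypothetical helper for illustration only; how the server itself resolves identifiers is not documented here.

```python
from typing import Dict, Optional

# Curated keys as listed in the schema table above.
CURATED_KEYS = {
    "llama-3.1-405b", "llama-3-70b", "llama-3-8b",
    "mistral-7b", "mixtral-8x7b", "deepseek-r1",
}


def classify_model(model: str) -> Dict[str, Optional[str]]:
    """Classify a model string as a curated key or an owner/name[:version] ref."""
    if model in CURATED_KEYS:
        return {"kind": "curated", "owner": None, "name": model, "version": None}
    # Split off the optional ":version" suffix, then the "owner/name" pair.
    ref, _, version = model.partition(":")
    owner, sep, name = ref.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"not a curated key or owner/name[:version]: {model!r}")
    return {"kind": "replicate", "owner": owner, "name": name,
            "version": version or None}


print(classify_model("llama-3-70b"))
print(classify_model("owner/name:abc123"))
```

A string that is neither a curated key nor contains an owner/name pair raises `ValueError`, so a typo surfaces before a prediction is started.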