local_llm_run
Send prompts to a local LLM for private, free text processing. Perform tasks like classification, formatting, extraction, rewriting, and proofreading without an API key.
Instructions
Send a prompt to a local LLM (LM Studio, Ollama, llama.cpp, or any OpenAI-compatible server). Free, private, no API key needed. Best for simple tasks: classify, format, extract, rewrite, proofread. Set stream=true for token-by-token progress.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| model | No | Model identifier as shown in LM Studio / Ollama (e.g. "deepseek-r1-0528-qwen3-8b", "phi-4-mini"). Omit to use the server's currently loaded model or LOCAL_LLM_MODEL env var. | |
| prompt | Yes | Prompt or question to send to the local LLM. | |
| stream | No | Stream response token-by-token via MCP progress notifications. The client sees partial content in real time. Final result is still returned as a complete response. | |
| system | No | Optional system message to set the LLM's behavior. | |
| endpoint | No | Override the local LLM endpoint URL (e.g. "http://localhost:11434/v1" for Ollama). Omit to use LOCAL_LLM_ENDPOINT env var or default (http://localhost:1234/v1 for LM Studio). | |
| max_tokens | No | Maximum tokens to generate. Default: server default. | |
| temperature | No | Sampling temperature (0 = deterministic, higher = more creative). Default: server default. | |
| timeout_seconds | No | Max seconds to wait for a response. |