ollama-handoff
Server Configuration
Environment variables that configure the server. All of them are optional; the default applies when a variable is unset.
| Name | Required | Description | Default |
|---|---|---|---|
| OLLAMA_URL | No | Base URL of the Ollama server | http://localhost:11434 |
| OLLAMA_NUM_CTX | No | Context window in tokens | 32768 |
| OLLAMA_TIMEOUT_S | No | Per-request timeout, seconds | 600 |
| OLLAMA_KEEP_ALIVE | No | How long to keep the model resident in VRAM | 30m |
| OLLAMA_DEFAULT_MODEL | No | Default model for handoffs | qwen2.5-coder:14b |
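As a rough illustration of how these variables and defaults fit together, the sketch below reads them into a small settings object. The names `HandoffConfig` and `load_config` are hypothetical and are not taken from the server's source; only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class HandoffConfig:
    """Hypothetical container for the settings listed in the table."""
    url: str
    num_ctx: int
    timeout_s: float
    keep_alive: str
    default_model: str


def load_config(env=None):
    """Read each variable, falling back to the documented default."""
    env = os.environ if env is None else env
    return HandoffConfig(
        # Strip a trailing slash so paths can be appended safely.
        url=env.get("OLLAMA_URL", "http://localhost:11434").rstrip("/"),
        num_ctx=int(env.get("OLLAMA_NUM_CTX", "32768")),
        timeout_s=float(env.get("OLLAMA_TIMEOUT_S", "600")),
        keep_alive=env.get("OLLAMA_KEEP_ALIVE", "30m"),
        default_model=env.get("OLLAMA_DEFAULT_MODEL", "qwen2.5-coder:14b"),
    )
```

Passing a plain dictionary instead of the real environment, for example `load_config({"OLLAMA_NUM_CTX": "8192"})`, makes it easy to check how a single override interacts with the remaining defaults.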
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | {"listChanged": false} |
| prompts | {"listChanged": false} |
| resources | {"subscribe": false, "listChanged": false} |
| experimental | {} |
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| ask_local | Send a one-shot prompt to a local Ollama model and return the response. Use for any handoff where the cloud model's full reasoning isn't needed: drafts, boilerplate, simple extractions, formatting, quick lookups. Runs on the user's own GPU and consumes no cloud-LLM usage. Args: prompt: The task / question. model: Override the default model. system: Optional system prompt to shape behavior. |
| chat_local | Multi-turn chat against a local Ollama model. Use when the handoff needs more than one turn of context. |
| summarize_local | Summarize a block of text using the local model. Cheap offload for long files, logs, transcripts, or docs the cloud model doesn't need to fully ingest. Returns a concise structured summary. Args: text: The content to summarize. Can be very long (context window is configurable). focus: Optional focus hint, e.g. "errors and stack traces" or "API surface only". |
| code_review_local | Quick first-pass code review using the local coder model. Catches obvious bugs, style issues, and risky patterns. Use as a cheap pre-filter before asking the cloud model for a deeper review. Args: diff_or_code: A unified diff or a code block to review. |
| draft_commit_message_local | Draft a conventional-style commit message from a diff using the local model. Cheap and fast; good for routine commits where the cloud model's analysis isn't needed. Args: diff: The output of |
| extract_local | Extract specific information from a text block using the local model. Good for pulling structured data out of unstructured text: function names, URLs, error messages, TODO comments, etc. Args: text: The source text. what_to_extract: What to pull out, e.g. "all function definitions" or "every URL in the file". |
| list_models | List the Ollama models available locally. |
| server_info | Return the server's effective configuration (model, context size, etc.). |
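To make the one-shot handoff concrete, here is a minimal sketch of the kind of request a tool like `ask_local` could send to Ollama's `/api/generate` endpoint. How this server actually maps its arguments onto the Ollama API is not documented here, so the function names and the mapping of `OLLAMA_NUM_CTX` and `OLLAMA_KEEP_ALIVE` onto `options.num_ctx` and `keep_alive` are assumptions.

```python
import json
import urllib.request


def build_generate_request(prompt, model="qwen2.5-coder:14b", system=None,
                           base_url="http://localhost:11434",
                           num_ctx=32768, keep_alive="30m"):
    """Build (but do not send) a non-streaming /api/generate request.

    Hypothetical helper: the defaults mirror the configuration table,
    not the server's real implementation.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # ask for one JSON object, not a stream
        "keep_alive": keep_alive,         # how long the model stays loaded
        "options": {"num_ctx": num_ctx},  # context window in tokens
    }
    if system is not None:
        payload["system"] = system        # optional system prompt
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_sketch(prompt, timeout_s=600, **kwargs):
    """Send the request and return the generated text.

    Requires a running Ollama server with the chosen model pulled.
    """
    req = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())["response"]
```

Keeping request construction separate from sending lets the payload be inspected or tested without a GPU or a running Ollama instance.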
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/Michael-WhiteCapData/ollama-handoff'