# RanchHand — OpenAI-compatible MCP Server

RanchHand is a minimal MCP server that fronts an OpenAI-style API. It works great with Ollama's OpenAI-compatible endpoints (`http://localhost:11434/v1`) and should work with other OpenAI-compatible backends.
## Features
- Tools:
  - `openai_models_list` → GET `/v1/models`
  - `openai_chat_completions` → POST `/v1/chat/completions`
  - `openai_embeddings_create` → POST `/v1/embeddings`
- Optional HTTP ingest on localhost:41414 (bind 127.0.0.1): authenticated `POST /ingest/slack`
- Config via env:
  - `OAI_BASE` (default `http://localhost:11434/v1`)
  - `OAI_API_KEY` (optional; some backends ignore it, Ollama accepts any value)
  - `OAI_DEFAULT_MODEL` (fallback model name, e.g. `llama3:latest`)
  - `OAI_TIMEOUT_MS` (optional request timeout in milliseconds)
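
For example, a local Ollama setup might export (the timeout value here is illustrative):

```bash
export OAI_BASE=http://localhost:11434/v1
export OAI_DEFAULT_MODEL=llama3:latest
export OAI_TIMEOUT_MS=60000
# OAI_API_KEY is optional for Ollama; set any value if your tooling requires one.
```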
## Run (standalone)
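
A minimal sketch, assuming a Node.js entry point named `server.js` (check the repository for the actual start script):

```bash
OAI_BASE=http://localhost:11434/v1 OAI_DEFAULT_MODEL=llama3:latest node server.js
```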
## HTTP Ingest Service

An optional HTTP service (bind 127.0.0.1, port 41414) accepts authenticated ingest requests, e.g. Slack messages via `POST /ingest/slack`. It is currently an acknowledgment stub (see Notes).

Example request:
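
The payload shape and auth scheme below are illustrative (a bearer token and a minimal message batch are assumed):

```bash
curl -X POST http://127.0.0.1:41414/ingest/slack \
  -H "Authorization: Bearer $INGEST_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"user": "U123", "text": "hello world", "ts": "1718000000.000100"}]}'
```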
## MCP Tools
### openai_models_list

- Input: `{}`
- Output: OpenAI-shaped `{ data: [{ id, object, ... }] }`
### openai_chat_completions

- Input: `{ model?: string, messages: [{ role: 'user'|'system'|'assistant', content: string }], temperature?, top_p?, max_tokens? }`
- Output: OpenAI-shaped chat completion response (single-shot; streaming TBD)
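
For example, a request shaped per the schema above (the model name is illustrative):

```json
{
  "model": "llama3:latest",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Why is the sky blue?" }
  ],
  "temperature": 0.7
}
```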
### openai_embeddings_create

- Input: `{ model?: string, input: string | string[] }`
- Output: OpenAI-shaped embeddings response
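
For example (the embedding model name is illustrative; use whatever your backend serves):

```json
{
  "model": "nomic-embed-text",
  "input": ["first chunk of text", "second chunk of text"]
}
```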
## Claude/Codex (MCP)

Point your MCP config to:
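
A sketch in the common `mcpServers` format, assuming a `node server.js` entry point (adjust the command and path to your checkout):

```json
{
  "mcpServers": {
    "ranchhand": {
      "command": "node",
      "args": ["/path/to/ranchhand/server.js"],
      "env": {
        "OAI_BASE": "http://localhost:11434/v1",
        "OAI_DEFAULT_MODEL": "llama3:latest"
      }
    }
  }
}
```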
## Notes
- Streaming chat completions are not implemented yet (single response per call). If your backend requires streaming, we can add an incremental content pattern that MCP clients can consume.
- RanchHand passes through OpenAI-style payloads and shapes outputs to be OpenAI-compatible, but exact metadata (usage, token counts) depends on the backend.
- HTTP ingest is currently an acknowledgment stub (counts + sample). Chunking/embedding/upsert will be wired next; the design is pluggable for a local store or Qdrant.
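
Nothing about the store is final; as a hypothetical TypeScript sketch (names are not from the codebase), the pluggable boundary might look like:

```typescript
// Hypothetical contract for the pluggable ingest store (not actual RanchHand code).
interface VectorStore {
  // Upsert pre-embedded chunks into a named collection (e.g. one per Slack channel).
  upsert(
    collection: string,
    items: { id: string; vector: number[]; payload: Record<string, unknown> }[],
  ): Promise<void>;
}

// An in-memory store could back local development; a Qdrant-backed
// implementation would satisfy the same contract.
class InMemoryStore implements VectorStore {
  private data = new Map<string, { id: string; vector: number[]; payload: Record<string, unknown> }[]>();

  async upsert(collection: string, items: { id: string; vector: number[]; payload: Record<string, unknown> }[]) {
    this.data.set(collection, (this.data.get(collection) ?? []).concat(items));
  }
}
```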