RanchHand is an OpenAI-compatible MCP server that acts as a proxy to AI backends like Ollama, providing standardized API access through multiple tools:
Core Capabilities:
List Available Models: Query all models from the backend using openai_models_list via GET /v1/models
Chat Completions: Generate conversational responses with openai_chat_completions via POST /v1/chat/completions, supporting parameters like temperature, top_p, and max_tokens (streaming not yet implemented)
Create Embeddings: Generate vector embeddings from text using openai_embeddings_create via POST /v1/embeddings
HTTP Ingest Service: Optional service on localhost:41414 for ingesting data such as Slack messages via POST /ingest/slack with authentication (currently a stub)
Configuration & Integration:
Backend Flexibility: Works with any OpenAI-compatible backend (defaults to Ollama at localhost:11434/v1)
Environment Variables: Configurable via OAI_BASE, OAI_API_KEY, OAI_DEFAULT_MODEL, and OAI_TIMEOUT_MS
MCP Integration: Runs standalone or plugs into MCP configurations for Claude/Codex
RanchHand — OpenAI-compatible MCP Server (Architecture)
RanchHand is a minimal MCP server that fronts an OpenAI-style API. It works great with Ollama's OpenAI-compatible endpoints (http://localhost:11434/v1) and should work with other OpenAI-compatible backends.
Features
Tools:
openai_models_list → GET /v1/models
openai_chat_completions → POST /v1/chat/completions
openai_embeddings_create → POST /v1/embeddings
Optional HTTP ingest on localhost:41414 (bind 127.0.0.1):
POST /ingest/slack (index: chunk + embed + upsert into the in-memory store)
POST /query (kNN query with embeddings)
GET /profiles | POST /profiles (role defaults: embed, summarizers, reranker, chunking)
POST /answer (retrieve + generate an answer with bracketed citations)
Config via env:
OAI_BASE (default http://localhost:11434/v1)
OAI_API_KEY (optional; some backends ignore it, Ollama allows any value)
OAI_DEFAULT_MODEL (fallback model name, e.g. llama3:latest)
OAI_TIMEOUT_MS (optional request timeout)
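For illustration, the backend can be switched entirely through these variables before launching the server (the values below are examples, not defaults from this repo apart from OAI_BASE):

```bash
# Point RanchHand at an OpenAI-compatible backend (values are illustrative)
export OAI_BASE=http://localhost:11434/v1
export OAI_API_KEY=ollama              # ignored by Ollama; some backends require a real key
export OAI_DEFAULT_MODEL=llama3:latest
export OAI_TIMEOUT_MS=60000            # 60s request timeout
```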
Development
Linting
This project uses ESLint to maintain code quality and consistency.
The linting rules enforce:
Consistent code style (single quotes, semicolons, 2-space indentation)
Error prevention (no unused variables, no undefined variables)
Modern JavaScript practices (const/let instead of var, arrow functions)
CI will automatically run linting checks on all pull requests.
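ESLint can also be run locally before pushing; something like the following works regardless of how the npm scripts are wired (an `npm run lint` script may also be defined in package.json):

```bash
npx eslint .         # report style and correctness issues
npx eslint . --fix   # apply automatic fixes where ESLint supports them
```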
Testing
This repo uses Vitest for unit tests. External network calls are mocked, so tests run deterministically without Ollama or internet access.
Commands:
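The concrete script names live in package.json; with Vitest the typical invocations are:

```bash
npx vitest run             # run the suite once (what CI does)
npx vitest                 # watch mode for local development
npx vitest run --coverage  # run once and report coverage
```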
Coverage thresholds are configured in vitest.config.mjs (initial targets):
Lines/Statements ≥ 60%
Functions ≥ 55%
Branches ≥ 50%
These thresholds indicate the minimum proportion of code exercised by tests. They are a guardrail, not a guarantee of correctness. We can raise them as the test suite grows.
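As a sketch of how such thresholds are usually declared (exact keys depend on the Vitest version; this is not a copy of the repo's actual file):

```js
// vitest.config.mjs (illustrative sketch)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      thresholds: {
        lines: 60,
        statements: 60,
        functions: 55,
        branches: 50
      }
    }
  }
});
```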
Notes:
Tests live in tests/**/*.test.js
Use vi.spyOn/vi.mock to stub fetch and other external calls (a sketch follows after these notes)
For CI stability, avoid real network calls in tests
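A minimal sketch of the stubbing pattern (the test name and payload are illustrative; real tests exercise the server's own modules rather than calling fetch directly):

```js
import { describe, it, expect, vi, afterEach } from 'vitest';

afterEach(() => {
  vi.restoreAllMocks();
});

describe('models list', () => {
  it('returns backend data without a real network call', async () => {
    // Stub global fetch so no request leaves the test process
    vi.spyOn(globalThis, 'fetch').mockResolvedValue(
      new Response(JSON.stringify({ data: [{ id: 'llama3:latest', object: 'model' }] }), {
        status: 200,
        headers: { 'Content-Type': 'application/json' }
      })
    );

    const res = await fetch('http://localhost:11434/v1/models');
    const body = await res.json();
    expect(body.data[0].id).toBe('llama3:latest');
  });
});
```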
Run (standalone)
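The server runs as a plain Node process; the entry-point name below is an assumption, so check package.json for the real script:

```bash
# Entry-point filename is illustrative
OAI_BASE=http://localhost:11434/v1 node index.js
```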
HTTP Ingest Service
Example request:
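A sketch of what a request could look like (field names and the auth header are assumptions; the endpoint currently only acknowledges the payload):

```bash
curl -s -X POST http://127.0.0.1:41414/ingest/slack \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{"messages": [{"channel": "general", "user": "U123", "text": "hello", "ts": "1715000000.000100"}]}'
```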
Query:
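A hypothetical kNN query (payload shape assumed):

```bash
curl -s -X POST http://127.0.0.1:41414/query \
  -H 'Content-Type: application/json' \
  -d '{"query": "What did we decide about the release date?", "top_k": 5}'
```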
Answer with citations:
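Again a sketch with assumed field names; the response carries bracketed citations back to ingested chunks:

```bash
curl -s -X POST http://127.0.0.1:41414/answer \
  -H 'Content-Type: application/json' \
  -d '{"question": "What did we decide about the release date?"}'
```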
Profiles:
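Reading and updating role defaults might look like this (the request body shape is an assumption):

```bash
curl -s http://127.0.0.1:41414/profiles
curl -s -X POST http://127.0.0.1:41414/profiles \
  -H 'Content-Type: application/json' \
  -d '{"embed": {"model": "nomic-embed-text:latest"}, "chunking": {"size": 800, "overlap": 100}}'
```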
MCP Tools
openai_models_list
Input: {}
Output: OpenAI-shaped { data: [{ id, object, ... }] }
openai_chat_completions
Input: { model?: string, messages: [{ role: 'user'|'system'|'assistant', content: string }], temperature?, top_p?, max_tokens? }
Output: OpenAI-shaped chat completion response (single-shot; streaming TBD)
openai_embeddings_create
Input: { model?: string, input: string | string[] }
Output: OpenAI-shaped embeddings response
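For example, an MCP client might invoke openai_chat_completions with arguments like the following (the model name is illustrative):

```json
{
  "model": "llama3:latest",
  "messages": [
    { "role": "system", "content": "You are a concise assistant." },
    { "role": "user", "content": "Summarize what RanchHand does in one sentence." }
  ],
  "temperature": 0.2,
  "max_tokens": 200
}
```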
Claude/Codex (MCP)
Point your MCP config to:
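A sketch of an MCP client entry (the server name, command, and path are assumptions; adjust to wherever the repo is checked out):

```json
{
  "mcpServers": {
    "ranchhand": {
      "command": "node",
      "args": ["/path/to/ranchhand/index.js"],
      "env": {
        "OAI_BASE": "http://localhost:11434/v1",
        "OAI_DEFAULT_MODEL": "llama3:latest"
      }
    }
  }
}
```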
Notes
Streaming chat completions are not implemented yet (single response per call). If your backend requires streaming, we can add an incremental content pattern that MCP clients can consume.
RanchHand passes through OpenAI-style payloads and shapes outputs to be OpenAI-compatible, but exact metadata (usage, token counts) depends on the backend.
HTTP ingest is currently an acknowledgment stub (counts + sample). Chunking/embedding/upsert will be wired next; design is pluggable for local store or Qdrant.
Hybrid server
RanchHand can target a local backend (such as Ollama on localhost) or a remote OpenAI-compatible endpoint, depending on how OAI_BASE is configured.