woollama
Allows interaction with Git repositories through MCP tools, enabling version control operations such as file management and commit history.
Provides integration with local Ollama models, allowing AI agents to run inference on locally hosted models via the Ollama backend.
Provides compatibility with OpenAI's API, allowing any OpenAI client to route requests through woollama, and also supports OpenAI as a backend provider.
Web Over Ollama (and Llamas). An MCP + OpenAI router for AI desktops.
Documentation: woollama.readthedocs.io
woollama sits between AI clients (Cursor, the OpenAI SDK, Claude Desktop, cosmic-fabric, anything that speaks OpenAI or MCP) and AI backends (Ollama, Anthropic, fabric, lackpy, filesystem MCPs, anything that speaks OpenAI or MCP). It composes them into orchestrated calls without inventing a new protocol.
        ┌─────────────────────┐
        │     AI clients      │
        │   (any OpenAI or    │
        │     MCP client)     │
        └──────────┬──────────┘
                   │
┌──────────────────┴───────────────────┐
│               woollama               │
│      OpenAI server + MCP server      │
│    ─────────────────────────────     │
│   routes models, tools, executors    │
│  composes patterns + tools + models  │
│          into named recipes          │
└──────────────────┬───────────────────┘
                   │
    ┌──────────────┴──────────────┐
    │                             │
┌───┴────┐                   ┌────┴────┐
│  MCP   │ tools, prompts,   │ OpenAI  │ inference
│  tool  │ resources         │ compat  │
│ servers│                   │ backends│
└────────┘                   └─────────┘
fabric-mcp, lackpy,          Ollama, Anthropic,
filesystem, git, …           vLLM, llama.cpp, …
Status
The Rust daemon woollamad is a multi-backend router with both surfaces live,
published to crates.io + PyPI. woollama works end-to-end as:
- an OpenAI-compatible server: /v1/chat/completions (pass-through and hidden chat-loop orchestration of recipes, both with stream:true, emitted as OpenAI SSE), /v1/models, /v1/tools, and a stateful surface, /v1/responses + /v1/conversations (OpenAI Responses/Conversations shape; see below);
- an MCP server to its own clients, over stdio (woollamad mcp) and over Streamable HTTP at /mcp, mounted on the same port as /v1/*. It re-exports every discovered downstream tool (namespaced, with output_schema) plus a chat verb that emits live tool-progress notifications; in other words, it's an MCP aggregator.
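As a rough illustration of what "namespaced" re-export means for an aggregator, the sketch below merges tool names from several downstream MCP servers into one registry. The `build_registry` helper and the `__` separator are assumptions made for this example; the source does not say which naming scheme woollama actually uses.

```python
def build_registry(
    servers: dict[str, list[str]], sep: str = "__"
) -> dict[str, tuple[str, str]]:
    """Map a namespaced tool name back to (server, tool).

    Hypothetical helper: the separator is an assumption, not woollama's scheme.
    """
    registry: dict[str, tuple[str, str]] = {}
    for server, tools in servers.items():
        for tool in tools:
            # Prefixing with the server name keeps same-named tools distinct.
            registry[f"{server}{sep}{tool}"] = (server, tool)
    return registry


registry = build_registry({"git": ["log", "status"], "filesystem": ["read_file"]})
```

Looking up a namespaced name in the registry tells the router which downstream server should receive the call.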
It routes inference across multiple backends by <provider>/<model>:
ollama (local), anthropic, openai, groq, together, openrouter, and
any OpenAI-compatible endpoint you add in inferencers.toml (e.g.
self-hosted vLLM), plus claude-code/<model>, a keyless path to Claude via the
local CLI (tool-less, or as an executor that runs a recipe's allow-listed
MCP tools itself, i.e. tool delegation). Config is file-driven (mcp.json,
recipes.toml, inferencers.toml).
For stateful conversations, woollama routes handles and backends own the
state; woollama never stores transcripts in its own system. Two state-owning
backends: claude-resume (claude --resume, for claude-code models; keyless, the
Claude session owns the bytes) and managed-agents (Anthropic's Managed Agents,
for claude-agent models; needs ANTHROPIC_API_KEY, Anthropic hosts the session
and exposes the transcript, so /v1/conversations/{id}/items works). Models with
no state-owning backend (ollama/cloud/recipe) are stateless: the caller owns
history (store:false).
MCP connections are long-lived. The router is served on both a Unix socket
($XDG_RUNTIME_DIR/woollama.sock, mode 0600, the default for local MCP clients)
and an ephemeral loopback TCP port; never 0.0.0.0 without explicit opt-in.
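The state-ownership rule above can be summarized as a small lookup. This is a sketch under assumptions: the `STATE_BACKENDS` table and `state_backend_for` function are hypothetical, and it assumes claude-agent models are addressed as `claude-agent/<model>` in the same way claude-code models are.

```python
from typing import Optional

# Provider prefix -> state-owning backend, as described in the text above.
STATE_BACKENDS = {
    "claude-code": "claude-resume",
    "claude-agent": "managed-agents",  # prefix form is an assumption
}


def state_backend_for(model: str) -> Optional[str]:
    """Return the backend that owns conversation state, or None if stateless."""
    provider = model.partition("/")[0]
    return STATE_BACKENDS.get(provider)


def caller_owns_history(model: str) -> bool:
    """Stateless models leave history with the caller (store:false)."""
    return state_backend_for(model) is None
```

A client could use a check like `caller_owns_history` to decide whether it needs to resend the full message list on every request.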
Current status and what's next live in docs/roadmap.md.
The Rust port is done (v0.5.x). woollamad is the canonical router, published to crates.io (cargo install woollama-server) and PyPI (pip install woollama). The Python in src/woollama/ is kept as the reference server and differential-test oracle; it has not been deleted. See docs/rust-transition.md for the (completed) transition criteria.
See docs/architecture.md for the full target design and
docs/build-log.md for the slice-by-slice history.
Quick taste
The router is OpenAI-compatible, so any OpenAI client can drive it:
import openai

c = openai.OpenAI(base_url="http://127.0.0.1:<port>/v1", api_key="x")

# Pass-through to Ollama
r = c.chat.completions.create(
    model="ollama/qwen3:14b-iq4xs",
    messages=[{"role": "user", "content": "Hi"}],
)

# Orchestrated: a recipe (system prompt + tools + model), transparent to the
# client. The chat-loop happens inside woollama; client sees only the final answer.
r = c.chat.completions.create(
    model="woollama/streamer",
    messages=[{"role": "user", "content": "Please count to 4."}],
)
woollama serves on two transports at once: a Unix socket at
$XDG_RUNTIME_DIR/woollama.sock (mode 0600, the default for local MCP clients,
since a connectable socket can spend the router's API keys) and an ephemeral
loopback TCP port written to $XDG_RUNTIME_DIR/woollama.addr for clients to
discover. The <port> above is that ephemeral port. Same pattern as a local
fabric --serve instance.
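For programmatic discovery, a client can read the address file instead of hard-coding the port. The sketch below is an assumption-heavy illustration: the source does not specify the exact contents of woollama.addr, so `base_url_from_addr` (a hypothetical helper) accepts either a bare `host:port` or a full URL and normalizes both to a base_url ending in `/v1`.

```python
import os
from pathlib import Path


def base_url_from_addr(text: str) -> str:
    """Normalize the address file contents into an OpenAI base_url.

    Assumption: the file holds either "host:port" or a full http(s) URL.
    """
    addr = text.strip()
    if not addr.startswith(("http://", "https://")):
        addr = "http://" + addr
    addr = addr.rstrip("/")
    if not addr.endswith("/v1"):
        addr += "/v1"
    return addr


def discover_base_url() -> str:
    """Read $XDG_RUNTIME_DIR/woollama.addr and return a usable base_url."""
    runtime_dir = os.environ["XDG_RUNTIME_DIR"]
    return base_url_from_addr((Path(runtime_dir) / "woollama.addr").read_text())
```

The result of `discover_base_url()` can then be passed as `base_url` when constructing the OpenAI client.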
Install
The router is woollamad, a small Rust daemon. The Python implementation is
kept as a reference server and the differential-test oracle (see below), but
woollamad is the canonical router.
From crates.io (cargo install ships only the binary, so bring your own
mcp.json):
cargo install woollama-server   # installs the `woollamad` binary
woollamad                       # starts the router; prints its address
From this checkout (works today; includes the bundled example MCP servers):
git clone https://github.com/teaguesterling/woollama
cd woollama
cargo build --release           # builds target/release/woollamad
./target/release/woollamad      # starts the router; prints its address
On startup woollamad prints its OpenAI base_url (e.g.
http://127.0.0.1:<port>/v1); copy that into your OpenAI client. (It's also
written to $XDG_RUNTIME_DIR/woollama.addr for programmatic discovery, and it
serves the same surface over the woollama.sock Unix socket.)
The Python reference server
The original Python implementation still runs and is used as the live oracle that
keeps woollamad honest:
uv sync           # creates .venv and installs deps
uv run woollama   # the Python reference server
Prerequisite for the examples below: they use ollama/qwen3:14b-iq4xs, so install Ollama, run ollama serve, and ollama pull qwen3:14b-iq4xs. No Ollama? Use the keyless Claude path instead, model="claude-code/haiku" (needs the claude CLI logged in), or any cloud model with its key set (see Configuration).
Tests & lint
# Rust (woollamad): the daemon's own suites
cargo test --tests --features test-fixtures
cargo build --release                      # so the live oracle can spawn the binary
# Python: hermetic suite + lint
uv run --extra dev pytest                  # hermetic suite (live tests are opt-in: -m integration)
uv run ruff check .                        # lint: the CI gate
# The live differential oracle: same tests, against woollamad by default
uv run --extra dev pytest -m integration   # targets target/release/woollamad
WOOLLAMA_TEST_CMD="python -m woollama" \
  uv run --extra dev pytest -m integration # opt in to the Python reference
CI (.github/workflows/ci.yml) runs the Rust + Python gates on every push to main and on every PR.
For the same lint gate locally on commit, opt into the pre-commit hook:
uv tool install pre-commit && pre-commit install
Lint only: the project does not use ruff format (lines are hand-wrapped,
E501 is ignored), so there is no formatter step in either gate.
Design principles
- Two standards, neither extended. MCP for tool/prompt/resource discovery and execution; OpenAI chat-completions for the inference primitive. woollama is a router between them.
- Local-only, ephemeral by default. Random loopback port, persisted address file for discovery, never 0.0.0.0 without explicit opt-in. The router holds API keys and routes to local resources, so it should not be LAN-reachable.
- The model namespace is the universal addressing scheme. Raw inferencers (<provider>/<model>, e.g. ollama/X, anthropic/X, claude-code/X) and full recipes (woollama/<recipe>) are all addressable through OpenAI's standard model field. No new wire format.
- woollama owns routing, not inference or tools. It uses other people's inference engines (Ollama, Anthropic, …) and other people's tool servers (any MCP server: filesystem, git, lackpy, …). It composes them.
she talks to llamas.
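The "local-only by default" principle can be illustrated with a bind-address guard. This is a sketch, not woollama's implementation: `check_bind` and its `allow_public` flag are hypothetical names standing in for whatever explicit opt-in the router actually uses.

```python
import ipaddress


def is_loopback(host: str) -> bool:
    """True only for literal loopback IPs such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Hostnames are not resolved in this sketch; treat them as non-loopback.
        return False


def check_bind(host: str, allow_public: bool = False) -> None:
    """Refuse non-loopback bind addresses unless explicitly opted in."""
    if not allow_public and not is_loopback(host):
        raise ValueError(f"refusing to bind {host!r} without explicit opt-in")
```

Rejecting by default and requiring a deliberate flag keeps a router that holds API keys off the LAN unless the operator asks for it.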
What works today
- OpenAI surface: /v1/models, /v1/chat/completions (pass-through + recipe orchestration, both with stream:true, emitted as OpenAI SSE), /v1/tools introspection
- Stateful surface: /v1/responses (stateless subset, incl. stream:true as OpenAI Responses SSE, + stateful) and /v1/conversations (create/list/get/delete, plus items where the backend exposes its transcript). woollama routes conversation handles; backends own state (woollama never stores transcripts itself): claude-resume for claude-code models, managed-agents (Anthropic Managed Agents) for claude-agent models, with an interactive requires_action pause/answer path; models with no state-owning backend are stateless (store:false)
- Multi-backend routing by <provider>/<model>: ollama (incl. num_ctx honored via ollama's native /api/chat), anthropic, openai, groq, together, openrouter, claude-code, + any OpenAI-compatible endpoint via inferencers.toml
- Tool delegation: a claude-code recipe with tools runs as an executor; Claude owns the agentic loop and calls the recipe's allow-listed MCP tools itself (per-recipe --mcp-config + --allowedTools containment)
- MCP server side: stdio (woollamad mcp) and Streamable HTTP at /mcp on the same port; recipes as parameterized prompts (their {{var}} tokens become arguments), a chat verb (with live tool-progress notifications), and every downstream tool re-exported with its output_schema (aggregator)
- Pattern templating on woollama's own /w1/ namespace (not OpenAI's /v1/): parameterized recipes/patterns with {{var}} substitution, via GET /w1/patterns (discovery), POST /w1/patterns/{name}/render (assemble), POST /w1/patterns/{name}/run (render + infer, streaming). Patterns also come from a fabric-style directory scan ([patterns]) and a fabric backend: woollama can run/own fabric --serve, surface its library on /w1/, and transparently proxy fabric's API at /fabric/*. Pattern backends are pluggable (the PatternBackend trait; see docs/extending.md)
- File-driven config (mcp.json, recipes.toml, inferencers.toml), multi-MCP-server discovery + unified tool registry, long-lived MCP connections
- Recipe allow-list enforced as a security boundary (in-loop AND in delegation); served on a Unix socket + loopback TCP, address discovery file; CI (ruff + hermetic suite, 3.11/3.12)
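The `{{var}}` substitution used by patterns and recipe prompts can be sketched in a few lines. This is an illustrative stand-in, not the router's renderer: `template_vars` and `render` are hypothetical names, and tolerating whitespace inside the braces and raising on a missing variable are both assumptions.

```python
import re

# Matches {{name}}; optional whitespace inside the braces is an assumption.
_VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def template_vars(template: str) -> list[str]:
    """List variable names in first-seen order, without duplicates."""
    seen: list[str] = []
    for name in _VAR.findall(template):
        if name not in seen:
            seen.append(name)
    return seen


def render(template: str, values: dict[str, str]) -> str:
    """Substitute every {{var}}; raise KeyError if a value is missing."""

    def substitute(match: re.Match) -> str:
        # A missing key raises KeyError rather than leaving the token in place.
        return str(values[match.group(1)])

    return _VAR.sub(substitute, template)


prompt = render("Please count to {{n}}.", {"n": "4"})
```

The names returned by `template_vars` are what a discovery endpoint or an MCP prompt would expose as arguments, while `render` corresponds to the assemble step.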
Not yet (next on the roadmap)
- The live, interactive Claude-in-tmux session backend (a separate Rust session driver), gated on spikes that need a real terminal. (The interactive requires_action path itself already works via the managed-agents backend.)
- cosmic-fabric actually consuming the conversations surface (the last open integration milestone). The generic store-backed mechanism + two reference store providers (MCP + REST) already ship; what's pending is the cross-repo wiring. (Pattern templating + the fabric backend it needed have shipped; see docs/patterns.md.)
- lackpy re-pinning to the now-published woollama-core wheel.
Full scorecard, ordering, and pending verifications: docs/roadmap.md.
Origin
woollama is the production-grade rewrite of an architecture co-designed in cosmic-fabric, which remains a frontend (and will use woollama as its router engine). The design docs that brought woollama here:
- docs/architecture.md: the model/tool/executor router design
- docs/naming.md: how we landed on this name
License
MIT; see LICENSE.