You are building a complete Model Context Protocol (MCP) server that acts as a bridge between web LLM clients and my local resources (local LLMs + local “code nodes”). Deliver a production-ready Node.js/TypeScript project that I can run on my machine and expose to a cloud LLM via a single MCP endpoint.
# Goals
- Provide a single MCP server exposing tools for:
1) Local LLM inference via Ollama (http://localhost:11434) and optionally LM Studio (http://localhost:1234/v1).
2) Local “code nodes” execution: either (a) HTTP POST to a local service, or (b) spawn a shell command safely with a timeout and whitelist.
3) Optional HTTP fetch passthrough for cloud services I host (“Cloud Code”).
- The server must run locally, but be reachable from a browser-based LLM via WebSocket. Include a simple HTTP+WebSocket transport so the web LLM can connect to ws://MYHOST:PORT/mcp.
# Constraints & assumptions
- Use TypeScript + Node 18+.
- Use an MCP server SDK if available (e.g., the "@modelcontextprotocol/sdk" npm package); if the exact package name differs, resolve and pin the correct one. If no stable SDK fits, implement JSON-RPC 2.0 with the MCP method shapes (tools/list, tools/call, resources/list, etc.).
- Do not rely on experimental, undocumented APIs without a fallback.
- Provide .env-driven config.
# Project layout
- package.json (with scripts: dev, build, start)
- tsconfig.json
- src/server.ts (entry point; starts HTTP + WebSocket; registers tools)
- src/mcp.ts (MCP plumbing: JSON-RPC handler, tools registry)
- src/tools/ollama.ts (POST to /api/generate or /api/chat per config)
- src/tools/lmstudio.ts (OpenAI-compatible endpoint; requires API base in .env)
- src/tools/codeNode.ts (two modes: HTTP POST to local service; or spawn a whitelisted command)
- src/tools/httpFetch.ts (safe HTTP fetch passthrough with allowlist)
- src/util/logger.ts
- src/util/validators.ts (zod schemas for inputs)
- .env.example
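
A minimal sketch of how src/mcp.ts could tie this layout together; `ToolDefinition`, `registerTool`, and `handleRequest` are illustrative names (not SDK APIs), and error codes follow plain JSON-RPC 2.0 conventions:

```ts
// src/mcp.ts — minimal JSON-RPC 2.0 plumbing and tool registry (illustrative names).
export interface JsonRpcRequest {
  jsonrpc: "2.0";
  id?: number | string | null;
  method: string;
  params?: unknown;
}

export interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string | null;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
}

export interface ToolDefinition<T = unknown> {
  name: string;
  description: string;
  inputSchema: object;            // JSON Schema advertised via tools/list
  parse: (args: unknown) => T;    // runtime validation (e.g., zod schema.parse)
  run: (args: T) => Promise<unknown>;
}

const registry = new Map<string, ToolDefinition<any>>();

export function registerTool(tool: ToolDefinition<any>): void {
  registry.set(tool.name, tool);
}

export async function handleRequest(req: JsonRpcRequest): Promise<JsonRpcResponse> {
  const reply = (result?: unknown, error?: JsonRpcResponse["error"]): JsonRpcResponse =>
    ({ jsonrpc: "2.0", id: req.id ?? null, ...(error ? { error } : { result }) });

  try {
    switch (req.method) {
      case "ping":
        return reply({});
      case "tools/list":
        return reply({
          tools: [...registry.values()].map(({ name, description, inputSchema }) =>
            ({ name, description, inputSchema })),
        });
      case "tools/call": {
        const { name, arguments: args } =
          (req.params ?? {}) as { name: string; arguments?: unknown };
        const tool = registry.get(name);
        if (!tool) return reply(undefined, { code: -32602, message: `Unknown tool: ${name}` });
        return reply(await tool.run(tool.parse(args)));
      }
      default:
        return reply(undefined, { code: -32601, message: `Method not found: ${req.method}` });
    }
  } catch (err) {
    return reply(undefined, { code: -32000, message: (err as Error).message });
  }
}
```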
# Tools to expose (MCP "tools")
1) tool: "local_llm.generate"
input schema:
- provider: "ollama" | "lmstudio"
- model: string
- prompt: string
- temperature?: number
- stream?: boolean
behavior:
- If provider=ollama: call OLLAMA_BASE (default http://localhost:11434) on the correct route; support both /api/generate and /api/chat, selected by the OLLAMA_USE_CHAT boolean in .env.
- If provider=lmstudio: call LM Studio's OpenAI-compatible /v1/chat/completions or /v1/completions endpoint (pick one; document the choice in the README). Use the optional LMSTUDIO_API_KEY from .env and handle the no-key case gracefully (most local setups don't need one).
- Return { text, tokens, model, latencyMs }.
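
A sketch of the Ollama branch of this tool, assuming the non-streaming /api/generate and /api/chat response shapes (`response`, `message.content`, `eval_count`); exact fields should be verified against the installed Ollama version, and the LM Studio branch would follow the same pattern against its OpenAI-compatible endpoint:

```ts
// src/tools/ollama.ts — sketch of the provider=ollama branch (non-streaming).
const OLLAMA_BASE = process.env.OLLAMA_BASE ?? "http://127.0.0.1:11434";
const USE_CHAT = process.env.OLLAMA_USE_CHAT === "true";

export interface GenerateResult {
  text: string;
  tokens: number | undefined;
  model: string;
  latencyMs: number;
}

export async function ollamaGenerate(
  model: string,
  prompt: string,
  temperature?: number,
): Promise<GenerateResult> {
  const started = Date.now();
  const route = USE_CHAT ? "/api/chat" : "/api/generate";
  const body = USE_CHAT
    ? { model, messages: [{ role: "user", content: prompt }], stream: false, options: { temperature } }
    : { model, prompt, stream: false, options: { temperature } };

  const res = await fetch(`${OLLAMA_BASE}${route}`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}: ${await res.text()}`);

  // /api/generate puts the text in `response`; /api/chat nests it under `message.content`.
  const data = (await res.json()) as any;
  return {
    text: USE_CHAT ? data.message?.content ?? "" : data.response ?? "",
    tokens: data.eval_count,          // completion token count reported by Ollama
    model: data.model ?? model,
    latencyMs: Date.now() - started,
  };
}
```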
2) tool: "code_node.exec_http"
input schema:
- url: string (must match ALLOWLIST_URLS regex or be in ALLOWLIST_HOSTS)
- method?: "POST"|"GET" (default POST)
- headers?: Record<string,string>
- body?: any
- timeoutMs?: number (default 30000)
behavior:
- Perform fetch with timeout, return { status, headers, body }.
- Reject if URL is not allowed by allowlist.
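
A sketch of the HTTP mode, assuming Node 18's global fetch and AbortSignal.timeout; `execHttp` and `isAllowed` are illustrative names:

```ts
// src/tools/codeNode.ts (HTTP mode) — sketch of the allowlist check plus a timed fetch.
const ALLOWLIST_HOSTS = (process.env.ALLOWLIST_HOSTS ?? "").split(",").filter(Boolean);
const ALLOWLIST_URLS = process.env.ALLOWLIST_URLS ? new RegExp(process.env.ALLOWLIST_URLS) : null;

function isAllowed(raw: string): boolean {
  const url = new URL(raw); // throws on malformed input
  return ALLOWLIST_HOSTS.includes(url.hostname) || (ALLOWLIST_URLS?.test(raw) ?? false);
}

export async function execHttp(input: {
  url: string;
  method?: "POST" | "GET";
  headers?: Record<string, string>;
  body?: unknown;
  timeoutMs?: number;
}) {
  if (!isAllowed(input.url)) {
    throw Object.assign(new Error(`URL not in allowlist: ${input.url}`), { status: 403 });
  }
  const res = await fetch(input.url, {
    method: input.method ?? "POST",
    headers: input.headers,
    body: input.body === undefined ? undefined : JSON.stringify(input.body),
    // AbortSignal.timeout is built into Node 18+.
    signal: AbortSignal.timeout(input.timeoutMs ?? 30_000),
  });
  return {
    status: res.status,
    headers: Object.fromEntries(res.headers.entries()),
    // Best effort: return parsed JSON when the response parses, raw text otherwise.
    body: await res.text().then((t) => { try { return JSON.parse(t); } catch { return t; } }),
  };
}
```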
3) tool: "code_node.exec_local"
input schema:
- cmd: string (must be in the ALLOWED_BINARIES list)
- args?: string[]
- cwd?: string
- timeoutMs?: number (default 15000)
behavior:
- Spawn using child_process with a safe, explicit ALLOWED_BINARIES list from .env (e.g., python, node, bash, my-tool).
- Kill on timeout. Return { exitCode, stdout, stderr, durationMs }.
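
A sketch of the local-exec mode using child_process.spawn without a shell; `execLocal` is an illustrative name:

```ts
// src/tools/codeNode.ts (local mode) — sketch of whitelisted spawn with kill-on-timeout.
import { spawn } from "node:child_process";

const ALLOWED_BINARIES = (process.env.ALLOWED_BINARIES ?? "").split(",").filter(Boolean);

export function execLocal(input: {
  cmd: string;
  args?: string[];
  cwd?: string;
  timeoutMs?: number;
}): Promise<{ exitCode: number | null; stdout: string; stderr: string; durationMs: number }> {
  if (!ALLOWED_BINARIES.includes(input.cmd)) {
    return Promise.reject(
      Object.assign(new Error(`'${input.cmd}' is not in ALLOWED_BINARIES`), { status: 403 }),
    );
  }
  const started = Date.now();
  return new Promise((resolve, reject) => {
    // No shell: arguments are passed verbatim, which avoids injection via args.
    const child = spawn(input.cmd, input.args ?? [], { cwd: input.cwd, shell: false });
    let stdout = "";
    let stderr = "";
    const timer = setTimeout(() => child.kill("SIGKILL"), input.timeoutMs ?? 15_000);

    child.stdout.on("data", (chunk) => (stdout += chunk));
    child.stderr.on("data", (chunk) => (stderr += chunk));
    child.on("error", (err) => { clearTimeout(timer); reject(err); });
    child.on("close", (exitCode) => {
      clearTimeout(timer);
      resolve({ exitCode, stdout, stderr, durationMs: Date.now() - started });
    });
  });
}
```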
4) tool: "cloud_http.fetch" (optional passthrough to my own Cloud Code)
input schema like exec_http but with a different allowlist.
# Transport
- Expose HTTP GET /healthz that returns 200.
- Expose WebSocket at /mcp that implements MCP JSON-RPC 2.0 framing. Include basic ping/pong keepalive.
- Log minimal connection info and tool invocations (scrub secrets).
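
A transport sketch using Node's http module plus the `ws` package, reusing `handleRequest` from the plumbing sketch above; origin checks, body-size limits, and rate limiting from the security section would hook in here:

```ts
// src/server.ts — sketch of the HTTP + WebSocket transport (assumes the `ws` package).
import { createServer } from "node:http";
import { WebSocketServer, type WebSocket } from "ws";
import { handleRequest, type JsonRpcRequest } from "./mcp.js"; // names from the sketch above

const PORT = Number(process.env.PORT ?? 8765);
const HOST = process.env.HOST ?? "127.0.0.1";

const http = createServer((req, res) => {
  if (req.method === "GET" && req.url === "/healthz") {
    res.writeHead(200, { "content-type": "text/plain" }).end("ok");
  } else {
    res.writeHead(404).end();
  }
});

const wss = new WebSocketServer({ server: http, path: "/mcp" });

wss.on("connection", (socket: WebSocket, req) => {
  console.log(`connection from ${req.socket.remoteAddress}`); // minimal, no payload logging

  // Keepalive: ping every 30s, drop peers that never answer with pong.
  let alive = true;
  socket.on("pong", () => (alive = true));
  const keepalive = setInterval(() => {
    if (!alive) return socket.terminate();
    alive = false;
    socket.ping();
  }, 30_000);

  socket.on("message", async (raw) => {
    try {
      const request = JSON.parse(raw.toString()) as JsonRpcRequest;
      socket.send(JSON.stringify(await handleRequest(request)));
    } catch {
      socket.send(JSON.stringify({
        jsonrpc: "2.0", id: null, error: { code: -32700, message: "Parse error" },
      }));
    }
  });
  socket.on("close", () => clearInterval(keepalive));
});

http.listen(PORT, HOST, () => console.log(`ws://${HOST}:${PORT}/mcp ready`));
```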
# Security & safety
- Never allow arbitrary file reads/writes.
- Enforce allowlists:
- ALLOWLIST_HOSTS and/or ALLOWLIST_URLS for HTTP tools.
- ALLOWED_BINARIES for local exec.
- Enforce payload size limits (e.g., 5 MB).
- CORS: allow origins via ORIGIN_ALLOWLIST (comma-separated).
- Add rate limiting (simple token bucket per connection).
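
A sketch of the per-connection token bucket; the module name and the capacity/refill defaults are illustrative:

```ts
// src/util/rateLimit.ts — sketch of a per-connection token bucket (illustrative module name).
export class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(
    private readonly capacity = 20,      // burst size
    private readonly refillPerSec = 5,   // sustained calls per second
  ) {
    this.tokens = capacity;
  }

  /** Returns true if the call is allowed, false if the caller should get a rate-limit error. */
  take(): boolean {
    const now = Date.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.lastRefill) / 1000) * this.refillPerSec,
    );
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Usage: create one bucket per WebSocket connection and check bucket.take()
// before dispatching each tools/call; reject with a JSON-RPC error otherwise.
```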
# Configuration (.env.example)
PORT=8765
HOST=127.0.0.1
ORIGIN_ALLOWLIST=http://localhost:3000,https://my-web-llm.example
ALLOWLIST_HOSTS=localhost,127.0.0.1
ALLOWLIST_URLS=
ALLOWED_BINARIES=python,node
OLLAMA_BASE=http://127.0.0.1:11434
OLLAMA_USE_CHAT=false
LMSTUDIO_BASE=http://127.0.0.1:1234/v1
LMSTUDIO_API_KEY=
MAX_BODY_BYTES=5242880
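
A sketch of parsing this file with zod (assuming the `dotenv` package for loading .env); defaults mirror the values above, and the boolean is parsed explicitly because naive coercion would treat the string "false" as true:

```ts
// src/util/validators.ts (config part) — sketch of .env parsing with zod.
import "dotenv/config";           // loads .env into process.env
import { z } from "zod";

const csv = z.string().default("")
  .transform((s) => s.split(",").map((v) => v.trim()).filter(Boolean));

const EnvSchema = z.object({
  PORT: z.coerce.number().default(8765),
  HOST: z.string().default("127.0.0.1"),
  ORIGIN_ALLOWLIST: csv,
  ALLOWLIST_HOSTS: csv,
  ALLOWLIST_URLS: z.string().default(""),
  ALLOWED_BINARIES: csv,
  OLLAMA_BASE: z.string().url().default("http://127.0.0.1:11434"),
  OLLAMA_USE_CHAT: z.string().default("false").transform((v) => v === "true"),
  LMSTUDIO_BASE: z.string().url().default("http://127.0.0.1:1234/v1"),
  LMSTUDIO_API_KEY: z.string().optional(),
  MAX_BODY_BYTES: z.coerce.number().default(5 * 1024 * 1024),
});

export const config = EnvSchema.parse(process.env);
```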
# Implementation notes
- Prefer fastify or express + ws for transport. Use zod for input validation. For HTTP, use the fetch built into Node 18+ (undici) or node-fetch.
- For MCP: implement methods:
- "tools/list": returns the four tools above with JSON Schemas.
- "tools/call": executes a tool by name with validated args.
- "session/ready" (if needed by SDK) and simple "ping".
- Include a minimal MCP README that documents how a web LLM client should connect (ws://HOST:PORT/mcp) and the exact payload shapes for tools/list and tools/call.
- Provide typed response shapes and good errors.
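
For the README, the documented frames could look like the following; standard JSON-RPC 2.0 framing is assumed, with tools/call params shaped as { name, arguments }:

```ts
// Example frames a web LLM client would send over ws://HOST:PORT/mcp.
const listRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/list",
  params: {},
};
// Expected result shape: { tools: [{ name, description, inputSchema }, ...] }

const callRequest = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: {
    name: "local_llm.generate",
    arguments: { provider: "ollama", model: "llama3", prompt: "hello", temperature: 0.2 },
  },
};
// Expected result shape: { text, tokens, model, latencyMs }
```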
# Developer UX
- README.md with:
- “Quick start” (npm i, npm run dev).
- Example WebSocket client snippet (browser) that calls tools/list then tools/call for local_llm.generate with Ollama and LM Studio.
- A curl example for /healthz and a Node script demonstrating a tools/call over WebSocket.
- Troubleshooting: CORS, allowlists, timeouts, common 4xx/5xx.
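
A browser-side sketch of that client snippet, using the frames shown earlier; the host/port and the minimal promise-based `rpc` helper are illustrative:

```ts
// Browser sketch: connect, list tools, then call local_llm.generate.
const ws = new WebSocket("ws://127.0.0.1:8765/mcp");
let nextId = 1;
const pending = new Map<number, (result: unknown) => void>();

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  pending.get(msg.id)?.(msg.error ?? msg.result); // resolve with the result or the error object
  pending.delete(msg.id);
};

function rpc(method: string, params: unknown): Promise<unknown> {
  const id = nextId++;
  return new Promise((resolve) => {
    pending.set(id, resolve);
    ws.send(JSON.stringify({ jsonrpc: "2.0", id, method, params }));
  });
}

ws.onopen = async () => {
  console.log(await rpc("tools/list", {}));
  console.log(await rpc("tools/call", {
    name: "local_llm.generate",
    arguments: { provider: "ollama", model: "llama3", prompt: "hello" },
  }));
};
```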
# Acceptance tests (must pass)
- "npm run dev" starts on PORT from .env and prints ws://HOST:PORT/mcp ready.
- GET /healthz returns 200.
- Connecting via WebSocket and sending tools/list returns the four tools with schemas.
- Calling local_llm.generate with provider=ollama, model=llama3, prompt="hello" returns text.
- Calling code_node.exec_local with cmd=python and args=["-V"] returns exitCode 0 and stdout containing "Python".
- Blocking test: code_node.exec_local with cmd=rm should return 403 with an error explaining that rm is not in ALLOWED_BINARIES.
- Blocking test: code_node.exec_http to a disallowed host returns 403.
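
A sketch of a Node smoke-check covering the first few items, assuming the `ws` package as a client; the script name and env handling are illustrative:

```ts
// scripts/acceptance.ts — illustrative smoke check for the first acceptance items.
import WebSocket from "ws";

const HOST = process.env.HOST ?? "127.0.0.1";
const PORT = process.env.PORT ?? "8765";

async function main() {
  // GET /healthz returns 200.
  const health = await fetch(`http://${HOST}:${PORT}/healthz`);
  console.log("healthz:", health.status === 200 ? "ok" : `unexpected status ${health.status}`);

  // tools/list over WebSocket returns the four tools.
  const ws = new WebSocket(`ws://${HOST}:${PORT}/mcp`);
  await new Promise((resolve) => ws.once("open", resolve));
  ws.send(JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/list", params: {} }));
  const raw = await new Promise<string>((resolve) =>
    ws.once("message", (data) => resolve(data.toString())),
  );
  const tools = JSON.parse(raw).result?.tools ?? [];
  console.log("tools/list:", tools.length === 4 ? "ok" : `expected 4 tools, got ${tools.length}`);
  ws.close();
}

main().catch((err) => { console.error(err); process.exit(1); });
```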
# Deliverables
- Complete repo with the files listed.
- Fully typed TS code.
- No TODOs left; runnable out of the box.