Local AI MCP

Official

by TMHSDigital


Unified MCP server for managing local model runtimes (Ollama, LM Studio, and more): provider-agnostic discovery, lifecycle, hardware-fit, and delegated inference.

License: CC-BY-NC-ND-4.0

Local AI MCP is an MCP server that turns your local model runtimes into an agent-callable control plane. It is operations-first: its primary job is to discover, inspect, fit, and manage the models running on your own machine. It speaks to runtimes over their local HTTP APIs and exposes one consistent tool surface across them, so an agent does not need to know whether a model lives in Ollama or LM Studio. The server is local-first: local runtimes are the primary target, with optional hosted providers (such as Moonshot AI) available behind the same tool surface when you configure an API key.

The server communicates over stdio only. It is a client to your local runtimes and never opens a network listener of its own.

Why an ops-first local-model server

Discovery and lifecycle, not just chat. List what is installed, what is loaded, pull and remove models, load and unload them, and check their fit against your hardware before you commit VRAM to them.
Hardware-aware. system_resources and fit_check read your real RAM and GPU/VRAM so an agent can pick a model that will actually run, and suggest_model ranks candidates by task and by what fits.
Provider-agnostic. Every tool except system_resources takes a provider argument. The argument is optional for every tool except remove_model. Omit it and the tool operates across all detected runtimes, aggregating results per provider.
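A hardware probe of the kind system_resources performs could look like the sketch below. This is an assumption throughout, not the server's code:

- RAM and CPU count come from Node's built-in `os` module.
- GPU name and VRAM are read from `nvidia-smi` when it is on PATH.
- The server's own Windows/Linux probing in src/hardware/ may use different sources.
- `parseNvidiaSmi` and `systemResources` are hypothetical names.

```typescript
import { execFileSync } from "node:child_process";
import * as os from "node:os";

// Hypothetical shape for one detected GPU (memory figures in MiB, as nvidia-smi reports them).
interface GpuInfo {
  name: string;
  totalMiB: number;
  freeMiB: number;
}

// Parses `nvidia-smi --format=csv,noheader,nounits` output,
// one GPU per line, e.g. "Example GPU, 8192, 7900".
function parseNvidiaSmi(csv: string): GpuInfo[] {
  return csv
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const [name, total, free] = line.split(",").map((s) => s.trim());
      return { name, totalMiB: Number(total), freeMiB: Number(free) };
    });
}

// Assumed probe: RAM/CPU always, GPU only when NVIDIA tooling is present.
function systemResources() {
  let gpus: GpuInfo[] = [];
  try {
    const out = execFileSync(
      "nvidia-smi",
      ["--query-gpu=name,memory.total,memory.free", "--format=csv,noheader,nounits"],
      { encoding: "utf8", timeout: 1500 },
    );
    gpus = parseNvidiaSmi(out);
  } catch {
    // No NVIDIA driver or tooling: report RAM and CPU only.
  }
  return {
    totalRamBytes: os.totalmem(),
    freeRamBytes: os.freemem(),
    cpuCount: os.cpus().length,
    platform: os.platform(),
    gpus,
  };
}
```

An agent would compare `freeRamBytes`, or a GPU's `freeMiB`, against a model's estimated footprint before loading it.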


Inference is delegation, not chat

The complete and embed tools exist to delegate (offload) inference to a model you choose for cost control and privacy: by default that means keeping tokens and data on your own hardware, with hosted providers as an explicit opt-in. They are deliberately framed as delegated/offloaded inference primitives, not as a conversational chat surface.

The provider-adapter model

Each runtime is implemented as an adapter behind a single Provider interface (src/providers/types.ts) with a uniform method set: detect, health, listModels, listLoaded, modelInfo, pull, remove, load, unload, complete, embed, and capabilities. Adding a runtime means adding one adapter; the tool layer is unchanged.
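The method names below come from the list above. Every parameter and return type is an assumption, so treat this as a sketch of what such a Provider contract might look like rather than the contents of src/providers/types.ts. The `InferenceOnlyStub` class is hypothetical. It shows how an adapter can advertise a reduced capability set and reject lifecycle calls it cannot serve.

```typescript
// Hypothetical provider contract; only the method names are taken from the README.
type ProviderId = "ollama" | "lmstudio" | "llamacpp" | "openaicompat" | "moonshot";

type Capability =
  | "listModels" | "listLoaded" | "modelInfo" | "pull" | "remove"
  | "load" | "unload" | "complete" | "embed";

interface ModelSummary {
  name: string;
  sizeBytes?: number;
}

interface Provider {
  readonly id: ProviderId;
  detect(): Promise<boolean>; // is the runtime reachable?
  health(): Promise<{ ok: boolean; version?: string }>;
  listModels(): Promise<ModelSummary[]>;
  listLoaded(): Promise<ModelSummary[]>;
  modelInfo(model: string): Promise<Record<string, unknown>>;
  pull(model: string): Promise<void>;
  remove(model: string): Promise<void>;
  load(model: string): Promise<void>;
  unload(model: string): Promise<void>;
  complete(model: string, prompt: string): Promise<string>;
  embed(model: string, input: string[]): Promise<number[][]>;
  capabilities(): Capability[]; // what this runtime supports
}

// Unsupported operations fail loudly instead of silently doing nothing.
const unsupported = (op: string): Promise<never> =>
  Promise.reject(new Error(`${op} is not supported by this provider`));

// Hypothetical inference-only adapter (no lifecycle), with canned responses.
class InferenceOnlyStub implements Provider {
  readonly id: ProviderId = "openaicompat";
  async detect() { return true; }
  async health() { return { ok: true }; }
  async listModels() { return [{ name: "stub-model" }]; }
  async listLoaded() { return []; }
  async modelInfo(model: string) { return { name: model }; }
  pull() { return unsupported("pull"); }
  remove() { return unsupported("remove"); }
  load() { return unsupported("load"); }
  unload() { return unsupported("unload"); }
  async complete(_model: string, prompt: string) { return `echo: ${prompt}`; }
  async embed(_model: string, input: string[]) { return input.map(() => [0]); }
  capabilities(): Capability[] { return ["listModels", "complete", "embed"]; }
}
```

Because the tool layer talks only to `Provider`, a new runtime is one more class implementing this interface.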

Adapter	Default host	Transport	Notes
Ollama (`src/providers/ollama.ts`)	`http://localhost:11434`	Native REST + OpenAI-compatible	`load`/`unload` map to Ollama `keep_alive` semantics (`keep_alive` to load, `keep_alive: 0` to unload). `complete`/`embed` use the OpenAI-compatible `/v1` routes.
LM Studio (`src/providers/lmstudio.ts`)	`http://localhost:1234`	REST (`/api/v0`) + OpenAI-compatible	Uses the `lms` CLI for `load`/`unload`/`pull`/`remove` when present; falls back to REST for `listModels`/`listLoaded`/`complete`/`embed`.
llama.cpp (`src/providers/llamacpp.ts`)	`http://localhost:8080`	Native `/health` `/props` `/slots` + OpenAI `/v1`	Model is loaded at server start; no pull/load/unload. Slot introspection via `/slots`.
OpenAI-compat (`src/providers/openaicompat.ts`)	(unset)	OpenAI-compatible `/v1`	Opt-in via `OPENAI_COMPAT_HOST` (vLLM, Jan, etc.). Inference only.
Moonshot AI (Kimi) (`src/providers/moonshot.ts`)	`https://api.moonshot.ai/v1`	Hosted OpenAI-compatible	Requires `MOONSHOT_API_KEY` (Bearer auth); not detected without it. `complete` and `listModels` only; lifecycle (`pull`/`remove`/`load`/`unload`) and `embed` are unsupported for the hosted API. Flagship model: `kimi-k3`.
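The Ollama row maps `load`/`unload` onto `keep_alive`. Ollama's native `/api/generate` endpoint loads a model when it receives a request with no prompt, and unloads it when `keep_alive` is `0`. Several parts of the sketch below are assumptions and may differ from the real adapter:

- building the request this way;
- the `"5m"` default residency;
- the helper names `keepAliveRequest` and `sendKeepAlive`.

```typescript
// Hypothetical request shape for Ollama keep_alive-based load/unload.
interface KeepAliveRequest {
  url: string;
  body: { model: string; keep_alive?: number | string };
}

function keepAliveRequest(
  host: string,
  model: string,
  action: "load" | "unload",
  keepAlive: number | string = "5m", // assumed residency for loads; -1 keeps it loaded
): KeepAliveRequest {
  const url = `${host.replace(/\/+$/, "")}/api/generate`;
  const body = action === "unload"
    ? { model, keep_alive: 0 }          // 0 = evict from memory now
    : { model, keep_alive: keepAlive }; // no prompt = just load and stay resident
  return { url, body };
}

// Sends the request; Node 18+ provides a global fetch.
async function sendKeepAlive(req: KeepAliveRequest): Promise<void> {
  const res = await fetch(req.url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req.body),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
}
```

For example, `sendKeepAlive(keepAliveRequest("http://localhost:11434", "llama3.2", "unload"))` would evict that model, assuming it is installed.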

Auto-detection: on each call the server probes the configured endpoints to determine which providers are live (hosted providers require their API key to be set). Hardware probing is isolated in src/hardware/ and branches by platform (Windows / Linux); it exposes total/free RAM and, where detectable, GPU name and VRAM.
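Per-call auto-detection can be sketched as a set of parallel probes with a short timeout, where anything that refuses, times out, or returns non-2xx counts as not live. The probe paths are assumptions about what the adapters use: Ollama `/api/version`, LM Studio `/api/v0/models`, llama.cpp `/health`. `detectLive` is a hypothetical name. The fetch function is injected so the logic can run offline.

```typescript
// Minimal fetch shape needed for probing; the global fetch satisfies it.
type FetchLike = (url: string, init: { signal: AbortSignal }) => Promise<{ ok: boolean }>;

interface Endpoint {
  provider: string;
  probeUrl: string;
}

async function detectLive(
  endpoints: Endpoint[],
  fetchFn: FetchLike,
  timeoutMs = 1500, // mirrors the LOCAL_AI_DETECT_TIMEOUT_MS default
): Promise<Record<string, boolean>> {
  // Probe all endpoints concurrently so detection costs one timeout at most.
  const results = await Promise.allSettled(
    endpoints.map((e) => fetchFn(e.probeUrl, { signal: AbortSignal.timeout(timeoutMs) })),
  );
  const live: Record<string, boolean> = {};
  endpoints.forEach((e, i) => {
    const r = results[i];
    // Refused connections, timeouts, and non-2xx answers all count as "not live".
    live[e.provider] = r.status === "fulfilled" && r.value.ok;
  });
  return live;
}

// Assumed probe URLs for the default local hosts.
const defaultEndpoints: Endpoint[] = [
  { provider: "ollama", probeUrl: "http://localhost:11434/api/version" },
  { provider: "lmstudio", probeUrl: "http://localhost:1234/api/v0/models" },
  { provider: "llamacpp", probeUrl: "http://localhost:8080/health" },
];
```

Against real runtimes you would call `detectLive(defaultEndpoints, fetch)`. Hosted providers such as Moonshot would be added to the list only when their API key is set.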

Tool surface (16 tools)

Discovery

Tool	Description
`list_providers`	Configured runtimes, their host, live/detected status, and capabilities.
`list_models`	Installed models across detected providers (or one provider).
`list_loaded`	Models currently resident in memory.
`model_info`	Detailed metadata for a model.

Lifecycle

Tool	Description
`pull_model`	Download a model. Heavy: may transfer multiple GB.
`remove_model`	Delete a model from disk. Destructive: requires `confirm: true` and an explicit `provider` (it never fans out across runtimes); the call is refused otherwise.
`load_model`	Load a model into memory (Ollama `keep_alive`; LM Studio `lms load`).
`unload_model`	Evict a model from memory.
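The `remove_model` contract can be enforced with a small guard before any adapter is called. It demands a literal `confirm: true` and exactly one named provider. `validateRemove` and `RemoveArgs` are hypothetical names for illustration, not the server's code.

```typescript
// Hypothetical argument shape for remove_model.
interface RemoveArgs {
  model: string;
  provider?: string;
  confirm?: boolean;
}

// Throws unless the call is explicitly confirmed and scoped to a single provider.
function validateRemove(args: RemoveArgs): { model: string; provider: string } {
  if (args.confirm !== true) {
    throw new Error("remove_model is destructive; pass confirm: true to proceed");
  }
  if (!args.provider) {
    throw new Error("remove_model requires an explicit provider (no fan-out)");
  }
  return { model: args.model, provider: args.provider };
}
```

Checking `confirm !== true` rather than truthiness means values such as `"true"` or `1` from a loosely typed client are still refused.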

Ops

Tool	Description
`health_check`	Liveness and version per provider.
`system_resources`	Total/free RAM, CPU count, and GPU/VRAM.
`fit_check`	Whether weight + KV-cache estimate fits in free VRAM (GPU) or RAM (CPU).
`benchmark`	Measure latency and tokens/sec with one small completion. Heavy: runs real inference.
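A fit check of the kind `fit_check` describes can be sketched with a common back-of-the-envelope approximation. The server's own heuristic may differ.

- Weights take about params × bits / 8 bytes.
- The KV cache takes about 2 (K and V) × layers × context × KV heads × head dim × bytes per element.

`fitCheck` and its input fields are hypothetical. The model shape in the usage test is illustrative only.

```typescript
// Hypothetical inputs for a weight + KV-cache memory estimate.
interface FitInput {
  paramsBillions: number;    // e.g. 8 for an 8B model
  bitsPerWeight: number;     // e.g. 4 for a 4-bit quantization
  layers: number;
  contextTokens: number;
  kvHeads: number;
  headDim: number;
  kvBytesPerElement: number; // 2 for an fp16 KV cache
  freeBytes: number;         // free VRAM (GPU) or free RAM (CPU)
}

function fitCheck(x: FitInput) {
  // Weights: parameter count times bytes per parameter.
  const weightBytes = (x.paramsBillions * 1e9 * x.bitsPerWeight) / 8;
  // KV cache: one K and one V vector per layer, token, and KV head.
  const kvCacheBytes =
    2 * x.layers * x.contextTokens * x.kvHeads * x.headDim * x.kvBytesPerElement;
  const totalBytes = weightBytes + kvCacheBytes;
  return { weightBytes, kvCacheBytes, totalBytes, fits: totalBytes <= x.freeBytes };
}
```

Take an 8B model at 4 bits with 32 layers, an 8192-token context, 8 KV heads, head dim 128, and an fp16 cache. The estimate is 4e9 bytes of weights plus 1 GiB of KV cache, which fits in 8 GiB free but not in 4 GB. The context length matters because the KV term grows linearly with it.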

Registry

Tool	Description
`search_available`	Search a curated catalog of well-known models (Ollama library oriented).
`suggest_model`	Recommend a model for a task, ranked by what fits your detected hardware.
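One plausible ranking rule for `suggest_model` works in three steps, sketched below:

1. Drop candidates whose estimated footprint exceeds the free memory.
2. Prefer candidates tagged for the requested task.
3. Among equals, prefer larger models, on the assumption that larger usually means more capable.

The scoring rule, the `Candidate` shape, and the example entries in the test are all hypothetical; they are not the server's catalog or algorithm.

```typescript
// Hypothetical catalog entry with a precomputed memory estimate.
interface Candidate {
  name: string;
  tasks: string[];       // e.g. ["chat", "code", "embedding"]
  requiredBytes: number; // estimated weights + KV cache
}

function suggest(candidates: Candidate[], task: string, freeBytes: number): Candidate[] {
  return candidates
    .filter((c) => c.requiredBytes <= freeBytes) // only models that fit
    .sort((a, b) => {
      // Task matches first, then larger footprints first.
      const taskDelta = Number(b.tasks.includes(task)) - Number(a.tasks.includes(task));
      return taskDelta !== 0 ? taskDelta : b.requiredBytes - a.requiredBytes;
    });
}
```

The filter runs before the sort, so a model that would not fit is never suggested, however well it matches the task.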

Delegation (offloaded inference)

Tool	Description
`complete`	Delegate a completion (streams via MCP progress when the client sends a progressToken).
`embed`	Delegate embedding generation to a local model.
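Delegating a completion over an OpenAI-compatible `/v1/chat/completions` route can be sketched as below. Ollama serves that route under `http://localhost:11434/v1` and LM Studio under `http://localhost:1234/v1`.

- The sketch is non-streaming, so it does not show the MCP progress-token path.
- `delegateComplete` and the injected `FetchJson` type are hypothetical.
- The optional Bearer key corresponds to hosted or protected endpoints.

```typescript
// Subset of the OpenAI-compatible chat completion response used here.
interface ChatResponse {
  choices: { message: { content: string } }[];
  usage?: { completion_tokens?: number };
}

// Minimal fetch shape; the global fetch satisfies it.
type FetchJson = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function delegateComplete(
  baseUrl: string, // e.g. "http://localhost:11434/v1"
  model: string,
  prompt: string,
  fetchFn: FetchJson,
  apiKey?: string, // Bearer token, e.g. for a hosted provider
): Promise<{ text: string; completionTokens?: number }> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`;
  const res = await fetchFn(`${baseUrl.replace(/\/+$/, "")}/chat/completions`, {
    method: "POST",
    headers,
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }], stream: false }),
  });
  if (!res.ok) throw new Error(`completion failed: HTTP ${res.status}`);
  const data = (await res.json()) as ChatResponse;
  return { text: data.choices[0]?.message.content ?? "", completionTokens: data.usage?.completion_tokens };
}
```

With a live runtime you would pass the global `fetch`. `completionTokens` is also what a tokens/sec measurement like `benchmark` would divide by elapsed time.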

Every tool except system_resources accepts a provider argument (ollama | lmstudio | llamacpp | openaicompat | moonshot). It is optional for every tool except remove_model, which requires it. Omit it to operate across all detected runtimes.
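When the provider argument is omitted, fan-out can be sketched as running the same operation against every detected runtime in parallel. Results are recorded per provider, so one failing runtime does not hide the others. `fanOut` and `Settled` are hypothetical names; the real aggregation format may differ.

```typescript
// Per-provider outcome: a value or an error message.
type Settled<T> = { ok: true; value: T } | { ok: false; error: string };

async function fanOut<T>(
  detected: string[],
  op: (provider: string) => Promise<T>,
  provider?: string,
): Promise<Record<string, Settled<T>>> {
  // An explicit provider narrows the call to one runtime; otherwise fan out.
  const targets = provider ? [provider] : detected;
  const settled = await Promise.all(
    targets.map((p) =>
      op(p).then(
        (value): Settled<T> => ({ ok: true, value }),
        (err): Settled<T> => ({ ok: false, error: String(err) }),
      ),
    ),
  );
  const out: Record<string, Settled<T>> = {};
  targets.forEach((p, i) => { out[p] = settled[i]; });
  return out;
}
```

For example, a `list_models` call without a provider might run `fanOut(detected, (p) => adapters[p].listModels())`, where `adapters` is a hypothetical lookup table.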

Install and run

npx @tmhs/local-ai-mcp

Claude Desktop / Cursor config

{
  "mcpServers": {
    "local-ai": {
      "command": "npx",
      "args": ["-y", "@tmhs/local-ai-mcp"],
      "env": {
        "OLLAMA_HOST": "http://localhost:11434",
        "LMSTUDIO_HOST": "http://localhost:1234",
        "MOONSHOT_API_KEY": "your-moonshot-api-key"
      }
    }
  }
}

Configuration

All configuration is via environment variables with sensible defaults:

Variable	Default	Description
`OLLAMA_HOST`	`http://localhost:11434`	Ollama base URL (scheme optional; added if missing).
`LMSTUDIO_HOST`	`http://localhost:1234`	LM Studio base URL.
`LLAMACPP_HOST`	`http://localhost:8080`	llama.cpp server base URL.
`OPENAI_COMPAT_HOST`	(unset)	Generic OpenAI-compatible `/v1` base URL (vLLM, Jan, …). Provider omitted when unset.
`OPENAI_COMPAT_API_KEY`	(unset)	Optional Bearer token for the OpenAI-compat adapter.
`MOONSHOT_HOST`	`https://api.moonshot.ai/v1`	Moonshot AI base URL (include the `/v1` path).
`MOONSHOT_API_KEY`	(unset)	Moonshot AI API key (Bearer token). The provider is skipped when unset.
`LOCAL_AI_REQUEST_TIMEOUT_MS`	`120000`	Timeout for normal requests (inference, pull progress, etc.).
`LOCAL_AI_DETECT_TIMEOUT_MS`	`1500`	Timeout for provider auto-detection probes.
`LOCAL_AI_PULL_TIMEOUT_MS`	`3600000`	Timeout for model pulls (multi-GB downloads); set `0` to disable.
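Resolving these variables into a typed config might look like the sketch below. The defaults are the ones in the table above. Several details are assumptions:

- applying scheme normalization to every host (the table states it only for `OLLAMA_HOST`);
- representing a disabled pull timeout as `null`;
- the names `loadConfig`, `withScheme`, and `intVar`.

```typescript
// Hypothetical typed view of the documented environment variables.
interface LocalAiConfig {
  ollamaHost: string;
  lmstudioHost: string;
  llamacppHost: string;
  openaiCompatHost?: string; // provider omitted when unset
  moonshotHost: string;
  moonshotApiKey?: string;   // provider skipped when unset
  requestTimeoutMs: number;
  detectTimeoutMs: number;
  pullTimeoutMs: number | null; // null = no timeout ("0" in the environment)
}

type Env = Record<string, string | undefined>;

// "localhost:11434" -> "http://localhost:11434"; trailing slashes dropped.
function withScheme(url: string): string {
  const trimmed = url.trim().replace(/\/+$/, "");
  return /^https?:\/\//i.test(trimmed) ? trimmed : `http://${trimmed}`;
}

// Reads a non-negative integer variable, falling back when unset or empty.
function intVar(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) throw new Error(`${name} must be a non-negative integer`);
  return n;
}

function loadConfig(env: Env): LocalAiConfig {
  const pull = intVar(env, "LOCAL_AI_PULL_TIMEOUT_MS", 3_600_000);
  return {
    ollamaHost: withScheme(env.OLLAMA_HOST ?? "http://localhost:11434"),
    lmstudioHost: withScheme(env.LMSTUDIO_HOST ?? "http://localhost:1234"),
    llamacppHost: withScheme(env.LLAMACPP_HOST ?? "http://localhost:8080"),
    openaiCompatHost: env.OPENAI_COMPAT_HOST ? withScheme(env.OPENAI_COMPAT_HOST) : undefined,
    moonshotHost: withScheme(env.MOONSHOT_HOST ?? "https://api.moonshot.ai/v1"),
    moonshotApiKey: env.MOONSHOT_API_KEY || undefined,
    requestTimeoutMs: intVar(env, "LOCAL_AI_REQUEST_TIMEOUT_MS", 120_000),
    detectTimeoutMs: intVar(env, "LOCAL_AI_DETECT_TIMEOUT_MS", 1500),
    pullTimeoutMs: pull === 0 ? null : pull,
  };
}
```

At startup this would be called as `loadConfig(process.env)`. Rejecting malformed timeouts early surfaces configuration mistakes before any runtime is contacted.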

Development

npm install
npm run build      # tsc -> dist/
npm test           # vitest; runs fully offline (mocked HTTP, stubbed hardware)

The test suite requires no running runtime and no downloaded model: every HTTP call is mocked and hardware probing is stubbed.

License

CC-BY-NC-ND-4.0 -- see LICENSE.

Built by TMHSDigital
