Which integrations are available for this server?

Allows managing local Ollama models: discover installed and loaded models, pull/remove models, load/unload from memory, check hardware fit, and offload inference (completion and embedding) to local Ollama instances.

How do I use Local AI MCP?

1. Click on "Install Server". 2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state. 3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Local AI MCP list installed models" That's it! The server will respond to your query, and you can continue using it as needed. Here is a step-by-step guide with screenshots.

Local AI MCP

Official

by TMHSDigital

Overview Schema Related Servers Score Discussions

TypeScript

Local

Local AI MCP

Unified MCP server for managing local model runtimes (Ollama, LM Studio, and more): provider-agnostic discovery, lifecycle, hardware-fit, and delegated inference.

License: CC-BY-NC-ND-4.0 Version Type

Local AI MCP is an MCP server that turns your local model runtimes into an agent-callable control plane. It is operations-first: its primary job is to discover, inspect, fit, and manage the models running on your own machine. It speaks to runtimes over their local HTTP APIs and exposes one consistent tool surface across them, so an agent does not need to know whether a model lives in Ollama or LM Studio.

The server communicates over stdio only. It is a client to your local runtimes and never opens a network listener of its own.

Why an ops-first local-model server

Discovery and lifecycle, not just chat. List what is installed, what is loaded, pull and remove models, load and unload them, and check their fit against your hardware before you commit VRAM to them.
Hardware-aware. system_resources and fit_check read your real RAM and GPU/VRAM so an agent can pick a model that will actually run, and suggest_model ranks candidates by task and by what fits.
Provider-agnostic. Every tool takes an optional provider argument. Omit it and the tool operates across all detected runtimes, aggregating results per provider.

Related MCP server: 1mcpserver

Inference is delegation, not chat

The complete and embed tools exist to delegate (offload) inference to a local model for cost control and privacy: keep tokens and data on your own hardware instead of sending them to a hosted API. They are deliberately framed as delegated/offloaded inference primitives, not as a conversational chat surface.

The provider-adapter model

Each runtime is implemented as an adapter behind a single Provider interface (src/providers/types.ts) with a uniform method set: detect, health, listModels, listLoaded, modelInfo, pull, remove, load, unload, complete, embed, and capabilities. Adding a runtime means adding one adapter; the tool layer is unchanged.

Adapter	Default host	Transport	Notes
Ollama (`src/providers/ollama.ts`)	`http://localhost:11434`	Native REST + OpenAI-compatible	`load`/`unload` map to Ollama `keep_alive` semantics (`keep_alive` to load, `keep_alive: 0` to unload). `complete`/`embed` use the OpenAI-compatible `/v1` routes.
LM Studio (`src/providers/lmstudio.ts`)	`http://localhost:1234`	REST (`/api/v0`) + OpenAI-compatible	Uses the `lms` CLI for `load`/`unload`/`pull`/`remove` when present; falls back to REST for `listModels`/`listLoaded`/`complete`/`embed`.

Auto-detection: on each call the server probes the configured local endpoints to determine which runtimes are live. Hardware probing is isolated in src/hardware/ and branches by platform (Windows / Linux); it exposes total/free RAM and, where detectable, GPU name and VRAM.

Tool surface (16 tools)

Discovery

Tool	Description
`list_providers`	Configured runtimes, their host, live/detected status, and capabilities.
`list_models`	Installed models across detected providers (or one provider).
`list_loaded`	Models currently resident in memory.
`model_info`	Detailed metadata for a model.

Lifecycle

Tool	Description
`pull_model`	Download a model. Heavy: may transfer multiple GB.
`remove_model`	Delete a model from disk. Destructive: requires `confirm: true` and a `provider` (no fan-out); refuses without `confirm: true`.
`load_model`	Load a model into memory (Ollama `keep_alive`; LM Studio `lms load`).
`unload_model`	Evict a model from memory.

Ops

Tool	Description
`health_check`	Liveness and version per provider.
`system_resources`	Total/free RAM, CPU count, and GPU/VRAM.
`fit_check`	Whether a model fits in free VRAM (GPU) or RAM (CPU), with the numbers.
`benchmark`	Measure latency and tokens/sec with one small completion. Heavy: runs real inference.

Registry

Tool	Description
`search_available`	Search a curated catalog of well-known models (Ollama library oriented).
`suggest_model`	Recommend a model for a task, ranked by what fits your detected hardware.

Delegation (offloaded inference)

Tool	Description
`complete`	Delegate a completion to a local model (cost/privacy offload, not chat).
`embed`	Delegate embedding generation to a local model.

Every tool except system_resources accepts an optional provider (ollama | lmstudio). Omit it to operate across all detected runtimes.

Install and run

npx @tmhs/local-ai-mcp

Claude Desktop / Cursor config

{
  "mcpServers": {
    "local-ai": {
      "command": "npx",
      "args": ["-y", "@tmhs/local-ai-mcp"],
      "env": {
        "OLLAMA_HOST": "http://localhost:11434",
        "LMSTUDIO_HOST": "http://localhost:1234"
      }
    }
  }
}

Configuration

All configuration is via environment variables with sane defaults:

Variable	Default	Description
`OLLAMA_HOST`	`http://localhost:11434`	Ollama base URL (scheme optional; added if missing).
`LMSTUDIO_HOST`	`http://localhost:1234`	LM Studio base URL.
`LOCAL_AI_REQUEST_TIMEOUT_MS`	`120000`	Timeout for normal requests (inference, pull progress, etc.).
`LOCAL_AI_DETECT_TIMEOUT_MS`	`1500`	Timeout for provider auto-detection probes.
`LOCAL_AI_PULL_TIMEOUT_MS`	`3600000`	Timeout for model pulls (multi-GB downloads); set `0` to disable.

Development

npm install
npm run build      # tsc -> dist/
npm test           # vitest; runs fully offline (mocked HTTP, stubbed hardware)

The test suite requires no running runtime and no downloaded model: every HTTP call is mocked and hardware probing is stubbed.

License

CC-BY-NC-ND-4.0 -- see LICENSE.

Built by TMHSDigital

This server cannot be installed

license - permissive license

quality - not tested

maintenance

How are these scores calculated?

Maintenance

–Maintainers

–Response time

0dRelease cycle

3Releases (12mo)

Commit activity

Resources

Need Help?

Related Servers

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Appeared in Searches

Latest Blog Posts

Your AI Chatbot Just Exposed Your CEO's Salary to an Intern
By Om-Shree-0709 on July 2, 2026.
Agent Identity
MCP Security
OAuth Delegation
Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)
By Om-Shree-0709 on June 30, 2026.
Agentic Ai
Prompt Injection
WebAssembly
Lightport: Open-Sourcing Glama's AI Gateway
By punkpeye on April 27, 2026.
OpenAI
open source

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/TMHSDigital/local-ai-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server