llama-mcp-server
MCP server bridging Claude Code to local llama.cpp. Run local LLMs alongside Claude for experimentation, testing, and cost-effective inference.
Requirements
Node.js 18+
llama.cpp with llama-server built
A GGUF model file
Installation
npm install llama-mcp-server
Or clone and build from source:
git clone https://github.com/ahays248/llama-mcp-server
cd llama-mcp-server
npm install
npm run build
Configuration
Configure via environment variables:
| Variable | Description | Default |
| LLAMA_SERVER_URL | URL of llama-server | |
|  | Request timeout in ms | |
| LLAMA_MODEL_PATH | Path to GGUF model file | (none) |
| LLAMA_SERVER_PATH | Path to llama-server binary | |
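As a rough sketch of how these variables might be consumed, the TypeScript below reads them into a typed config object. It is not the package's actual code: the function name loadConfig is hypothetical, the http://localhost:8080 fallback is an assumption based on llama-server's default port and the examples in this README, and the timeout variable is omitted because its name is not shown above.

```typescript
// Hypothetical config loader; not llama-mcp-server's real implementation.
// ASSUMPTION: fallback URL mirrors llama-server's default port (8080).

interface LlamaConfig {
  serverUrl: string;
  modelPath?: string;
  serverPath?: string;
}

const ASSUMED_DEFAULT_URL = "http://localhost:8080";

function loadConfig(
  env: Record<string, string | undefined> = process.env,
): LlamaConfig {
  // Treat an empty value as unset, and strip trailing slashes so
  // endpoint paths like "/health" can be appended directly.
  const serverUrl = (env.LLAMA_SERVER_URL || ASSUMED_DEFAULT_URL).replace(/\/+$/, "");
  return {
    serverUrl,
    modelPath: env.LLAMA_MODEL_PATH || undefined,
    serverPath: env.LLAMA_SERVER_PATH || undefined,
  };
}

// Usage: const config = loadConfig();
```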
Usage with Claude Code
Option 1: Plugin Installation (Recommended)
Due to a known bug in Claude Code, non-plugin MCP servers may connect but not expose their tools. The workaround is to install llama-mcp-server as a plugin via a local marketplace.
Step 1: Create the marketplace structure
llama-marketplace/
├── .claude-plugin/
│   └── marketplace.json
└── plugins/
    └── llama/
        ├── .claude-plugin/
        │   └── plugin.json
        └── .mcp.json
Step 2: Create marketplace.json
Save the following as llama-marketplace/.claude-plugin/marketplace.json:
{
  "name": "llama-marketplace",
  "description": "Local marketplace for llama.cpp MCP plugin",
  "owner": {
    "name": "Your Name"
  },
  "plugins": [
    {
      "name": "llama",
      "description": "llama.cpp MCP server for local LLM inference",
      "source": "./plugins/llama"
    }
  ]
}
Step 3: Create plugin.json
Save the following as llama-marketplace/plugins/llama/.claude-plugin/plugin.json:
{
  "name": "llama",
  "version": "0.1.0",
  "description": "llama.cpp MCP server for local LLM inference"
}
Step 4: Create .mcp.json
Save the following as llama-marketplace/plugins/llama/.mcp.json, replacing the placeholder paths with your own:
{
  "mcpServers": {
    "llama": {
      "command": "npx",
      "args": ["-y", "llama-mcp-server"],
      "env": {
        "LLAMA_SERVER_URL": "http://localhost:8080",
        "LLAMA_MODEL_PATH": "/path/to/your/model.gguf",
        "LLAMA_SERVER_PATH": "/path/to/llama-server"
      }
    }
  }
}
Step 5: Install the plugin
# Add the local marketplace
claude plugin marketplace add /path/to/llama-marketplace
# Install the plugin
claude plugin install llama@llama-marketplace
# Restart Claude Code
After restarting, the tools appear as mcp__plugin_llama_llama__*.
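If you prefer to script Steps 1 to 4, the following Node.js sketch writes the same three files with the same contents. scaffoldMarketplace is a hypothetical helper, not part of llama-mcp-server; the root directory and env values are placeholders to replace before running it, and Step 5 still has to be run afterwards.

```typescript
// Sketch: generate the local marketplace layout from Steps 1-4.
// scaffoldMarketplace is a hypothetical helper; paths are placeholders.
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

function writeJson(path: string, data: unknown): void {
  writeFileSync(path, JSON.stringify(data, null, 2) + "\n");
}

function scaffoldMarketplace(root: string, env: Record<string, string>): void {
  const pluginDir = join(root, "plugins", "llama");
  // recursive: true creates missing parents and tolerates existing dirs.
  mkdirSync(join(root, ".claude-plugin"), { recursive: true });
  mkdirSync(join(pluginDir, ".claude-plugin"), { recursive: true });

  writeJson(join(root, ".claude-plugin", "marketplace.json"), {
    name: "llama-marketplace",
    description: "Local marketplace for llama.cpp MCP plugin",
    owner: { name: "Your Name" },
    plugins: [
      {
        name: "llama",
        description: "llama.cpp MCP server for local LLM inference",
        source: "./plugins/llama",
      },
    ],
  });

  writeJson(join(pluginDir, ".claude-plugin", "plugin.json"), {
    name: "llama",
    version: "0.1.0",
    description: "llama.cpp MCP server for local LLM inference",
  });

  writeJson(join(pluginDir, ".mcp.json"), {
    mcpServers: {
      llama: { command: "npx", args: ["-y", "llama-mcp-server"], env },
    },
  });
}

// Usage (placeholder paths):
// scaffoldMarketplace("/path/to/llama-marketplace", {
//   LLAMA_SERVER_URL: "http://localhost:8080",
//   LLAMA_MODEL_PATH: "/path/to/your/model.gguf",
//   LLAMA_SERVER_PATH: "/path/to/llama-server",
// });
```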
Option 2: Direct MCP Configuration
Note: This method may not work due to the bug mentioned above. If tools don't appear after adding the server, use Option 1.
Add to your Claude Code MCP configuration:
claude mcp add llama -e LLAMA_SERVER_URL=http://localhost:8080 -e LLAMA_MODEL_PATH=/path/to/model.gguf -e LLAMA_SERVER_PATH=/path/to/llama-server -- npx -y llama-mcp-server
Or add manually to ~/.claude.json:
{
  "mcpServers": {
    "llama": {
      "command": "npx",
      "args": ["-y", "llama-mcp-server"],
      "env": {
        "LLAMA_SERVER_URL": "http://localhost:8080",
        "LLAMA_MODEL_PATH": "/path/to/your/model.gguf",
        "LLAMA_SERVER_PATH": "/path/to/llama-server"
      }
    }
  }
}
Tools
Server Tools
| Tool | Description |
|  | Check if llama-server is running and get status |
|  | Get or set server properties |
|  | List available/loaded models |
|  | View current slot processing state |
|  | Get Prometheus-compatible metrics |
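How the status tool works internally is not documented here. Assuming it wraps llama-server's GET /health endpoint, which returns 200 once the model is loaded and 503 while it is still loading, a direct check could look like the sketch below. checkHealth and interpretHealth are hypothetical names, and the global fetch it relies on needs Node.js 18+.

```typescript
// Sketch, ASSUMING the status tool wraps llama-server's GET /health:
//   200 -> model loaded and ready, 503 -> model still loading.

type HealthState = "ready" | "loading" | "error" | "unreachable";

// Map an HTTP status (or null for a network failure) to a readable state.
function interpretHealth(status: number | null): HealthState {
  if (status === null) return "unreachable"; // e.g. connection refused
  if (status === 200) return "ready";
  if (status === 503) return "loading";
  return "error";
}

async function checkHealth(baseUrl: string, timeoutMs = 5000): Promise<HealthState> {
  try {
    const res = await fetch(`${baseUrl}/health`, {
      signal: AbortSignal.timeout(timeoutMs),
    });
    return interpretHealth(res.status);
  } catch {
    // fetch rejects on network errors and when the timeout aborts the request
    return interpretHealth(null);
  }
}

// Usage: checkHealth("http://localhost:8080").then(console.log);
```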
Token Tools
| Tool | Description |
|  | Convert text to token IDs |
|  | Convert token IDs back to text |
|  | Format chat messages using the model's chat template |
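Assuming the token tools map onto llama-server's POST /tokenize and POST /detokenize endpoints, a tokenize/detokenize round trip can be sketched as follows. The helper names are hypothetical; the request bodies use llama-server's native field names (content and tokens).

```typescript
// Sketch, ASSUMING the token tools wrap llama-server's native endpoints:
//   POST /tokenize   { content: string }  -> { tokens: number[] }
//   POST /detokenize { tokens: number[] } -> { content: string }

interface JsonPost {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build a fetch-ready JSON POST request without sending it.
function jsonPost(url: string, payload: unknown): JsonPost {
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

function tokenizeRequest(baseUrl: string, text: string): JsonPost {
  return jsonPost(`${baseUrl}/tokenize`, { content: text });
}

function detokenizeRequest(baseUrl: string, tokens: number[]): JsonPost {
  return jsonPost(`${baseUrl}/detokenize`, { tokens });
}

async function roundTrip(baseUrl: string, text: string): Promise<string> {
  const t = tokenizeRequest(baseUrl, text);
  const { tokens } = (await (await fetch(t.url, t.init)).json()) as { tokens: number[] };
  const d = detokenizeRequest(baseUrl, tokens);
  const { content } = (await (await fetch(d.url, d.init)).json()) as { content: string };
  // Depending on the tokenizer, this may not match the input byte-for-byte.
  return content;
}

// Usage: roundTrip("http://localhost:8080", "Hello world").then(console.log);
```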
Inference Tools
| Tool | Description |
| llama_complete | Generate text completion from a prompt |
|  | Chat completion (OpenAI-compatible) |
|  | Generate embeddings for text |
|  | Code completion with prefix and suffix context |
|  | Rerank documents by relevance to a query |
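For a sense of what the completion and code-completion tools send, here is a sketch of request bodies for llama-server's native POST /completion and POST /infill endpoints. That the tools use these endpoints is an assumption, and the helper names and default token limits are illustrative only.

```typescript
// Request bodies for llama-server's POST /completion and POST /infill.
// ASSUMPTION: the MCP tools use these endpoints; helpers are hypothetical.

interface CompletionBody {
  prompt: string;
  n_predict: number; // maximum number of tokens to generate
}

interface InfillBody {
  input_prefix: string; // code before the cursor
  input_suffix: string; // code after the cursor
  n_predict: number;
}

function completionBody(prompt: string, maxTokens = 128): CompletionBody {
  return { prompt, n_predict: maxTokens };
}

function infillBody(prefix: string, suffix: string, maxTokens = 64): InfillBody {
  return { input_prefix: prefix, input_suffix: suffix, n_predict: maxTokens };
}

// POST a body and return the generated text from the "content" field.
async function postForContent(url: string, body: CompletionBody | InfillBody): Promise<string> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  // Non-2xx responses, such as the 501s described under Troubleshooting, become errors.
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  const data = (await res.json()) as { content: string };
  return data.content;
}

// Usage:
// postForContent("http://localhost:8080/completion", completionBody("Write a haiku about coding"));
// postForContent("http://localhost:8080/infill", infillBody("def add(a, b):\n    return ", "\n"));
```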
Model Management Tools
| Tool | Description |
|  | Load a model (router mode only) |
|  | Unload the current model (router mode only) |
LoRA Tools
| Tool | Description |
|  | List loaded LoRA adapters |
|  | Set LoRA adapter scales |
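Assuming the LoRA tools wrap llama-server's /lora-adapters endpoint, the sketch below shows one detail worth knowing: adapters left out of a POST are disabled, so changing one scale means resending the full list. buildScaleUpdate and setLoraScales are hypothetical helpers.

```typescript
// Hypothetical helpers, ASSUMING llama-server's /lora-adapters endpoint:
//   GET  -> [{ id, path, scale }, ...]
//   POST <- [{ id, scale }, ...]  (adapters missing from the list are disabled)

interface LoraAdapter {
  id: number;
  path: string;
  scale: number;
}

interface LoraScale {
  id: number;
  scale: number;
}

// Apply the requested changes while keeping every other adapter at its current scale.
function buildScaleUpdate(current: LoraAdapter[], changes: Record<number, number>): LoraScale[] {
  return current.map((a) => ({ id: a.id, scale: changes[a.id] ?? a.scale }));
}

async function setLoraScales(baseUrl: string, changes: Record<number, number>): Promise<void> {
  const current = (await (await fetch(`${baseUrl}/lora-adapters`)).json()) as LoraAdapter[];
  const res = await fetch(`${baseUrl}/lora-adapters`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildScaleUpdate(current, changes)),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
}

// Usage: setLoraScales("http://localhost:8080", { 0: 0.5 });
```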
Process Control Tools
| Tool | Description |
| llama_start | Start llama-server as a child process |
|  | Stop the llama-server process |
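Running llama-server as a child process from Node.js can be sketched with child_process.spawn. This is an illustration, not the llama_start implementation: the function names and the 127.0.0.1 host and 8080 port defaults are assumptions, while -m, --host and --port are standard llama-server options.

```typescript
// Sketch only: launch and stop llama-server from Node.js.
// Function names and host/port defaults are assumptions.
import { spawn, ChildProcess } from "node:child_process";

function llamaServerArgs(modelPath: string, port = 8080, host = "127.0.0.1"): string[] {
  // -m selects the GGUF model; --host/--port set the listen address.
  return ["-m", modelPath, "--host", host, "--port", String(port)];
}

function startLlamaServer(binary: string, modelPath: string, port = 8080): ChildProcess {
  const child = spawn(binary, llamaServerArgs(modelPath, port), {
    // stdin: none; stdout: discarded; stderr: forwarded for logs
    stdio: ["ignore", "ignore", "inherit"],
  });
  // 'error' fires if the binary cannot be spawned (e.g. a wrong LLAMA_SERVER_PATH).
  child.on("error", (err) => console.error(`failed to start ${binary}: ${err.message}`));
  return child;
}

function stopLlamaServer(child: ChildProcess): void {
  // Only signal a process that has not already exited.
  if (child.exitCode === null && child.signalCode === null) {
    child.kill("SIGTERM");
  }
}

// Usage (placeholder paths):
// const server = startLlamaServer("/path/to/llama-server", "/path/to/model.gguf");
// ...later: stopLlamaServer(server);
```

The child's stdout is ignored on purpose: an MCP server using the stdio transport sends protocol messages over its own stdout, so inheriting llama-server's output there could corrupt that stream.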
Example: Starting llama-server and Running Inference
User: Start llama-server with my local model
Claude: I'll start llama-server for you.
[Uses llama_start tool with model path]
User: Generate a haiku about coding
Claude: Let me use the local model for that.
[Uses llama_complete tool]
Result:
Lines of code cascade
Through the silent morning hours
Bugs flee from the light
Development
# Run tests
npm test
# Type check
npm run typecheck
# Build
npm run build
# Watch mode for development
npm run dev
Troubleshooting
Tools don't appear in Claude Code
Symptom: Server shows "Connected" in claude mcp list but no llama_* tools are available.
Cause: Known bug in Claude Code where non-plugin MCP servers don't expose tools (#12164).
Solution: Use the plugin installation method (Option 1 above).
HTTP 501 errors for certain tools
Some tools require specific server configurations:
| Tool | Requirement |
|  | Start llama-server with |
|  | Start llama-server with |
|  | Use a model with fill-in-middle support (e.g., CodeLlama, DeepSeek Coder) |
|  | Use a reranker model |
|  | llama-server must be in router mode |
Connection refused errors
Symptom: Cannot connect to llama-server at http://localhost:8080
Solutions:
Use llama_start to start the server, or start llama-server manually:
llama-server -m /path/to/model.gguf
Check that LLAMA_SERVER_URL matches the address where llama-server is running.
WSL/Windows path issues
When running in WSL, ensure paths use Linux format:
✓ /home/user/models/model.gguf
✗ C:\Users\user\models\model.gguf
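A model stored on a Windows drive is still reachable from WSL through the drive mount. The hypothetical helper below converts a C:\... style path, assuming the default /mnt/ automount root (configurable in /etc/wsl.conf). WSL's own wslpath utility, for example wslpath -u 'C:\Users\user\models\model.gguf', does the same job from the shell.

```typescript
// Hypothetical helper: map a Windows drive path to its WSL mount path.
// ASSUMPTION: default automount root "/mnt/"; UNC paths are not handled.
function windowsToWslPath(winPath: string): string {
  const m = /^([A-Za-z]):[\\/](.*)$/.exec(winPath);
  if (!m) return winPath; // already a Linux path, or a form this sketch ignores
  const drive = m[1].toLowerCase();
  const rest = m[2].replace(/\\/g, "/");
  return `/mnt/${drive}/${rest}`;
}

// Usage: windowsToWslPath("C:\\Users\\user\\models\\model.gguf")
```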
License
MIT