llama-mcp-server
Server Configuration
Environment variables that configure the server. All of them are optional; defaults are listed where one exists.
| Name | Required | Description | Default |
|---|---|---|---|
| LLAMA_MODEL_PATH | No | Path to GGUF model file | |
| LLAMA_SERVER_URL | No | URL of llama-server | http://localhost:8080 |
| LLAMA_SERVER_PATH | No | Path to llama-server binary | llama-server |
| LLAMA_SERVER_TIMEOUT | No | Request timeout in ms | 30000 |
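As a rough illustration of how the defaults in the table could be resolved, here is a minimal Python sketch. The `ServerConfig` dataclass and the `load_config` helper are hypothetical names, not part of llama-mcp-server; only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical container for the four documented settings."""
    model_path: Optional[str]
    server_url: str
    server_path: str
    timeout_ms: int


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read settings from `env`, falling back to the documented defaults."""
    return ServerConfig(
        # No default is documented for the model path, so an unset or empty value becomes None.
        model_path=env.get("LLAMA_MODEL_PATH") or None,
        server_url=env.get("LLAMA_SERVER_URL", "http://localhost:8080"),
        server_path=env.get("LLAMA_SERVER_PATH", "llama-server"),
        # The timeout is documented in milliseconds.
        timeout_ms=int(env.get("LLAMA_SERVER_TIMEOUT", "30000")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment, for example `load_config({"LLAMA_SERVER_TIMEOUT": "5000"})`.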
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | {"listChanged": true} |
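In the Model Context Protocol, a server that declares `listChanged` for tools may send a `notifications/tools/list_changed` notification, after which a client would typically request the tool list again with `tools/list`. The sketch below shows that reaction on the client side; `handle_message` is a hypothetical helper, and how this particular server decides to emit the notification is not documented here.

```python
import json
from typing import Optional


def handle_message(raw: str, next_id: int) -> Optional[dict]:
    """Return a `tools/list` request if `raw` is a tool-list-changed notification.

    Returns None for any other message. `next_id` is the JSON-RPC id the
    caller wants to use for the follow-up request.
    """
    message = json.loads(raw)
    if message.get("method") == "notifications/tools/list_changed":
        # Re-fetch the tool list so the client's view stays current.
        return {"jsonrpc": "2.0", "id": next_id, "method": "tools/list"}
    return None
```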
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| llama_health | Check if llama-server is running and get status |
| llama_props | Get or set server properties and default generation settings |
| llama_models | List available/loaded models |
| llama_slots | View current slot processing state |
| llama_metrics | Get Prometheus-compatible metrics (tokens processed, latency, etc.) |
| llama_tokenize | Convert text to token IDs |
| llama_detokenize | Convert token IDs back to text |
| llama_apply_template | Format chat messages using model's template without inference |
| llama_complete | Generate text completion from a prompt |
| llama_chat | Chat completion (OpenAI-compatible format) |
| llama_embed | Generate embeddings for text |
| llama_infill | Code completion with prefix and suffix context (fill-in-middle) |
| llama_rerank | Rerank documents by relevance to a query |
| llama_load_model | Load a model (router mode only) |
| llama_unload_model | Unload the current model (router mode only) |
| llama_lora_list | List loaded LoRA adapters |
| llama_lora_set | Set LoRA adapter scales |
| llama_start | Start llama-server as a child process with the specified model |
| llama_stop | Stop the running llama-server process |
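MCP clients invoke tools like these through the JSON-RPC `tools/call` method. The following sketch builds such a request for the health-check tool; `build_tool_call` is a hypothetical helper, and the empty `arguments` object is an assumption, since the input schema of each tool is not listed on this page.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON string."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Assumption: the health check needs no arguments.
payload = build_tool_call(1, "llama_health", {})
```

The resulting string would then be sent over whichever transport the client uses to talk to the server.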
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/ahays248/llama-mcp-server'
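The same request can be made from Python with only the standard library. This is a minimal sketch: `build_request` and `fetch_server_info` are hypothetical helper names, and treating the response body as JSON is an assumption about the API rather than something stated on this page.

```python
import json
import urllib.request

API_URL = "https://glama.ai/api/mcp/v1/servers/ahays248/llama-mcp-server"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Prepare the GET request without sending it."""
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server_info(url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server_info()` performs the network request; `build_request()` alone is enough to inspect the URL and method first.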
If you have feedback or need assistance with the MCP directory API, please join our Discord server.