Schema | llama-mcp-server

llama-mcp-server

Environment variables used to configure the server (none are required).

Name	Required	Description	Default
`LLAMA_MODEL_PATH`	No	Path to the GGUF model file	(none)
`LLAMA_SERVER_URL`	No	URL of the llama-server instance	http://localhost:8080
`LLAMA_SERVER_PATH`	No	Path to the llama-server binary	llama-server
`LLAMA_SERVER_TIMEOUT`	No	Request timeout in milliseconds	30000
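As a rough illustration of how the variables and defaults above fit together, here is a minimal Python sketch that resolves the configuration from the environment. The `ServerConfig` class, the `load_config` and `llama_server_argv` helpers, and the choice to derive `--port` from `LLAMA_SERVER_URL` are hypothetical and not taken from the server's code. Only `-m` (model file) and `--port` are standard llama-server flags.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlparse


@dataclass
class ServerConfig:
    model_path: Optional[str]  # LLAMA_MODEL_PATH (no default)
    server_url: str            # LLAMA_SERVER_URL
    server_path: str           # LLAMA_SERVER_PATH
    timeout_ms: int            # LLAMA_SERVER_TIMEOUT, in milliseconds


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read the documented variables, falling back to the documented defaults."""
    return ServerConfig(
        model_path=env.get("LLAMA_MODEL_PATH") or None,
        server_url=env.get("LLAMA_SERVER_URL", "http://localhost:8080"),
        server_path=env.get("LLAMA_SERVER_PATH", "llama-server"),
        timeout_ms=int(env.get("LLAMA_SERVER_TIMEOUT", "30000")),
    )


def llama_server_argv(cfg: ServerConfig) -> list[str]:
    """Hypothetical helper: build a command line for launching llama-server."""
    if cfg.model_path is None:
        raise ValueError("LLAMA_MODEL_PATH is needed to start llama-server")
    # Assumption: listen on the same port that LLAMA_SERVER_URL points at.
    port = urlparse(cfg.server_url).port or 8080
    return [cfg.server_path, "-m", cfg.model_path, "--port", str(port)]


cfg = load_config({"LLAMA_MODEL_PATH": "/models/llama.gguf"})
print(llama_server_argv(cfg))  # ['llama-server', '-m', '/models/llama.gguf', '--port', '8080']
```

Passing the environment as a mapping keeps the helper easy to test without touching the real process environment.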

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": true }
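In MCP terms, `"listChanged": true` means the server may notify the client when its set of tools changes. After that notification the client is expected to call `tools/list` again. The Python sketch below shows the capabilities object a server advertises in its `initialize` result and the `notifications/tools/list_changed` message, both as defined by the MCP specification. The client-side helper `should_refetch_tools` is hypothetical.

```python
import json

# Capabilities as advertised in the server's initialize result (per the MCP spec).
server_capabilities = {"tools": {"listChanged": True}}

# Notification a server that advertises listChanged may send when its tools change.
tools_changed = {"jsonrpc": "2.0", "method": "notifications/tools/list_changed"}


def should_refetch_tools(capabilities: dict, message: dict) -> bool:
    """Hypothetical client check: re-run tools/list after a list_changed notice."""
    advertised = capabilities.get("tools", {}).get("listChanged", False)
    return bool(advertised) and message.get("method") == "notifications/tools/list_changed"


print(json.dumps(tools_changed))
print(should_refetch_tools(server_capabilities, tools_changed))  # True
```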

Functions exposed to the LLM to take actions

Name	Description
llama_health	Check whether llama-server is running and get its status
llama_props	Get or set server properties and default generation settings
llama_models	List available and loaded models
llama_slots	View the current slot processing state
llama_metrics	Get Prometheus-compatible metrics (tokens processed, latency, etc.)
llama_tokenize	Convert text to token IDs
llama_detokenize	Convert token IDs back to text
llama_apply_template	Format chat messages with the model's chat template without running inference
llama_complete	Generate a text completion from a prompt
llama_chat	Chat completion (OpenAI-compatible format)
llama_embed	Generate embeddings for text
llama_infill	Code completion with prefix and suffix context (fill-in-the-middle)
llama_rerank	Rerank documents by relevance to a query
llama_load_model	Load a model (router mode only)
llama_unload_model	Unload the current model (router mode only)
llama_lora_list	List loaded LoRA adapters
llama_lora_set	Set LoRA adapter scales
llama_start	Start llama-server as a child process with the specified model
llama_stop	Stop the running llama-server process
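An MCP client invokes any of these tools with a JSON-RPC `tools/call` request. The Python sketch below builds one for `llama_tokenize`. The argument name `content` is an assumption and does not come from this page. Each tool's real input schema is returned by `tools/list`, so check it there before relying on any argument names.

```python
import itertools
import json
from typing import Any

_ids = itertools.count(1)  # request IDs must not be reused within a session


def tool_call_request(name: str, arguments: dict[str, Any]) -> str:
    """Build a JSON-RPC 2.0 tools/call request in the shape MCP defines."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical arguments: the actual schema comes from the server's tools/list.
print(tool_call_request("llama_tokenize", {"content": "Hello, world"}))
```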

Interactive templates invoked by user choice

Name	Description
No prompts

Contextual data attached and managed by the client

Name	Description
No resources

Glama provides information about all listed MCP servers through its MCP API. For example, this server's entry can be fetched with:

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ahays248/llama-mcp-server'
