
Server Configuration

Environment variables used to configure the server. All are optional; unset variables fall back to the defaults shown.

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| VLLM_MODEL | No | Default model to use | |
| VLLM_API_KEY | No | API key (if required) | |
| VLLM_BASE_URL | No | vLLM server URL | http://localhost:8000 |
| VLLM_HF_TOKEN | No | HuggingFace token for gated models (e.g., Llama) | |
| VLLM_DOCKER_IMAGE | No | Container image (GPU mode) | vllm/vllm-openai:latest |
| VLLM_CONTAINER_NAME | No | Container name | vllm-server |
| VLLM_DEFAULT_TIMEOUT | No | Request timeout (seconds) | 60.0 |
| VLLM_DOCKER_IMAGE_CPU | No | Container image (CPU mode) | quay.io/rh_ee_micyang/vllm-cpu:v0.11.0 |
| VLLM_CONTAINER_RUNTIME | No | Container runtime (podman, docker, or auto) | |
| VLLM_DEFAULT_MAX_TOKENS | No | Default max tokens | 1024 |
| VLLM_DOCKER_IMAGE_MACOS | No | Container image (macOS) | quay.io/rh_ee_micyang/vllm-mac:v0.11.0 |
| VLLM_DEFAULT_TEMPERATURE | No | Default temperature | 0.7 |
| VLLM_GPU_MEMORY_UTILIZATION | No | GPU memory fraction | 0.9 |
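As a minimal sketch (not the server's actual code), the settings above can be resolved from the environment like this, with each value falling back to its documented default when the variable is unset:

```python
import os

# Illustrative only: resolve each documented setting, falling back to the
# defaults listed in the table above when the variable is not set.
config = {
    "base_url": os.environ.get("VLLM_BASE_URL", "http://localhost:8000"),
    "container_name": os.environ.get("VLLM_CONTAINER_NAME", "vllm-server"),
    "timeout": float(os.environ.get("VLLM_DEFAULT_TIMEOUT", "60.0")),
    "max_tokens": int(os.environ.get("VLLM_DEFAULT_MAX_TOKENS", "1024")),
    "temperature": float(os.environ.get("VLLM_DEFAULT_TEMPERATURE", "0.7")),
    "gpu_memory_utilization": float(
        os.environ.get("VLLM_GPU_MEMORY_UTILIZATION", "0.9")
    ),
}
```

Note that the numeric variables are strings in the environment, so they must be parsed before use.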

Capabilities

Features and capabilities supported by this server

| Capability | Details |
| --- | --- |
| tools | { "listChanged": false } |
| prompts | { "listChanged": false } |
| resources | { "subscribe": false, "listChanged": false } |
| experimental | {} |

Tools

Functions exposed to the LLM to take actions

| Name | Description |
| --- | --- |
| vllm_chat | Send a chat message to the vLLM server. Supports multi-turn conversations. |
| vllm_complete | Generate a text completion using vLLM. Good for code completion and text generation. |
| list_models | List all available models on the vLLM server. |
| get_model_info | Get detailed information about a specific model. |
| vllm_status | Check the health and status of the vLLM server. |
| start_vllm | Start a vLLM server in a Docker container. Automatically detects the platform (Linux/macOS/Windows) and GPU availability. |
| stop_vllm | Stop a running vLLM Docker container. |
| restart_vllm | Restart a vLLM Docker container. |
| list_vllm_containers | List all vLLM Docker containers. |
| get_vllm_logs | Get logs from a vLLM container to check loading progress or errors. |
| get_platform_status | Get platform information, including Docker and GPU availability. |
| run_benchmark | Run a performance benchmark against the vLLM server using GuideLLM. |
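vLLM exposes an OpenAI-compatible HTTP API, so a chat tool such as vllm_chat plausibly forwards a request body of this shape to the server's /v1/chat/completions endpoint. This is a hedged sketch, not the server's actual implementation; the model name is a placeholder, and the defaults mirror VLLM_DEFAULT_MAX_TOKENS and VLLM_DEFAULT_TEMPERATURE:

```python
import json

# Hedged sketch of the request body a chat tool might send to vLLM's
# OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "your-model-name",  # placeholder; use list_models to find real names
    "messages": [
        {"role": "user", "content": "Summarize what vLLM does in one sentence."},
    ],
    "max_tokens": 1024,   # mirrors VLLM_DEFAULT_MAX_TOKENS
    "temperature": 0.7,   # mirrors VLLM_DEFAULT_TEMPERATURE
}
body = json.dumps(payload)
```

A multi-turn conversation would extend the messages list with alternating user and assistant entries.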

Prompts

Interactive templates invoked by user choice

| Name | Description |
| --- | --- |
| coding_assistant | A helpful coding assistant that writes clean, efficient code |
| code_reviewer | Reviews code for bugs, security issues, and improvements |
| technical_writer | Creates clear technical documentation |
| debugger | Helps identify and fix bugs in code |
| architect | Designs software systems and architectures |
| data_analyst | Analyzes data and creates insights |
| ml_engineer | Develops machine learning models and pipelines |

Resources

Contextual data attached and managed by the client

| Name | Description |
| --- | --- |
| vLLM Server Status | Current status and health of the vLLM server |
| vLLM Performance Metrics | Performance metrics from the vLLM server |
| vLLM MCP Configuration | Current configuration settings |
| Platform Information | Platform, Docker, and GPU status information |


MCP directory API

We provide all the information about MCP servers via our MCP API.

```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/micytao/vllm-mcp-server'
```

If you have feedback or need assistance with the MCP directory API, please join our Discord server.