Glama

Server Configuration

Describes the environment variables used to configure the server; all are optional.

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| VLLM_MODEL | No | Default model to use | |
| VLLM_API_KEY | No | API key (if required) | |
| VLLM_BASE_URL | No | vLLM server URL | http://localhost:8000 |
| VLLM_HF_TOKEN | No | HuggingFace token for gated models (e.g., Llama) | |
| VLLM_DOCKER_IMAGE | No | Container image (GPU mode) | vllm/vllm-openai:latest |
| VLLM_CONTAINER_NAME | No | Container name | vllm-server |
| VLLM_DEFAULT_TIMEOUT | No | Request timeout (seconds) | 60.0 |
| VLLM_DOCKER_IMAGE_CPU | No | Container image (CPU mode) | quay.io/rh_ee_micyang/vllm-cpu:v0.11.0 |
| VLLM_CONTAINER_RUNTIME | No | Container runtime (podman, docker, or auto) | |
| VLLM_DEFAULT_MAX_TOKENS | No | Default max tokens | 1024 |
| VLLM_DOCKER_IMAGE_MACOS | No | Container image (macOS) | quay.io/rh_ee_micyang/vllm-mac:v0.11.0 |
| VLLM_DEFAULT_TEMPERATURE | No | Default temperature | 0.7 |
| VLLM_GPU_MEMORY_UTILIZATION | No | GPU memory fraction | 0.9 |
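As a quick illustration, a typical configuration could be exported in the shell before launching the server. This is a sketch: the model name is a hypothetical placeholder, and the remaining values simply restate the documented defaults.

```shell
# Illustrative configuration for the vLLM MCP server.
# VLLM_MODEL is a hypothetical placeholder; the other values
# match the documented defaults from the table above.
export VLLM_BASE_URL="http://localhost:8000"
export VLLM_MODEL="meta-llama/Llama-3.1-8B-Instruct"
export VLLM_CONTAINER_RUNTIME="auto"
export VLLM_DEFAULT_TIMEOUT="60.0"
export VLLM_DEFAULT_MAX_TOKENS="1024"
export VLLM_DEFAULT_TEMPERATURE="0.7"
export VLLM_GPU_MEMORY_UTILIZATION="0.9"
```

Since every variable is optional, any subset can be omitted and the defaults apply.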

Capabilities

Features and capabilities supported by this server

| Capability | Details |
| --- | --- |
| tools | `{ "listChanged": false }` |
| prompts | `{ "listChanged": false }` |
| resources | `{ "subscribe": false, "listChanged": false }` |
| experimental | `{}` |

Tools

Functions exposed to the LLM to take actions

| Name | Description |
| --- | --- |
| vllm_chat | Send a chat message to the vLLM server. Supports multi-turn conversations. |
| vllm_complete | Generate a text completion using vLLM. Good for code completion and text generation. |
| list_models | List all available models on the vLLM server. |
| get_model_info | Get detailed information about a specific model. |
| vllm_status | Check the health and status of the vLLM server. |
| start_vllm | Start a vLLM server in a Docker container. Automatically detects platform (Linux/macOS/Windows) and GPU availability. |
| stop_vllm | Stop a running vLLM Docker container. |
| restart_vllm | Restart a vLLM Docker container. |
| list_vllm_containers | List all vLLM Docker containers. |
| get_vllm_logs | Get logs from a vLLM container to check loading progress or errors. |
| get_platform_status | Get platform information, including Docker and GPU availability. |
| run_benchmark | Run a performance benchmark against the vLLM server using GuideLLM. |
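For orientation, vLLM itself exposes an OpenAI-compatible HTTP API, so a chat tool like vllm_chat presumably forwards requests to the `/v1/chat/completions` endpoint. The sketch below builds such a request body; the tool's exact payload is an assumption, and the model name is illustrative.

```shell
# Build the JSON body a vllm_chat-style request would send to vLLM's
# OpenAI-compatible endpoint. The model name is illustrative and the
# tool's exact payload shape is an assumption, not documented behavior.
BODY='{"model":"my-model","messages":[{"role":"user","content":"Hello"}],"max_tokens":1024,"temperature":0.7}'
echo "$BODY"
# With a server running, the request would be posted like so:
#   curl -s "${VLLM_BASE_URL:-http://localhost:8000}/v1/chat/completions" \
#        -H 'Content-Type: application/json' -d "$BODY"
```

Multi-turn conversations work the same way: prior turns are appended to the `messages` array before each request.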

Prompts

Interactive templates invoked by user choice

| Name | Description |
| --- | --- |
| coding_assistant | A helpful coding assistant that writes clean, efficient code |
| code_reviewer | Reviews code for bugs, security issues, and improvements |
| technical_writer | Creates clear technical documentation |
| debugger | Helps identify and fix bugs in code |
| architect | Designs software systems and architectures |
| data_analyst | Analyzes data and creates insights |
| ml_engineer | Develops machine learning models and pipelines |

Resources

Contextual data attached and managed by the client

| Name | Description |
| --- | --- |
| vLLM Server Status | Current status and health of the vLLM server |
| vLLM Performance Metrics | Performance metrics from the vLLM server |
| vLLM MCP Configuration | Current configuration settings |
| Platform Information | Platform, Docker, and GPU status information |

MCP directory API

All the information about MCP servers is available through our MCP directory API.

```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/micytao/vllm-mcp-server'
```

If you have feedback or need assistance with the MCP directory API, please join our Discord server.