## Server Configuration

Environment variables used to configure the server. All are optional; defaults are shown where applicable.
| Name | Required | Description | Default |
|---|---|---|---|
| VLLM_BASE_URL | No | vLLM server URL | http://localhost:8000 |
| VLLM_API_KEY | No | API key (if required) | |
| VLLM_MODEL | No | Default model to use | |
| VLLM_HF_TOKEN | No | HuggingFace token for gated models (e.g., Llama) | |
| VLLM_DEFAULT_TIMEOUT | No | Request timeout (seconds) | 60.0 |
| VLLM_DEFAULT_MAX_TOKENS | No | Default max tokens | 1024 |
| VLLM_DEFAULT_TEMPERATURE | No | Default temperature | 0.7 |
| VLLM_CONTAINER_RUNTIME | No | Container runtime (podman, docker, or auto) | |
| VLLM_CONTAINER_NAME | No | Container name | vllm-server |
| VLLM_DOCKER_IMAGE | No | Container image (GPU mode) | vllm/vllm-openai:latest |
| VLLM_DOCKER_IMAGE_CPU | No | Container image (CPU mode) | quay.io/rh_ee_micyang/vllm-cpu:v0.11.0 |
| VLLM_DOCKER_IMAGE_MACOS | No | Container image (macOS) | quay.io/rh_ee_micyang/vllm-mac:v0.11.0 |
| VLLM_GPU_MEMORY_UTILIZATION | No | GPU memory fraction | 0.9 |
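As a sketch, a typical setup exports a subset of these variables before launching the server. The values below are illustrative, not prescribed defaults; the model name and token are placeholders you would replace with your own.

```shell
# Point the MCP server at an existing vLLM instance (illustrative values).
export VLLM_BASE_URL="http://localhost:8000"
# Assumed model name for illustration -- substitute your deployment's model.
export VLLM_MODEL="meta-llama/Llama-3.1-8B-Instruct"
# Only needed for gated models; placeholder value shown.
export VLLM_HF_TOKEN="hf_your_token_here"
# Override request defaults.
export VLLM_DEFAULT_MAX_TOKENS=2048
export VLLM_DEFAULT_TEMPERATURE=0.2
```

Unset variables fall back to the defaults in the table above.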
## Capabilities

Features and capabilities supported by this server.
| Capability | Details |
|---|---|
| tools | `{"listChanged": false}` |
| prompts | `{"listChanged": false}` |
| resources | `{"subscribe": false, "listChanged": false}` |
| experimental | `{}` |
## Tools

Functions exposed to the LLM to take actions.
| Name | Description |
|---|---|
| vllm_chat | Send a chat message to the vLLM server. Supports multi-turn conversations. |
| vllm_complete | Generate a text completion using vLLM. Good for code completion and text generation. |
| list_models | List all available models on the vLLM server. |
| get_model_info | Get detailed information about a specific model. |
| vllm_status | Check the health and status of the vLLM server. |
| start_vllm | Start a vLLM server in a Docker container. Automatically detects the platform (Linux/macOS/Windows) and GPU availability. |
| stop_vllm | Stop a running vLLM Docker container. |
| restart_vllm | Restart a vLLM Docker container. |
| list_vllm_containers | List all vLLM Docker containers. |
| get_vllm_logs | Get logs from a vLLM container to check loading progress or errors. |
| get_platform_status | Get platform information including Docker and GPU availability. |
| run_benchmark | Run a performance benchmark against the vLLM server using GuideLLM. |
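Over MCP's JSON-RPC transport, a client invokes any of these tools with a standard `tools/call` request. A minimal sketch for `vllm_chat` is shown below; the `messages` argument name is an assumption for illustration — the authoritative input schema is returned by `tools/list`.

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "vllm_chat",
    "arguments": {
      "messages": [
        { "role": "user", "content": "Explain KV caching in one paragraph." }
      ]
    }
  }
}
```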
## Prompts

Interactive templates invoked by user choice.
| Name | Description |
|---|---|
| coding_assistant | A helpful coding assistant that writes clean, efficient code |
| code_reviewer | Reviews code for bugs, security issues, and improvements |
| technical_writer | Creates clear technical documentation |
| debugger | Helps identify and fix bugs in code |
| architect | Designs software systems and architectures |
| data_analyst | Analyzes data and creates insights |
| ml_engineer | Develops machine learning models and pipelines |
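Prompts are retrieved through the MCP `prompts/get` method. A sketch of the request for one of the templates above (any per-prompt arguments would be enumerated by `prompts/list`):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "prompts/get",
  "params": { "name": "code_reviewer" }
}
```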
## Resources

Contextual data attached and managed by the client.
| Name | Description |
|---|---|
| vLLM Server Status | Current status and health of the vLLM server |
| vLLM Performance Metrics | Performance metrics from the vLLM server |
| vLLM MCP Configuration | Current configuration settings |
| Platform Information | Platform, Docker, and GPU status information |
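Clients discover these resources with the MCP `resources/list` method, which returns each resource's URI, and then fetch one via `resources/read`. A sketch of the discovery request:

```json
{ "jsonrpc": "2.0", "id": 3, "method": "resources/list" }
```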