## Server Configuration

Environment variables used to configure the server. All are optional; defaults are shown where applicable.
| Name | Required | Description | Default |
|---|---|---|---|
| VLLM_BASE_URL | No | vLLM server URL | http://localhost:8000 |
| VLLM_API_KEY | No | API key (if required) | |
| VLLM_MODEL | No | Default model to use | |
| VLLM_HF_TOKEN | No | HuggingFace token for gated models (e.g., Llama) | |
| VLLM_DEFAULT_TIMEOUT | No | Request timeout (seconds) | 60.0 |
| VLLM_DEFAULT_MAX_TOKENS | No | Default max tokens | 1024 |
| VLLM_DEFAULT_TEMPERATURE | No | Default temperature | 0.7 |
| VLLM_CONTAINER_RUNTIME | No | Container runtime (podman, docker, or auto) | |
| VLLM_CONTAINER_NAME | No | Container name | vllm-server |
| VLLM_DOCKER_IMAGE | No | Container image (GPU mode) | vllm/vllm-openai:latest |
| VLLM_DOCKER_IMAGE_CPU | No | Container image (CPU mode) | quay.io/rh_ee_micyang/vllm-cpu:v0.11.0 |
| VLLM_DOCKER_IMAGE_MACOS | No | Container image (macOS) | quay.io/rh_ee_micyang/vllm-mac:v0.11.0 |
| VLLM_GPU_MEMORY_UTILIZATION | No | GPU memory fraction | 0.9 |
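As a sketch, a typical setup exports a subset of these variables before launching the server. The values below are illustrative, not prescribed defaults; the model name and token are placeholders you would replace with your own.

```shell
# Point the MCP server at an existing vLLM instance (illustrative values).
export VLLM_BASE_URL="http://localhost:8000"
# Assumed model name for illustration -- substitute your deployment's model.
export VLLM_MODEL="meta-llama/Llama-3.1-8B-Instruct"
# Only needed for gated models; placeholder value shown.
export VLLM_HF_TOKEN="hf_your_token_here"
# Override request defaults.
export VLLM_DEFAULT_MAX_TOKENS=2048
export VLLM_DEFAULT_TEMPERATURE=0.2
```

Unset variables fall back to the defaults in the table above.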
## Capabilities

Features and capabilities supported by this server.
| Capability | Details |
|---|---|
| tools | `{"listChanged": false}` |
| prompts | `{"listChanged": false}` |
| resources | `{"subscribe": false, "listChanged": false}` |
| experimental | `{}` |
## Tools

Functions exposed to the LLM to take actions.
| Name | Description |
|---|---|
| vllm_chat | Send a chat message to the vLLM server. Supports multi-turn conversations. |
| vllm_complete | Generate a text completion using vLLM. Good for code completion and text generation. |
| list_models | List all available models on the vLLM server. |
| get_model_info | Get detailed information about a specific model. |
| vllm_status | Check the health and status of the vLLM server. |
| start_vllm | Start a vLLM server in a Docker container. Automatically detects the platform (Linux/macOS/Windows) and GPU availability. |
| stop_vllm | Stop a running vLLM Docker container. |
| restart_vllm | Restart a vLLM Docker container. |
| list_vllm_containers | List all vLLM Docker containers. |
| get_vllm_logs | Get logs from a vLLM container to check loading progress or errors. |
| get_platform_status | Get platform information including Docker and GPU availability. |
| run_benchmark | Run a performance benchmark against the vLLM server using GuideLLM. |
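Over MCP's JSON-RPC transport, a client invokes any of these tools with a standard `tools/call` request. A minimal sketch for `vllm_chat` is shown below; the `messages` argument name is an assumption for illustration — the authoritative input schema is returned by `tools/list`.

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "vllm_chat",
    "arguments": {
      "messages": [
        { "role": "user", "content": "Explain KV caching in one paragraph." }
      ]
    }
  }
}
```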
## Prompts

Interactive templates invoked by user choice.
| Name | Description |
|---|---|
| coding_assistant | A helpful coding assistant that writes clean, efficient code |
| code_reviewer | Reviews code for bugs, security issues, and improvements |
| technical_writer | Creates clear technical documentation |
| debugger | Helps identify and fix bugs in code |
| architect | Designs software systems and architectures |
| data_analyst | Analyzes data and creates insights |
| ml_engineer | Develops machine learning models and pipelines |
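Prompts are retrieved through the MCP `prompts/get` method. A sketch of the request for one of the templates above (any per-prompt arguments would be enumerated by `prompts/list`):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "prompts/get",
  "params": { "name": "code_reviewer" }
}
```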
## Resources

Contextual data attached and managed by the client.
| Name | Description |
|---|---|
| vLLM Server Status | Current status and health of the vLLM server |
| vLLM Performance Metrics | Performance metrics from the vLLM server |
| vLLM MCP Configuration | Current configuration settings |
| Platform Information | Platform, Docker, and GPU status information |
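Clients discover these resources with the MCP `resources/list` method, which returns each resource's URI, and then fetch one via `resources/read`. A sketch of the discovery request:

```json
{ "jsonrpc": "2.0", "id": 3, "method": "resources/list" }
```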