
vllm_status

Monitor vLLM server health and operational status to verify system functionality and detect issues.

Instructions

Check the health and status of the vLLM server

Input Schema

This tool takes no arguments; the input schema is an empty object.
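Since the input schema is an empty object, a client calls the tool with empty arguments. A hypothetical MCP `tools/call` request payload might look like this (the `id` value and JSON-RPC framing are illustrative):

```python
import json

# Hypothetical JSON-RPC payload for invoking vllm_status.
# The tool takes no arguments, so "arguments" is an empty object.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "vllm_status", "arguments": {}},
}

print(json.dumps(request, indent=2))
```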

Implementation Reference

  • Main handler that formats and returns the vLLM server status as text:

    ```python
    async def get_server_status_text() -> str:
        """
        Get formatted server status as text.

        Returns:
            Formatted string with server status.
        """
        status = await get_server_status()

        # Map each status value to an emoji; unrecognized values fall back to "❓"
        status_emoji = {
            "healthy": "✅",
            "unhealthy": "⚠️",
            "offline": "❌",
            "unknown": "❓",
        }

        emoji = status_emoji.get(status["status"], "❓")

        lines = [
            f"## vLLM Server Status {emoji}",
            "",
            f"**Status:** {status['status']}",
            f"**Base URL:** {status['base_url']}",
        ]

        if status["models"]:
            lines.append(f"**Models:** {', '.join(status['models'])}")

        if status.get("error"):
            lines.append(f"**Error:** {status['error']}")

        if status.get("models_error"):
            lines.append(f"**Models Error:** {status['models_error']}")

        return "\n".join(lines)
    ```
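To preview what the formatted text looks like, the formatting logic can be exercised synchronously against a sample status dict (the model name and URL below are illustrative; the dict shape matches what `get_server_status` returns):

```python
def format_status(status: dict) -> str:
    """Synchronous sketch of the formatting logic, for illustration only."""
    status_emoji = {"healthy": "✅", "unhealthy": "⚠️", "offline": "❌", "unknown": "❓"}
    emoji = status_emoji.get(status["status"], "❓")
    lines = [
        f"## vLLM Server Status {emoji}",
        "",
        f"**Status:** {status['status']}",
        f"**Base URL:** {status['base_url']}",
    ]
    if status.get("models"):
        lines.append(f"**Models:** {', '.join(status['models'])}")
    if status.get("error"):
        lines.append(f"**Error:** {status['error']}")
    return "\n".join(lines)

# Sample input with a hypothetical model name and base URL
sample = {
    "status": "healthy",
    "base_url": "http://localhost:8000",
    "models": ["meta-llama/Llama-3-8B"],
}
print(format_status(sample))
```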
  • Core logic that checks the vLLM server health and retrieves available models:

    ```python
    async def get_server_status() -> dict[str, Any]:
        """
        Get the current status of the vLLM server.

        Returns:
            Dictionary with server status information:
            - status: "healthy", "unhealthy", "offline", or "unknown"
            - base_url: The configured vLLM base URL
            - models: List of available models (if healthy)
            - error: Error message (if any)
            - models_error: Error message from listing models (if any)
        """
        result: dict[str, Any] = {
            "status": "unknown",
            "base_url": "",
            "models": [],
            "error": None,
        }

        try:
            async with VLLMClient() as client:
                result["base_url"] = client.settings.base_url

                # Check health
                health = await client.health_check()
                if health.get("status") == "healthy":
                    result["status"] = "healthy"

                    # Get available models
                    try:
                        models = await client.list_models()
                        result["models"] = [m.get("id", "unknown") for m in models]
                    except VLLMClientError as e:
                        result["models_error"] = str(e)
                else:
                    result["status"] = "unhealthy"
                    result["error"] = f"Server returned status code: {health.get('code')}"

        except VLLMClientError as e:
            result["status"] = "offline"
            result["error"] = str(e)

        return result
    ```
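The flow above can be exercised without a live server by substituting a stub client. Everything in this sketch (`FakeClient`, its canned responses, the model name) is hypothetical and exists only to show the shape of the status dict that the healthy path produces:

```python
import asyncio
from typing import Any


class FakeClient:
    """Hypothetical stand-in for VLLMClient, with canned healthy responses."""

    class _Settings:
        base_url = "http://localhost:8000"  # illustrative URL

    settings = _Settings()

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        return False

    async def health_check(self) -> dict[str, Any]:
        return {"status": "healthy", "code": 200}

    async def list_models(self) -> list[dict[str, Any]]:
        return [{"id": "meta-llama/Llama-3-8B"}]  # illustrative model id


async def check_status(client_factory=FakeClient) -> dict[str, Any]:
    """Sketch of the status-gathering flow, parameterized over the client type."""
    result: dict[str, Any] = {"status": "unknown", "base_url": "", "models": [], "error": None}
    async with client_factory() as client:
        result["base_url"] = client.settings.base_url
        health = await client.health_check()
        if health.get("status") == "healthy":
            result["status"] = "healthy"
            result["models"] = [m.get("id", "unknown") for m in await client.list_models()]
        else:
            result["status"] = "unhealthy"
            result["error"] = f"Server returned status code: {health.get('code')}"
    return result


status = asyncio.run(check_status())
print(status["status"], status["models"])
```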
  • Tool registration defining the vllm_status tool with its schema:

    ```python
    Tool(
        name="vllm_status",
        description="Check the health and status of the vLLM server",
        inputSchema={
            "type": "object",
            "properties": {},
        },
    ),
    ```
  • Tool handler routing in the call_tool function that calls get_server_status_text():

    ```python
    elif name == "vllm_status":
        status_text = await get_server_status_text()
        return [TextContent(type="text", text=status_text)]
    ```
  • Helper method that performs the actual health check HTTP request to the vLLM server:

    ```python
    async def health_check(self) -> dict[str, Any]:
        """Check if the vLLM server is healthy."""
        session = await self._get_session()
        try:
            async with session.get(
                f"{self.settings.base_url}/health",
                headers=self.headers,
            ) as response:
                if response.status == 200:
                    return {"status": "healthy", "code": 200}
                return {"status": "unhealthy", "code": response.status}
        except aiohttp.ClientConnectorError as e:
            raise VLLMConnectionError(f"Cannot connect to vLLM server: {e}") from e
        except asyncio.TimeoutError as e:
            raise VLLMConnectionError("Connection to vLLM server timed out") from e
    ```
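The health verdict depends only on the HTTP status code returned by `GET /health`: 200 means healthy, anything else is unhealthy, and connection failures become exceptions. The code-to-verdict mapping can be sketched as a pure function:

```python
def classify_health(http_status: int) -> dict:
    """Map an HTTP status code from GET /health to the dict shape health_check returns."""
    if http_status == 200:
        return {"status": "healthy", "code": 200}
    return {"status": "unhealthy", "code": http_status}


print(classify_health(200))
print(classify_health(503))
```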
