launch_llm_server

Launches an mlx_lm server as a background subprocess with the specified model and port, after checking that enough free memory is available.

Instructions

Launches mlx_lm.server as a background subprocess. The launch is refused if too little free memory is available.
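The refusal rule above can be sketched as a pure function. It assumes, as the implementation reference shows, that free memory is read in bytes from `psutil.virtual_memory().available` and divided by 1024 ** 3. The helper name `check_memory` is hypothetical, and it takes the byte count as a parameter so the sketch runs without psutil.

```python
from typing import Optional

# launch_server() divides by 1024 ** 3 (GiB) but labels the result "GB".
BYTES_PER_GB = 1024 ** 3


def check_memory(available_bytes: int, requirement_gb: float = 4.0) -> Optional[str]:
    """Return an error message if free memory is below the threshold, else None.

    Hypothetical helper mirroring the check in launch_server(); a caller would
    pass psutil.virtual_memory().available.
    """
    available_gb = available_bytes / BYTES_PER_GB
    # Strictly-less-than: exactly meeting the requirement is allowed.
    if available_gb < requirement_gb:
        return (
            f"Error: Insufficient memory. Only {available_gb:.2f}GB available, "
            f"but at least {requirement_gb}GB is requested to launch this model safely."
        )
    return None


# 3 GiB free against the default 4.0 GB requirement is refused; 6 GiB is accepted.
print(check_memory(3 * BYTES_PER_GB))
print(check_memory(6 * BYTES_PER_GB))
```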

Input Schema


Name	Required	Description
`model_name`	Yes	Name of the model to launch (e.g. mlx-community/Llama-3-8B-Instruct-4bit)
`port`	Yes	Port number on which to start the server
`memory_requirement_gb`	No	Approximate free memory (GB) required to launch. Defaults to 4.0 GB if omitted.
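On the wire, an MCP client invokes this tool with a JSON-RPC 2.0 `tools/call` request whose `params.arguments` object follows the table above. A minimal sketch of building that message with the standard library; `build_launch_request` is a hypothetical helper, and the request id and port 8080 are arbitrary example values.

```python
import json
from typing import Optional


def build_launch_request(request_id: int, model_name: str, port: int,
                         memory_requirement_gb: Optional[float] = None) -> str:
    """Serialize an MCP tools/call request for launch_llm_server (hypothetical helper)."""
    arguments = {"model_name": model_name, "port": port}
    # Optional field: leave it out to let the server fall back to 4.0 GB.
    if memory_requirement_gb is not None:
        arguments["memory_requirement_gb"] = memory_requirement_gb
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "launch_llm_server", "arguments": arguments},
    }
    return json.dumps(message)


print(build_launch_request(1, "mlx-community/Llama-3-8B-Instruct-4bit", 8080))
```

A client built on the official MCP Python SDK would normally not assemble this by hand; `ClientSession.call_tool("launch_llm_server", {...})` sends the equivalent request.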

Implementation Reference

src/mcp_mlx_launcher/server.py:79-97 (registration)

Tool registration for 'launch_llm_server' in handle_list_tools(). Defines the tool name, description, and inputSchema: model_name (string, required), port (integer, required), and memory_requirement_gb (number, optional). The schema has no `default` key; the 4.0 GB default appears only in the description text and is applied by the handler.

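import mcp.types as types  # MCP Python SDK types module

types.Tool(
    name="launch_llm_server",
    # "Launches mlx_lm.server as a background subprocess. The launch is refused if free memory is low."
    description="mlx_lm.server をサブプロセスとしてバックグラウンドで起動します。空きメモリが少ない場合は起動が拒否されます。",
    inputSchema={
        "type": "object",
        "properties": {
            "model_name": {
                "type": "string",
                # "Name of the model to launch (e.g. mlx-community/Llama-3-8B-Instruct-4bit)"
                "description": "起動するモデル名 (例: mlx-community/Llama-3-8B-Instruct-4bit)",
            },
            # "Port number on which to start the server"
            "port": {"type": "integer", "description": "サーバーを起動するポート番号"},
            "memory_requirement_gb": {
                "type": "number",
                # "Approximate free memory (GB) required to launch. Defaults to 4.0 GB if omitted."
                # The default is only stated here; the handler applies it.
                "description": "起動に必要な空きメモリの目安(GB)。未指定時はデフォルトで 4.0GB。",
            },
        },
        "required": ["model_name", "port"],
    },
),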
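Because the inputSchema above declares `required` and per-field `type` but no `default`, a caller-side check only has to enforce those two keywords and fill in 4.0 itself. This is a hand-rolled sketch covering just the keywords this schema uses, not a general JSON Schema validator (a package such as `jsonschema` is the usual choice); `validate_arguments` and `INPUT_SCHEMA` are hypothetical names. Like the handler, it rejects a float port, which is stricter than JSON Schema's `integer` type (that accepts `8080.0`).

```python
from typing import Any, Dict

# Same shape as the registered inputSchema, with descriptions omitted.
INPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "model_name": {"type": "string"},
        "port": {"type": "integer"},
        "memory_requirement_gb": {"type": "number"},
    },
    "required": ["model_name", "port"],
}

# JSON Schema type name -> accepted Python types.
_PY_TYPES = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
}


def validate_arguments(arguments: Dict[str, Any],
                       schema: Dict[str, Any] = INPUT_SCHEMA) -> Dict[str, Any]:
    """Check required keys and primitive types, then apply the 4.0 GB default."""
    missing = [key for key in schema["required"] if key not in arguments]
    if missing:
        raise ValueError(f"missing required argument(s): {', '.join(missing)}")
    for key, value in arguments.items():
        spec = schema["properties"].get(key)
        if spec is None:
            continue  # additionalProperties is not set, so extra keys are allowed
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, but JSON true/false are not numbers.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"{key} must be a JSON {spec['type']}")
    # The schema has no "default" keyword, so the default is applied in code.
    return {"memory_requirement_gb": 4.0, **arguments}


print(validate_arguments({"model_name": "mlx-community/Llama-3-8B-Instruct-4bit", "port": 8080}))
```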

src/mcp_mlx_launcher/server.py:200-217 (handler)

Handler for 'launch_llm_server' in handle_call_tool(). Extracts model_name, port, and memory_requirement_gb (defaulting to 4.0), validates their types, and calls process_manager.launch_server() with timeout=10 in a worker thread via asyncio.to_thread().

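elif name == "launch_llm_server":
    model_name = arguments.get("model_name")
    port = arguments.get("port")
    # Optional; falls back to the 4.0 GB default stated in the schema description.
    memory_requirement_gb = arguments.get("memory_requirement_gb", 4.0)

    # bool is a subclass of int in Python, so reject it explicitly for numeric fields.
    if not isinstance(model_name, str) or not isinstance(port, int) or isinstance(port, bool):
        raise ValueError("Invalid arguments for launch_llm_server")
    if not isinstance(memory_requirement_gb, (int, float)) or isinstance(memory_requirement_gb, bool):
        raise ValueError("memory_requirement_gb must be a number")

    # launch_server() blocks while it polls the port, so run it in a worker
    # thread to keep the event loop responsive.
    result_msg = await asyncio.to_thread(
        process_manager.launch_server,
        model_name,
        port,
        timeout=10,
        memory_requirement_gb=float(memory_requirement_gb),
    )
    return [types.TextContent(type="text", text=result_msg)]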
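launch_server() is synchronous and can block for up to `timeout` seconds while it polls, which is why the handler offloads it with `asyncio.to_thread()`. A minimal sketch of that pattern; `blocking_launch` is a hypothetical stand-in that sleeps instead of spawning a process, and `heartbeat` is a hypothetical coroutine that keeps running on the event loop while the blocking call sits in a worker thread. `to_thread` forwards keyword arguments as well as positional ones.

```python
import asyncio
import time


def blocking_launch(model_name: str, port: int, timeout: int = 10,
                    memory_requirement_gb: float = 4.0) -> str:
    """Hypothetical stand-in for process_manager.launch_server()."""
    time.sleep(0.2)  # simulates the blocking port-polling loop
    return f"launched {model_name} on port {port}"


async def heartbeat(ticks: list) -> None:
    # Runs on the event loop concurrently with the worker thread.
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def main() -> tuple:
    ticks: list = []
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_launch, "demo-model", 8080,
                          timeout=10, memory_requirement_gb=4.0),
        heartbeat(ticks),
    )
    return result, ticks


result, ticks = asyncio.run(main())
print(result)
```

Calling `blocking_launch` directly inside a coroutine would instead stall every other request the MCP server is handling until it returned.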

src/mcp_mlx_launcher/process_manager.py:110-170 (helper)

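Core implementation of launch_server() in MlxProcessManager. It refuses to start if the port is already in use or if free memory is below memory_requirement_gb, spawns mlx_lm.server as a detached subprocess, then polls for up to `timeout` seconds until the port starts listening. If the process exits during that window, it returns an error; otherwise it records the PID in the saved state (even when the port has not opened yet) and returns a success message, with a note when the model may still be downloading or loading.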

import subprocess
import sys
import time

import psutil  # third-party; provides virtual_memory()


def launch_server(self, model_name: str, port: int, timeout: int = 10, memory_requirement_gb: float = 4.0) -> str:
    """Start mlx_lm.server and confirm that the process is alive and its port is open."""

    # Refuse early if the requested port is already taken.
    if self.is_port_in_use(port):
        return f"Error: Port {port} is already in use."

    # Memory gate: mem.available is in bytes; dividing by 1024 ** 3 gives GiB.
    mem = psutil.virtual_memory()
    available_gb = mem.available / (1024 ** 3)
    if available_gb < memory_requirement_gb:
        return f"Error: Insufficient memory. Only {available_gb:.2f}GB available, but at least {memory_requirement_gb}GB is requested to launch this model safely."

    # Checked before launching so a timeout can be reported as
    # "still downloading" versus "still loading".
    is_cached = self.is_model_cached(model_name)

    # Run mlx_lm.server with the same interpreter as this MCP server.
    cmd = [
        sys.executable,
        "-m",
        "mlx_lm.server",
        "--model",
        model_name,
        "--port",
        str(port),
    ]

    try:
        # start_new_session=True detaches the child from this process group;
        # its output is discarded.
        process = subprocess.Popen(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            start_new_session=True,
        )

        # Poll until the port opens, the process dies, or the timeout expires.
        # time.monotonic() is not affected by system clock changes.
        deadline = time.monotonic() + timeout
        is_verified = False

        while time.monotonic() < deadline:
            poll_result = process.poll()
            if poll_result is not None:
                return f"Error: Process exited with code {poll_result} during startup. Check that the model name is correct and that there is enough unified memory."

            if self.is_port_in_use(port):
                is_verified = True
                break

            time.sleep(0.5)

        # A timeout is not treated as failure: the process is still running.
        if not is_verified:
            if not is_cached:
                msg_suffix = " (Note: the model is being downloaded from Hugging Face in the background; it may take a while before the port becomes active)"
            else:
                msg_suffix = " (Warning: port not yet listening; the model might still be loading into memory)"
        else:
            msg_suffix = ""

        # Record the PID keyed by port so the process can be looked up later.
        state = self._load_state()
        state[str(port)] = {"pid": process.pid, "model": model_name}
        self._save_state(state)

        return f"Successfully launched '{model_name}' on port {port} (PID: {process.pid}){msg_suffix}."

    except Exception as e:
        return f"Error launching process: {e}"
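launch_server() relies on `self.is_port_in_use()`, whose body is not shown here. A minimal sketch of the same wait-for-port pattern, assuming a connect-based check against 127.0.0.1 (the real helper might instead try to bind the port); `is_port_in_use` and `wait_for_port` below are hypothetical stand-ins. The demo listener only calls `listen()` after a delay, roughly like a model server that opens its port once loading finishes.

```python
import socket
import threading
import time


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Hypothetical check: True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def wait_for_port(port: int, timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll until the port is listening or the monotonic deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_port_in_use(port):
            return True
        time.sleep(interval)
    return False


def _delayed_listener(sock: socket.socket, delay: float) -> None:
    # Bound but not listening until the delay elapses, so connects are refused at first.
    time.sleep(delay)
    sock.listen()


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
port = server.getsockname()[1]
threading.Thread(target=_delayed_listener, args=(server, 0.3), daemon=True).start()
print(wait_for_port(port, timeout=5.0, interval=0.1))
server.close()
```

A connect-based check answers "is a server accepting connections?", which is what the polling loop needs; a bind-based check answers "could I claim this port?", which matches the initial refusal more closely.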
