launch_llm_server
Launches an mlx_lm server as a background subprocess with the specified model and port, after checking that the port is free and that enough free memory is available.
Instructions
Starts mlx_lm.server as a background subprocess. The launch is refused if too little free memory is available.
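The memory gate reduces to a single comparison. The sketch below is a hypothetical standalone version: check_memory and BYTES_PER_GB are illustrative names, not part of the package. It takes the byte count that the implementation reads from psutil.virtual_memory().available, so it runs without psutil. The implementation divides by 1024**3, so the "GB" threshold is effectively GiB.

```python
from typing import Optional

# Hypothetical helper mirroring the free-memory gate in launch_server().
BYTES_PER_GB = 1024 ** 3  # the implementation divides by 1024**3, so "GB" here means GiB


def check_memory(available_bytes: int, memory_requirement_gb: float = 4.0) -> Optional[str]:
    """Return an error message if available memory is below the threshold, else None."""
    available_gb = available_bytes / BYTES_PER_GB
    if available_gb < memory_requirement_gb:
        return (
            f"Error: Insufficient memory. Only {available_gb:.2f}GB available, "
            f"but at least {memory_requirement_gb}GB is requested."
        )
    return None


# In the real tool the input would be psutil.virtual_memory().available.
print(check_memory(8 * BYTES_PER_GB))       # None: 8 GiB is at least 4 GiB
print(check_memory(3 * BYTES_PER_GB, 4.0))  # prints the insufficient-memory error
```

Because the comparison is strict, a machine with exactly the requested amount available passes the check.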
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
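| model_name | Yes | Name of the model to launch (e.g. mlx-community/Llama-3-8B-Instruct-4bit) | |
| port | Yes | Port number on which to start the server | |
| memory_requirement_gb | No | Approximate amount of free memory (GB) required to launch. Defaults to 4.0 GB if omitted. | 4.0 |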
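The handler validates these inputs before launching anything. As a minimal sketch, the hypothetical parse_launch_args() below reproduces those checks in isolation so they can be tried without an MCP client. The function name and its standalone form are assumptions; the type checks, error messages, and the 4.0 default are taken from the handler.

```python
from typing import Any, Mapping, Tuple

DEFAULT_MEMORY_REQUIREMENT_GB = 4.0  # default the handler applies when the key is absent


def parse_launch_args(arguments: Mapping[str, Any]) -> Tuple[str, int, float]:
    """Hypothetical helper reproducing the handler's checks on the three inputs."""
    model_name = arguments.get("model_name")
    port = arguments.get("port")
    memory_requirement_gb = arguments.get("memory_requirement_gb", DEFAULT_MEMORY_REQUIREMENT_GB)
    if not isinstance(model_name, str) or not isinstance(port, int):
        raise ValueError("Invalid arguments for launch_llm_server")
    if not isinstance(memory_requirement_gb, (int, float)):
        raise ValueError("memory_requirement_gb must be a number")
    # Integers such as 6 are accepted and normalized to float, as in the handler.
    return model_name, port, float(memory_requirement_gb)


print(parse_launch_args({"model_name": "mlx-community/Llama-3-8B-Instruct-4bit", "port": 8080}))
# ('mlx-community/Llama-3-8B-Instruct-4bit', 8080, 4.0)
```

Because bool is a subclass of int in Python, these isinstance checks also accept true/false for port and memory_requirement_gb. The handler does not check the port range either.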
Implementation Reference
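- src/mcp_mlx_launcher/server.py:79-97 (registration): Tool registration for 'launch_llm_server' in handle_list_tools(). Defines the tool name, description, and inputSchema with model_name (string, required), port (integer, required), and memory_requirement_gb (number, optional). The schema itself declares no default for memory_requirement_gb. The 4.0 GB fallback is applied by the handler and is mentioned only in the description text.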
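```python
types.Tool(
    name="launch_llm_server",
    # "Starts mlx_lm.server as a background subprocess. The launch is refused if free memory is low."
    description="mlx_lm.server をサブプロセスとしてバックグラウンドで起動します。空きメモリが少ない場合は起動が拒否されます。",
    inputSchema={
        "type": "object",
        "properties": {
            "model_name": {
                "type": "string",
                # "Name of the model to launch (e.g. mlx-community/Llama-3-8B-Instruct-4bit)"
                "description": "起動するモデル名 (例: mlx-community/Llama-3-8B-Instruct-4bit)",
            },
            # "Port number on which to start the server"
            "port": {"type": "integer", "description": "サーバーを起動するポート番号"},
            "memory_requirement_gb": {
                "type": "number",
                # "Approximate free memory (GB) required to launch. Defaults to 4.0 GB if omitted."
                "description": "起動に必要な空きメモリの目安(GB)。未指定時はデフォルトで 4.0GB。"
            }
        },
        "required": ["model_name", "port"],
    },
),
```

- src/mcp_mlx_launcher/server.py:200-217 (handler): Handler for 'launch_llm_server' in handle_call_tool(). Extracts the arguments (model_name, port, memory_requirement_gb, defaulting the last to 4.0), validates their types, and calls process_manager.launch_server() with timeout=10 in a worker thread via asyncio.to_thread().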
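```python
elif name == "launch_llm_server":
    model_name = arguments.get("model_name")
    port = arguments.get("port")
    # The default is applied here; the inputSchema itself declares none.
    memory_requirement_gb = arguments.get("memory_requirement_gb", 4.0)
    if not isinstance(model_name, str) or not isinstance(port, int):
        raise ValueError("Invalid arguments for launch_llm_server")
    if not isinstance(memory_requirement_gb, (int, float)):
        raise ValueError("memory_requirement_gb must be a number")
    # launch_server() blocks while it polls the port (timeout=10 s),
    # so it runs in a worker thread instead of on the event loop.
    result_msg = await asyncio.to_thread(
        process_manager.launch_server, model_name, port, 10, float(memory_requirement_gb)
    )
    return [types.TextContent(type="text", text=result_msg)]
```

- Core implementation of launch_server() in MlxProcessManager. Checks that the port is free, verifies that enough memory is available, spawns mlx_lm.server as a detached subprocess, then polls until the port is listening, the process exits, or the timeout expires, and finally records the PID in the saved state. Returns a success or error message. A server that is still downloading or loading the model when the timeout expires is reported as launched, with a note appended.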
```python
# Method of MlxProcessManager; the module must import psutil, subprocess, sys and time.
def launch_server(self, model_name: str, port: int, timeout: int = 10,
                  memory_requirement_gb: float = 4.0) -> str:
    """Launch mlx_lm.server and confirm that the process is alive and the port is open."""
    # Refuse to start if something is already listening on the port.
    if self.is_port_in_use(port):
        return f"Error: Port {port} is already in use."

    # Refuse to start if available memory is below the requested threshold.
    mem = psutil.virtual_memory()
    available_gb = mem.available / (1024 ** 3)
    if available_gb < memory_requirement_gb:
        return f"Error: Insufficient memory. Only {available_gb:.2f}GB available, but at least {memory_requirement_gb}GB is requested to launch this model safely."

    # Only used to pick the note shown if the port is not up before the timeout.
    is_cached = self.is_model_cached(model_name)
    cmd = [
        sys.executable, "-m", "mlx_lm.server",
        "--model", model_name,
        "--port", str(port),
    ]
    try:
        # Discard the server's output and start it in a new session so it is
        # detached from the launcher's process group.
        process = subprocess.Popen(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            start_new_session=True,
        )

        # Poll every 0.5 s until the port is listening, the process dies, or the timeout expires.
        start_time = time.time()
        is_verified = False
        while time.time() - start_time < timeout:
            poll_result = process.poll()
            if poll_result is not None:
                return f"Error: Process exited immediately with code {poll_result}. Check if the model name is correct or if you have enough unified memory."
            if self.is_port_in_use(port):
                is_verified = True
                break
            time.sleep(0.5)

        # A slow start is not treated as a failure; a note is appended instead.
        if not is_verified:
            if not is_cached:
                msg_suffix = " (Note: Model is currently being downloaded from Hugging Face in the background. It may take a while before the port becomes active.)"
            else:
                msg_suffix = " (Warning: Port not yet listening, model might still be loading into memory)"
        else:
            msg_suffix = ""

        # Persist the PID and model name, keyed by port.
        state = self._load_state()
        state[str(port)] = {"pid": process.pid, "model": model_name}
        self._save_state(state)
        return f"Successfully launched '{model_name}' on port {port} (PID: {process.pid}){msg_suffix}."
    except Exception as e:
        return f"Error launching process: {str(e)}"
```
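is_port_in_use() is called above but not shown. A plausible stdlib version, together with the readiness loop and the asyncio.to_thread() offloading used by the handler, might look like the sketch below. It is an assumption, not the package's code. The TCP connect probe to 127.0.0.1, the function names, and the use of time.monotonic() instead of time.time() (so wall-clock adjustments cannot stretch or cut the wait) are illustrative choices. The process.poll() early-exit check is omitted to keep the sketch self-contained.

```python
import asyncio
import socket
import time


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Hypothetical probe: True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0


def wait_for_port(port: int, timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Blocking poll until the port accepts connections or the timeout expires."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if is_port_in_use(port):
            return True
        time.sleep(interval)
    return False


async def main() -> None:
    # A throwaway listener on an OS-assigned port stands in for mlx_lm.server.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))
        server.listen()
        port = server.getsockname()[1]
        # Run the blocking poll in a worker thread so the event loop stays responsive.
        ready = await asyncio.to_thread(wait_for_port, port, 2.0)
        print(f"port {port} ready: {ready}")


asyncio.run(main())
```

Offloading matters because a blocking loop of up to timeout seconds inside an async tool handler would otherwise stall every other request the MCP server is handling.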