drag
Move windows or UI elements by simulating mouse drag gestures between specified screen coordinates with controlled timing for macOS automation.
Instructions
Drag from one screen coordinate to another over a specified duration.
Best practices for window dragging:
Always call focus_window on the target app immediately BEFORE dragging to ensure the window is frontmost. Without this, the drag may land on a different overlapping window and silently fail.
Start coordinates must land on the window's title bar — use the far-right edge of the title bar to avoid traffic-light buttons and any center toolbar icons.
Use a duration of 600–1000 ms. Too short and macOS may not recognize it as a drag gesture.
Do not narrate visual observations or coordinate calculations. Brief task progress updates are acceptable.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| start_x | Yes | Start X coordinate in screen pixels (may be negative for secondary displays) | |
| start_y | Yes | Start Y coordinate in screen pixels (may be negative for secondary displays) | |
| end_x | Yes | End X coordinate in screen pixels (may be negative for secondary displays) | |
| end_y | Yes | End Y coordinate in screen pixels (may be negative for secondary displays) | |
| duration_ms | Yes | Drag duration in milliseconds (default: 600) |
Implementation Reference
- src/tools/mouse.ts:296-323 (handler)The handleDrag function performs input validation using DragInputSchema, executes the drag operation using runInputHelper, and formats the response.
async function handleDrag( args: Record<string, unknown>, ): Promise<CallToolResult> { const parsed = DragInputSchema.parse(args); const result = await runInputHelper("drag", { sx: parsed.start_x, sy: parsed.start_y, ex: parsed.end_x, ey: parsed.end_y, duration: parsed.duration_ms, }); const response: Record<string, unknown> = { dragged: { from: { x: parsed.start_x, y: parsed.start_y }, to: { x: parsed.end_x, y: parsed.end_y }, }, duration_ms: parsed.duration_ms, }; if (typeof result.warning === "string") { response.warning = result.warning; } return { content: [{ type: "text" as const, text: JSON.stringify(response) }], }; } - src/tools/mouse.ts:106-140 (schema)The Zod schema defining the input requirements for the drag tool.
const DragInputSchema = z.object({ start_x: z .number() .int() .describe( "Start X coordinate in screen pixels (may be negative for secondary displays)", ), start_y: z .number() .int() .describe( "Start Y coordinate in screen pixels (may be negative for secondary displays)", ), end_x: z .number() .int() .describe( "End X coordinate in screen pixels (may be negative for secondary displays)", ), end_y: z .number() .int() .describe( "End Y coordinate in screen pixels (may be negative for secondary displays)", ), duration_ms: z .number() .int() .positive() .max(DRAG_MAX_DURATION_MS) .default(DRAG_DEFAULT_DURATION_MS) .describe( `Drag duration in milliseconds (default: ${DRAG_DEFAULT_DURATION_MS})`, ), }); - src/tools/mouse.ts:173-189 (registration)The tool definition registered in mouseToolDefinitions.
name: "drag", description: [ "Drag from one screen coordinate to another over a specified duration.", "", "Best practices for window dragging:", "- Always call focus_window on the target app immediately BEFORE dragging to ensure the window is frontmost. Without this, the drag may land on a different overlapping window and silently fail.", "- Start coordinates must land on the window's title bar — use the far-right edge of the title bar to avoid traffic-light buttons and any center toolbar icons.", "- Use a duration of 600–1000 ms. Too short and macOS may not recognize it as a drag gesture.", "", SILENT_HINT, ].join("\n"), inputSchema: zodToToolInputSchema(DragInputSchema), annotations: { readOnlyHint: false, destructiveHint: true, }, },