Glama

drag

Destructive

Move windows or UI elements by simulating mouse drag gestures between specified screen coordinates with controlled timing for macOS automation.

Instructions

Drag from one screen coordinate to another over a specified duration.

Best practices for window dragging:

  • Always call focus_window on the target app immediately BEFORE dragging to ensure the window is frontmost. Without this, the drag may land on a different overlapping window and silently fail.

  • Start coordinates must land on the window's title bar — use the far-right edge of the title bar to avoid traffic-light buttons and any center toolbar icons.

  • Use a duration of 600–1000 ms. Too short and macOS may not recognize it as a drag gesture.

Do not narrate visual observations or coordinate calculations. Brief task progress updates are acceptable.
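The practices above could be turned into an ordered plan of tool calls: focus first, then drag from the far-right edge of the title bar. The sketch below is illustrative only. `planWindowDrag`, `WindowBounds`, `ToolCall`, and both pixel offsets are hypothetical and not part of the server. The arguments for focus_window are passed through untouched because its input schema is not shown on this page.

```typescript
// Hypothetical helper types; none of these names come from the server itself.
interface WindowBounds { x: number; y: number; width: number; height: number }
interface ToolCall { name: string; args: Record<string, unknown> }

// Assumed offsets in screen pixels. Tune them for the real window chrome.
const TITLE_BAR_Y_OFFSET = 14; // assumed vertical middle of a standard title bar
const RIGHT_EDGE_INSET = 20;   // assumed distance from the window's right edge

function planWindowDrag(
  bounds: WindowBounds,                 // current window frame
  moveTo: { x: number; y: number },     // desired top-left corner of the window
  focusArgs: Record<string, unknown>,   // whatever focus_window expects (not shown here)
  durationMs = 600,                     // lower end of the recommended 600-1000 ms range
): ToolCall[] {
  // Grab the title bar near its right edge, away from the traffic-light buttons.
  const startX = Math.round(bounds.x + bounds.width - RIGHT_EDGE_INSET);
  const startY = Math.round(bounds.y + TITLE_BAR_Y_OFFSET);
  // The grab point moves by the same delta as the window's top-left corner.
  const endX = startX + Math.round(moveTo.x - bounds.x);
  const endY = startY + Math.round(moveTo.y - bounds.y);
  return [
    { name: "focus_window", args: focusArgs }, // must run immediately before the drag
    {
      name: "drag",
      args: { start_x: startX, start_y: startY, end_x: endX, end_y: endY, duration_ms: durationMs },
    },
  ];
}
```

A client would execute the returned calls in order with no other tool call in between, so that the window is still frontmost when the drag starts.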

Input Schema

| Name | Required | Description | Default |
| start_x | Yes | Start X coordinate in screen pixels (may be negative for secondary displays) | |
| start_y | Yes | Start Y coordinate in screen pixels (may be negative for secondary displays) | |
| end_x | Yes | End X coordinate in screen pixels (may be negative for secondary displays) | |
| end_y | Yes | End Y coordinate in screen pixels (may be negative for secondary displays) | |
| duration_ms | Yes | Drag duration in milliseconds (default: 600) | |
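To make these constraints concrete, here is a dependency-free sketch of a client-side check that mirrors the rules in the Zod schema shown under Implementation Reference: integer coordinates that may be negative, and a positive integer duration with a default and an upper bound. `checkDragArgs` and both constants are hypothetical names. The maximum is a placeholder because the value of DRAG_MAX_DURATION_MS does not appear on this page. The table lists duration_ms as required, but the Zod schema gives it a default, so this sketch treats it as optional.

```typescript
interface DragArgs {
  start_x: number;
  start_y: number;
  end_x: number;
  end_y: number;
  duration_ms: number;
}

const ASSUMED_DEFAULT_DURATION_MS = 600;  // taken from "(default: 600)" in the description
const ASSUMED_MAX_DURATION_MS = 10_000;   // placeholder; the real DRAG_MAX_DURATION_MS is not shown

function checkDragArgs(input: Record<string, unknown>): DragArgs {
  // Coordinates must be integers; negative values are allowed (secondary displays).
  const coord = (key: string): number => {
    const value = input[key];
    if (typeof value !== "number" || !Number.isInteger(value)) {
      throw new Error(`${key} must be an integer`);
    }
    return value;
  };

  // Fall back to the default only when the field is absent.
  const rawDuration =
    input["duration_ms"] === undefined ? ASSUMED_DEFAULT_DURATION_MS : input["duration_ms"];
  if (
    typeof rawDuration !== "number" ||
    !Number.isInteger(rawDuration) ||
    rawDuration <= 0 ||
    rawDuration > ASSUMED_MAX_DURATION_MS
  ) {
    throw new Error("duration_ms must be a positive integer within the allowed maximum");
  }

  return {
    start_x: coord("start_x"),
    start_y: coord("start_y"),
    end_x: coord("end_x"),
    end_y: coord("end_y"),
    duration_ms: rawDuration,
  };
}
```

The server performs its own validation, so a check like this is only useful for failing early on the client side.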

Implementation Reference

  • The handleDrag function performs input validation using DragInputSchema, executes the drag operation using runInputHelper, and formats the response.
    async function handleDrag(
      args: Record<string, unknown>,
    ): Promise<CallToolResult> {
      // Validate and normalize the raw arguments; parse() throws on invalid input.
      const parsed = DragInputSchema.parse(args);
    
      // Delegate the actual drag gesture to the input helper.
      const result = await runInputHelper("drag", {
        sx: parsed.start_x,
        sy: parsed.start_y,
        ex: parsed.end_x,
        ey: parsed.end_y,
        duration: parsed.duration_ms,
      });
    
      // Echo back the coordinates and duration that were used.
      const response: Record<string, unknown> = {
        dragged: {
          from: { x: parsed.start_x, y: parsed.start_y },
          to: { x: parsed.end_x, y: parsed.end_y },
        },
        duration_ms: parsed.duration_ms,
      };
      // Forward the helper's warning, if it reported one.
      if (typeof result.warning === "string") {
        response.warning = result.warning;
      }
    
      // The response is returned as a single JSON-encoded text item.
      return {
        content: [{ type: "text" as const, text: JSON.stringify(response) }],
      };
    }
  • The Zod schema defining the input requirements for the drag tool.
    const DragInputSchema = z.object({
      start_x: z
        .number()
        .int()
        .describe(
          "Start X coordinate in screen pixels (may be negative for secondary displays)",
        ),
      start_y: z
        .number()
        .int()
        .describe(
          "Start Y coordinate in screen pixels (may be negative for secondary displays)",
        ),
      end_x: z
        .number()
        .int()
        .describe(
          "End X coordinate in screen pixels (may be negative for secondary displays)",
        ),
      end_y: z
        .number()
        .int()
        .describe(
          "End Y coordinate in screen pixels (may be negative for secondary displays)",
        ),
      duration_ms: z
        .number()
        .int()
        .positive()
        .max(DRAG_MAX_DURATION_MS)
        .default(DRAG_DEFAULT_DURATION_MS)
        .describe(
          `Drag duration in milliseconds (default: ${DRAG_DEFAULT_DURATION_MS})`,
        ),
    });
  • The tool definition registered in mouseToolDefinitions.
    {
      name: "drag",
      description: [
        "Drag from one screen coordinate to another over a specified duration.",
        "",
        "Best practices for window dragging:",
        "- Always call focus_window on the target app immediately BEFORE dragging to ensure the window is frontmost. Without this, the drag may land on a different overlapping window and silently fail.",
        "- Start coordinates must land on the window's title bar — use the far-right edge of the title bar to avoid traffic-light buttons and any center toolbar icons.",
        "- Use a duration of 600–1000 ms. Too short and macOS may not recognize it as a drag gesture.",
        "",
        SILENT_HINT,
      ].join("\n"),
      inputSchema: zodToToolInputSchema(DragInputSchema),
      annotations: {
        readOnlyHint: false,
        destructiveHint: true,
      },
    },
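Since handleDrag returns its result as one JSON-encoded text item, a caller has to decode it before reading the coordinates or the optional warning. The following is a minimal sketch of that step, assuming the response shape built in handleDrag above. `DragResponse` and `parseDragResult` are hypothetical names, and the parameter type is a simplified stand-in for CallToolResult.

```typescript
// Shape of the JSON payload assembled in handleDrag (warning is only set when the helper reports one).
interface DragResponse {
  dragged: {
    from: { x: number; y: number };
    to: { x: number; y: number };
  };
  duration_ms: number;
  warning?: string;
}

// Simplified stand-in for the tool result type; only the fields used here are modeled.
interface TextToolResult {
  content: Array<{ type: string; text?: string }>;
}

function parseDragResult(result: TextToolResult): DragResponse {
  // handleDrag emits exactly one text item, so take the first one found.
  const item = result.content.find((c) => c.type === "text");
  if (!item || typeof item.text !== "string") {
    throw new Error("drag result contains no text content");
  }
  // The cast trusts the server; add runtime checks if the payload may vary.
  return JSON.parse(item.text) as DragResponse;
}
```

A caller might log `warning` when it is present, because a drag that completes without an error is not guaranteed to have moved the intended window.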
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations, such as the risk of silent failure if focus_window isn't called, macOS-specific gesture recognition requirements, and practical constraints like avoiding UI elements. While annotations indicate it's destructive and not read-only, the description elaborates on these traits with real-world implications, though it doesn't cover all possible edge cases like error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose, followed by best practices and usage notes. Each sentence adds value, but it could be slightly more concise by integrating some of the detailed tips into bullet points or reducing redundancy in the coordinate guidance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (destructive, macOS-specific behavior) and lack of output schema, the description is largely complete, covering purpose, usage, risks, and best practices. However, it doesn't explicitly describe the return value or error conditions, which could leave gaps for an AI agent in handling failures or interpreting results.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the input schema already documents all parameters thoroughly. The description adds minimal parameter semantics beyond the schema, such as recommending duration ranges (600–1000 ms) and coordinate placement tips, but doesn't fundamentally enhance understanding of the parameters' roles or interactions beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('drag from one screen coordinate to another') and resource ('screen coordinate'), distinguishing it from sibling tools like move_mouse or click. It precisely defines the action as a drag gesture over a specified duration, making its function unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when and how to use this tool, including prerequisites ('Always call focus_window on the target app immediately BEFORE dragging'), best practices ('Start coordinates must land on the window's title bar'), and output constraints ('Do not narrate visual observations or coordinate calculations'). It clearly distinguishes usage from other tools by specifying the drag context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/antbotlab/mac-use-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.