Skip to main content
Glama

touch

Simulate precise touch events at specific coordinates in Xcode simulators. Use 'describe_ui' to identify exact positions, ensuring accurate interactions for testing.

Instructions

Perform touch down/up events at specific coordinates. Use describe_ui for precise coordinates (don't guess from screenshots).

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
delayNo
downNo
simulatorUuidYes
upNo
xYes
yYes

Implementation Reference

  • The main handler function `touchLogic` that parses parameters, constructs the axe 'touch' command, executes it via `executeAxeCommand`, and handles responses and errors.
    export async function touchLogic(
      params: TouchParams,
      executor: CommandExecutor,
      axeHelpers?: AxeHelpers,
    ): Promise<ToolResponse> {
      const toolName = 'touch';
    
      // Params are already validated by createTypedTool - use directly
      const { simulatorId, x, y, down, up, delay } = params;
    
      // Validate that at least one of down or up is specified
      if (!down && !up) {
        return createErrorResponse('At least one of "down" or "up" must be true');
      }
    
      const commandArgs = ['touch', '-x', String(x), '-y', String(y)];
      if (down) {
        commandArgs.push('--down');
      }
      if (up) {
        commandArgs.push('--up');
      }
      if (delay !== undefined) {
        commandArgs.push('--delay', String(delay));
      }
    
      const actionText = down && up ? 'touch down+up' : down ? 'touch down' : 'touch up';
      log(
        'info',
        `${LOG_PREFIX}/${toolName}: Starting ${actionText} at (${x}, ${y}) on ${simulatorId}`,
      );
    
      try {
        await executeAxeCommand(commandArgs, simulatorId, 'touch', executor, axeHelpers);
        log('info', `${LOG_PREFIX}/${toolName}: Success for ${simulatorId}`);
    
        const warning = getCoordinateWarning(simulatorId);
        const message = `Touch event (${actionText}) at (${x}, ${y}) executed successfully.`;
    
        if (warning) {
          return createTextResponse(`${message}\n\n${warning}`);
        }
    
        return createTextResponse(message);
      } catch (error) {
        log(
          'error',
          `${LOG_PREFIX}/${toolName}: Failed - ${error instanceof Error ? error.message : String(error)}`,
        );
        if (error instanceof DependencyError) {
          return createAxeNotAvailableResponse();
        } else if (error instanceof AxeError) {
          return createErrorResponse(
            `Failed to execute touch event: ${error.message}`,
            error.axeOutput,
          );
        } else if (error instanceof SystemError) {
          return createErrorResponse(
            `System error executing axe: ${error.message}`,
            error.originalError?.stack,
          );
        }
        return createErrorResponse(
          `An unexpected error occurred: ${error instanceof Error ? error.message : String(error)}`,
        );
      }
    }
  • Zod schema defining the input parameters for the touch tool, including simulatorId, coordinates, down/up flags, and optional delay.
    const touchSchema = z.object({
      simulatorId: z.string().uuid('Invalid Simulator UUID format'),
      x: z.number().int('X coordinate must be an integer'),
      y: z.number().int('Y coordinate must be an integer'),
      down: z.boolean().optional(),
      up: z.boolean().optional(),
      delay: z.number().min(0, 'Delay must be non-negative').optional(),
    });
  • Default export that registers the 'touch' tool with MCP, providing name, description, schema, and a session-aware handler wrapping `touchLogic`.
    export default {
      name: 'touch',
      description:
        "Perform touch down/up events at specific coordinates. Use describe_ui for precise coordinates (don't guess from screenshots).",
      schema: publicSchemaObject.shape, // MCP SDK compatibility
      handler: createSessionAwareTool<TouchParams>({
        internalSchema: touchSchema as unknown as z.ZodType<TouchParams>,
        logicFunction: (params: TouchParams, executor: CommandExecutor) => touchLogic(params, executor),
        getExecutor: getDefaultCommandExecutor,
        requirements: [{ allOf: ['simulatorId'], message: 'simulatorId is required' }],
      }),
    };
  • Helper for tracking describe_ui sessions and providing coordinate warnings in touch responses.
    interface DescribeUISession {
      timestamp: number;
      simulatorId: string;
    }
    
    const describeUITimestamps = new Map<string, DescribeUISession>();
    const DESCRIBE_UI_WARNING_TIMEOUT = 60000; // 60 seconds
    
    function getCoordinateWarning(simulatorId: string): string | null {
      const session = describeUITimestamps.get(simulatorId);
      if (!session) {
        return 'Warning: describe_ui has not been called yet. Consider using describe_ui for precise coordinates instead of guessing from screenshots.';
      }
    
      const timeSinceDescribe = Date.now() - session.timestamp;
      if (timeSinceDescribe > DESCRIBE_UI_WARNING_TIMEOUT) {
        const secondsAgo = Math.round(timeSinceDescribe / 1000);
        return `Warning: describe_ui was last called ${secondsAgo} seconds ago. Consider refreshing UI coordinates with describe_ui instead of using potentially stale coordinates.`;
      }
    
      return null;
    }
  • Inlined helper function to execute AXe commands, used by touchLogic to run the 'touch' axe subcommand.
    async function executeAxeCommand(
      commandArgs: string[],
      simulatorId: string,
      commandName: string,
      executor: CommandExecutor = getDefaultCommandExecutor(),
      axeHelpers?: AxeHelpers,
    ): Promise<void> {
      // Use injected helpers or default to imported functions
      const helpers = axeHelpers ?? { getAxePath, getBundledAxeEnvironment };
    
      // Get the appropriate axe binary path
      const axeBinary = helpers.getAxePath();
      if (!axeBinary) {
        throw new DependencyError('AXe binary not found');
      }
    
      // Add --udid parameter to all commands
      const fullArgs = [...commandArgs, '--udid', simulatorId];
    
      // Construct the full command array with the axe binary as the first element
      const fullCommand = [axeBinary, ...fullArgs];
    
      try {
        // Determine environment variables for bundled AXe
        const axeEnv = axeBinary !== 'axe' ? helpers.getBundledAxeEnvironment() : undefined;
    
        const result = await executor(fullCommand, `${LOG_PREFIX}: ${commandName}`, false, axeEnv);
    
        if (!result.success) {
          throw new AxeError(
            `axe command '${commandName}' failed.`,
            commandName,
            result.error ?? result.output,
            simulatorId,
          );
        }
    
        // Check for stderr output in successful commands
        if (result.error) {
          log(
            'warn',
            `${LOG_PREFIX}: Command '${commandName}' produced stderr output but exited successfully. Output: ${result.error}`,
          );
        }
    
        // Function now returns void - the calling code creates its own response
      } catch (error) {
        if (error instanceof Error) {
          if (error instanceof AxeError) {
            throw error;
          }
    
          // Otherwise wrap it in a SystemError
          throw new SystemError(`Failed to execute axe command: ${error.message}`, error);
        }
    
        // For any other type of error
        throw new SystemError(`Failed to execute axe command: ${String(error)}`);
      }
    }
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions the need for precise coordinates from 'describe_ui,' which adds useful context about dependencies, but it doesn't disclose other behavioral traits such as whether this is a read-only or destructive operation, error handling, or performance implications. For a tool with 6 parameters and no annotations, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized with two sentences that are front-loaded and efficient. The first sentence states the purpose, and the second provides usage guidance, with zero wasted words, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a UI automation tool with 6 parameters, no annotations, and no output schema, the description is incomplete. It lacks details on behavioral traits, parameter explanations, and expected outcomes, making it inadequate for an agent to fully understand how to invoke the tool correctly in this context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, meaning none of the 6 parameters have descriptions in the schema. The description mentions 'coordinates' (hinting at x and y) and 'touch down/up events' (hinting at down and up), but it doesn't explain parameters like delay, simulatorUuid, or the exact usage of boolean flags. It adds minimal meaning beyond the bare schema, failing to compensate for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs 'touch down/up events at specific coordinates,' which is a specific verb+resource combination. However, it doesn't explicitly distinguish this from sibling tools like 'tap,' 'long_press,' or 'gesture,' which likely have overlapping functionality in the UI automation context, so it doesn't fully differentiate from alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use this tool by advising to 'Use describe_ui for precise coordinates (don't guess from screenshots),' which gives practical guidance. However, it doesn't specify when to use this tool versus alternatives like 'tap' or 'gesture,' or any exclusions, so it lacks explicit alternative naming or when-not-to-use instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Related Tools

  • @getsentry/XcodeBuildMCP
  • @getsentry/XcodeBuildMCP
  • @atom2ueki/mcp-server-ios-simulator
  • @joshuayoes/ios-simulator-mcp
  • @joshuayoes/ios-simulator-mcp
  • @vs4vijay/espresso-mcp

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/getsentry/XcodeBuildMCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server