Glama
OpenAI Agents MCP Server

by lroolle

computer_action_agent

Execute computer actions and tasks using an AI agent that operates safely and effectively through the OpenAI Agents MCP Server.

Instructions

Use an AI agent specialized in performing computer actions safely and effectively.

Input Schema

Name     Required  Description                                               Default
action   Yes       The action or task you want to perform on the computer.  (none)
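
For concreteness, an MCP `tools/call` request for this tool would look like the following JSON-RPC message (the action string is illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "computer_action_agent",
    "arguments": {
      "action": "Open bing.com and search for 'MCP servers'"
    }
  }
}
```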

Implementation Reference

  • The handler function for the 'computer_action_agent' MCP tool. It registers the tool, defines the input schema (a single required `action` string), creates a SimpleAsyncComputer, wraps it in a ComputerTool on an Agent, executes the agent via Runner, and returns an AgentResponse.
    @mcp.tool(
        name="computer_action_agent",
        description="Use an AI agent specialized in performing computer actions safely and effectively.",
    )
    async def computer_action(
        action: str = Field(..., description="The action or task you want to perform on the computer.")
    ) -> AgentResponse:
        """Use a specialized computer action agent powered by OpenAI to perform actions on the computer."""
        try:
            computer = SimpleAsyncComputer()
    
            agent = Agent(
                name="Computer Action Assistant",
                instructions=computer_action_instructions,
                tools=[ComputerTool(computer=computer)],
            )
    
            with trace("Computer action agent execution"):
                result = await Runner.run(agent, action)
    
            return AgentResponse(
                response=result.final_output,
                raw_response={"items": [str(item) for item in result.new_items]},
            )
    
        except Exception as e:
            print(f"Error running computer action agent: {e}")
            return AgentResponse(
                response=f"An error occurred while performing the computer action: {str(e)}",
                raw_response=None,
            )
  • Pydantic schema for the output of agent tools, including computer_action_agent.
    class AgentResponse(BaseModel):
        """Response from an OpenAI agent."""
    
        response: str = Field(..., description="The response from the agent")
        raw_response: Optional[Dict[str, Any]] = Field(
            None, description="The raw response data from the agent, if available"
        )
  • Custom AsyncComputer implementation used by the computer_action_agent's ComputerTool. Simulates browser/desktop actions such as click, type, and scroll, and returns a placeholder screenshot.
    class SimpleAsyncComputer(AsyncComputer):
        """
        A simple implementation of the AsyncComputer interface that simulates computer actions.
    
        In a real implementation, you would use a browser automation library like Playwright
        or a system automation tool to actually perform these actions on the computer.
        """
    
        def __init__(self):
            """Initialize the SimpleAsyncComputer."""
            self._screen_width = 1024
            self._screen_height = 768
            self._cursor_x = 0
            self._cursor_y = 0
            self._current_page = "https://bing.com"
    
        @property
        def environment(self) -> Literal["browser", "desktop"]:
            """Return the environment type of this computer."""
            return "browser"
    
        @property
        def dimensions(self) -> tuple[int, int]:
            """Return the dimensions of the screen."""
            return (self._screen_width, self._screen_height)
    
        async def screenshot(self) -> str:
            """
            Capture a screenshot and return it as a base64-encoded string.

            In a real implementation, this would capture an actual screenshot.
            """
            # 1x1 transparent PNG, already base64-encoded; the original
            # decode/re-encode round trip was a no-op.
            return "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII="
    
        async def click(
            self, x: int, y: int, button: Literal["left", "middle", "right"] = "left"
        ) -> None:
            """Simulate clicking at the specified coordinates."""
            self._cursor_x = x
            self._cursor_y = y
            print(f"Simulated {button} click at ({x}, {y})")
    
        async def double_click(self, x: int, y: int) -> None:
            """Simulate double-clicking at the specified coordinates."""
            self._cursor_x = x
            self._cursor_y = y
            print(f"Simulated double click at ({x}, {y})")
    
        async def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int) -> None:
            """Simulate scrolling from the specified position."""
            self._cursor_x = x
            self._cursor_y = y
            print(f"Simulated scroll at ({x}, {y}) by ({scroll_x}, {scroll_y})")
    
        async def type(self, text: str) -> None:
            """Simulate typing the specified text."""
            print(f"Simulated typing: {text}")
    
        async def wait(self) -> None:
            """Simulate waiting for a short period."""
            await asyncio.sleep(1)
            print("Waited for 1 second")
    
        async def move(self, x: int, y: int) -> None:
            """Simulate moving the cursor to the specified coordinates."""
            self._cursor_x = x
            self._cursor_y = y
            print(f"Moved cursor to ({x}, {y})")
    
        async def keypress(self, keys: list[str]) -> None:
            """Simulate pressing the specified keys."""
            print(f"Simulated keypress: {', '.join(keys)}")
    
        async def drag(self, path: list[tuple[int, int]]) -> None:
            """Simulate dragging the cursor along the specified path."""
            if not path:
                return
    
            self._cursor_x = path[0][0]
            self._cursor_y = path[0][1]
            print(f"Started drag at ({self._cursor_x}, {self._cursor_y})")
    
            for x, y in path[1:]:
                self._cursor_x = x
                self._cursor_y = y
    
            print(f"Ended drag at ({self._cursor_x}, {self._cursor_y})")
    
        async def run_command(self, command: str) -> str:
            """
            Simulate running a command and return the output.
    
            In a real implementation, this could execute shell commands
            or perform actions based on high-level instructions.
            """
            print(f"Simulating command: {command}")
    
            if command.startswith("open "):
                app = command[5:].strip()
                return f"Opened {app}"
            elif command.startswith("search "):
                query = command[7:].strip()
                self._current_page = f"https://bing.com/search?q={query}"
                return f"Searched for '{query}'"
            elif command.startswith("navigate "):
                url = command[9:].strip()
                self._current_page = url
                return f"Navigated to {url}"
            else:
                return f"Executed: {command}"
    
        async def get_screenshot(self) -> bytes:
            """Get a screenshot of the current screen as raw bytes."""
            return base64.b64decode(
                "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII="
            )
  • Prompt instructions for the Computer Action Agent used in the handler.
    computer_action_instructions = """You are a computer action assistant. Your primary goal is to help users perform
    actions on their computer safely and effectively.
    
    Guidelines:
    1. Always use the computer tool to perform actions
    2. Prioritize safety and security in all actions
    3. Verify user intentions before performing potentially destructive actions
    4. Provide clear feedback about actions taken
    5. If an action cannot be performed, explain why and suggest alternatives
    """
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'safely and effectively,' which hints at safety considerations but doesn't specify what makes it safe (e.g., permissions, side effects, rate limits) or describe the agent's behavior (e.g., how it performs actions, what it returns). This is inadequate for a tool that likely involves system-level operations, leaving critical behavioral traits undocumented.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence, front-loaded with the core purpose and free of redundancy. It could separate purpose from behavioral detail more explicitly, but it earns high marks for brevity and clarity within its limited scope.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a tool that performs computer actions (likely involving system interactions), the description is incomplete. With no annotations, no output schema, and minimal behavioral transparency, it fails to provide enough context for safe and effective use. The agent lacks information on what the tool returns, error handling, or operational limits, making this inadequate for such a potentially impactful tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the 'action' parameter documented as 'The action or task you want to perform on the computer.' The description adds no additional meaning beyond this, such as examples, constraints, or format details. Given the high schema coverage, a baseline score of 3 is appropriate, as the schema does the heavy lifting without extra value from the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states that the tool uses an AI agent for computer actions, which conveys a general purpose but lacks specificity about what 'computer actions' entails. It mentions 'safely and effectively' but doesn't clarify what types of actions are supported (e.g., file operations, system commands, GUI interactions), nor does it distinguish this tool from sibling tools like file_search_agent or multi_tool_agent. This is a minimally viable description that doesn't fully differentiate the tool's scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention any context, prerequisites, or exclusions for usage, nor does it reference sibling tools like file_search_agent or multi_tool_agent to help the agent choose appropriately. This leaves the agent with no explicit or implied usage rules.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/lroolle/openai-agents-mcp-server'
