computer_action_agent
Execute computer actions and tasks using an AI agent that operates safely and effectively through the OpenAI Agents MCP Server.
Instructions
Use an AI agent specialized in performing computer actions safely and effectively.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| action | Yes | The action or task you want to perform on the computer. |
Implementation Reference
- src/agents_mcp_server/server.py:263-294 (handler)The handler function for the 'computer_action_agent' MCP tool. It registers the tool, defines the input schema (action str), creates a SimpleAsyncComputer, an Agent with ComputerTool, executes it via Runner, and returns AgentResponse.@mcp.tool( name="computer_action_agent", description="Use an AI agent specialized in performing computer actions safely and effectively.", ) async def computer_action( action: str = Field(..., description="The action or task you want to perform on the computer.") ) -> AgentResponse: """Use a specialized computer action agent powered by OpenAI to perform actions on the computer.""" try: computer = SimpleAsyncComputer() agent = Agent( name="Computer Action Assistant", instructions=computer_action_instructions, tools=[ComputerTool(computer=computer)], ) with trace("Computer action agent execution"): result = await Runner.run(agent, action) return AgentResponse( response=result.final_output, raw_response={"items": [str(item) for item in result.new_items]}, ) except Exception as e: print(f"Error running computer action agent: {e}") return AgentResponse( response=f"An error occurred while performing the computer action: {str(e)}", raw_response=None, )
- Pydantic schema for the output of agent tools, including computer_action_agent.class AgentResponse(BaseModel): """Response from an OpenAI agent.""" response: str = Field(..., description="The response from the agent") raw_response: Optional[Dict[str, Any]] = Field( None, description="The raw response data from the agent, if available" )
- Custom AsyncComputer implementation used by the computer_action_agent's ComputerTool. Simulates browser/desktop actions like click, type, scroll, etc., with placeholder screenshot.class SimpleAsyncComputer(AsyncComputer): """ A simple implementation of the AsyncComputer interface that simulates computer actions. In a real implementation, you would use a browser automation library like Playwright or a system automation tool to actually perform these actions on the computer. """ def __init__(self): """Initialize the SimpleAsyncComputer.""" self._screen_width = 1024 self._screen_height = 768 self._cursor_x = 0 self._cursor_y = 0 self._current_page = "https://bing.com" @property def environment(self) -> Literal["browser", "desktop"]: """Return the environment type of this computer.""" return "browser" @property def dimensions(self) -> tuple[int, int]: """Return the dimensions of the screen.""" return (self._screen_width, self._screen_height) async def screenshot(self) -> str: """ Capture a screenshot and return it as a base64-encoded string. In a real implementation, this would capture an actual screenshot. """ placeholder_png = base64.b64decode( "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII=" ) return base64.b64encode(placeholder_png).decode("utf-8") async def click( self, x: int, y: int, button: Literal["left", "middle", "right"] = "left" ) -> None: """Simulate clicking at the specified coordinates.""" self._cursor_x = x self._cursor_y = y print(f"Simulated {button} click at ({x}, {y})") async def double_click(self, x: int, y: int) -> None: """Simulate double-clicking at the specified coordinates.""" self._cursor_x = x self._cursor_y = y print(f"Simulated double click at ({x}, {y})") async def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int) -> None: """Simulate scrolling from the specified position.""" self._cursor_x = x self._cursor_y = y print(f"Simulated scroll at ({x}, {y}) by ({scroll_x}, {scroll_y})") async def type(self, text: str) -> None: """Simulate typing the specified text.""" print(f"Simulated typing: {text}") async def wait(self) -> None: """Simulate waiting for a short period.""" await asyncio.sleep(1) print("Waited for 1 second") async def move(self, x: int, y: int) -> None: """Simulate moving the cursor to the specified coordinates.""" self._cursor_x = x self._cursor_y = y print(f"Moved cursor to ({x}, {y})") async def keypress(self, keys: list[str]) -> None: """Simulate pressing the specified keys.""" print(f"Simulated keypress: {', '.join(keys)}") async def drag(self, path: list[tuple[int, int]]) -> None: """Simulate dragging the cursor along the specified path.""" if not path: return self._cursor_x = path[0][0] self._cursor_y = path[0][1] print(f"Started drag at ({self._cursor_x}, {self._cursor_y})") for x, y in path[1:]: self._cursor_x = x self._cursor_y = y print(f"Ended drag at ({self._cursor_x}, {self._cursor_y})") async def run_command(self, command: str) -> str: """ Simulate running a command and return the output. In a real implementation, this could execute shell commands or perform actions based on high-level instructions. """ print(f"Simulating command: {command}") if command.startswith("open "): app = command[5:].strip() return f"Opened {app}" elif command.startswith("search "): query = command[7:].strip() self._current_page = f"https://bing.com/search?q={query}" return f"Searched for '{query}'" elif command.startswith("navigate "): url = command[9:].strip() self._current_page = url return f"Navigated to {url}" else: return f"Executed: {command}" async def get_screenshot(self) -> bytes: """Get a screenshot of the current screen as raw bytes.""" return base64.b64decode( "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII=" )
- Prompt instructions for the Computer Action Agent used in the handler.computer_action_instructions = """You are a computer action assistant. Your primary goal is to help users perform actions on their computer safely and effectively. Guidelines: 1. Always use the computer tool to perform actions 2. Prioritize safety and security in all actions 3. Verify user intentions before performing potentially destructive actions 4. Provide clear feedback about actions taken 5. If an action cannot be performed, explain why and suggest alternatives """
- src/agents_mcp_server/server.py:263-266 (registration)MCP tool registration decorator for computer_action_agent.@mcp.tool( name="computer_action_agent", description="Use an AI agent specialized in performing computer actions safely and effectively.", )