MCP Operator
by willer
Verified
# MCP Operator Requirements
## Overview
The MCP Operator is a tool that provides browser automation capabilities to LLMs through the MCP (Model Control Protocol) interface. It enables AI models to control a web browser, interact with web pages, and analyze web content.
## Core Components
### Browser Operator
- Manages browser instances through the OpenAI Computer Use API
- Orchestrates browser automation via Playwright
- Handles creation, navigation, and operation of browser instances
### MCP Server
- Implements the MCP protocol for communication with LLMs
- Exposes browser automation tools through a standardized JSON-RPC interface
- Maintains state for browser sessions
## Functional Requirements
### Browser Management
- **Create Browser**: Initialize a new browser instance with persistent state
- **Navigate Browser**: Direct the browser to a specified URL
- **Operate Browser**: Execute natural language instructions for browser interaction
- **Close Browser**: Terminate a browser instance
### Job Management
- **Get Job Status**: Retrieve the status and result of an operation by job ID
- **List Jobs**: View recent browser operation jobs
### Web Interaction
- **Browser Operation**: Ability to follow natural language instructions to interact with web content
- **Project Persistence**: Maintain browser state across sessions with project identifiers
### Additional Playwright Operations
- **Take Screenshot**: Capture the current browser viewport
- **Get Console Logs**: Retrieve browser console output for debugging
- **Get Console Errors**: Capture error messages from the browser console
- **Get Network Logs**: Monitor network activity and request/response data
- **Get Network Errors**: Track failed network requests
- **Scroll To**: Programmatically scroll to specific coordinates on the page
- **Click**: Interact with page elements through mouse clicks
- **Type**: Input text into form fields and other input elements
### Browser Debugging Tools
- **Run Accessibility Audit**: Evaluate page compliance with accessibility standards
- **Run Performance Audit**: Measure page load times and optimization metrics
- **Run SEO Audit**: Analyze page structure for search engine optimization
- **Run NextJS Audit**: Specific auditing for NextJS applications
- **Run Best Practices Audit**: Check adherence to web development best practices
- **Run Debugger Mode**: Advanced debugging interface for troubleshooting
- **Run Audit Mode**: Comprehensive page evaluation for multiple metrics
### User Notes
- **Add Note**: Create and store notes related to browser operations
## Non-functional Requirements
### Performance
- Handle browser operations with appropriate timeouts
- Manage memory usage efficiently, especially for screenshots
- Support concurrent browser instances
### Reliability
- Gracefully handle errors in browser operations
- Provide proper recovery mechanisms for failed operations
- Implement safeguards against navigating to dangerous websites
### Logging
- File-based logging with no console output (to preserve JSON-RPC communication)
- Comprehensive error reporting
- Multiple log levels for different operational needs
### Security
- Session isolation between different browser instances
- Secure handling of web content
- Protection against malicious websites
## Development Standards
### Testing
- Test-driven development approach
- Integration tests with real APIs (no mocks)
- Support for multi-step task testing
### Code Quality
- Type annotations for all functions
- Comprehensive documentation
- Adherence to PEP 8 standards
- Proper error handling
- Modular and maintainable code structure
## Communication Protocol
### JSON-RPC Interface
- Standard MCP protocol compliance
- Clean stdin/stdout channels for communication
- Structured error responses
- Asynchronous job handling with status tracking