Browser Tools MCP Extension
🚀 Optimized for Autonomous AI-Powered Frontend Development Workflows
Browser Tools MCP Extension enables AI tools to interact with your browser for enhanced development capabilities. This document provides an overview of the available tools within the MCP server. For setup instructions, please refer to SETUP_GUIDE.md.
Motivation
I think today's models are capable of a lot, but they often cannot do it in a way that helps the user because they lack context.
We humans do tasks accurately because we have a lot of context about the task at hand, and we use that context to make decisions.
Too much context, however, also makes it hard for LLMs to make decisions. Giving the right context at the right time is therefore key to making LLMs more helpful to the user, and MCP servers are one way to provide that context.
One day, I came across AgentDeskAI's repo (https://github.com/AgentDeskAI/browser-tools-mcp), which consists of a Chrome extension and an MCP server with tools such as get browser logs and get network status. This inspired me, and I started using these tools in my development workflow. I realized that when I write code, I juggle a lot of things and manage all of that context myself so I know what to write. So, what if we could provide this context to LLMs at the right time? AgentDeskAI was a huge inspiration and the starting point for this project, which is why this is a fork of that repository. At the moment, I am not using most of the tools from their repo except getSelectedElement, but they have many interesting tools, and I plan to bring some back depending on how this setup works.
I am a Frontend Developer and Applied AI enthusiast, and I am working on this project to make already good AI coding IDEs better by creating a custom workflow on top of these tools. This workflow allows me to automate my work of frontend development and delegate the tasks to these AI IDEs, and they can autonomously work. This allows me to focus on important tasks like future-proof project setup. Oh yeah, one important thing to note is that currently, this workflow only works if the project is already set up and has basic things like auth context, API calling structure, routing, and how those routes are exposed, etc. All of this context should be set up in AI IDEs. I use Windsurf's Memories to store this context, which allows the agent to retrieve the important memories based on my prompt. You can use Cursor's Rule file also, but I don't know how well this will work because I haven't tried it.
Now, to make Frontend development autonomous, we have to understand what a frontend developer uses to code and how he/she thinks.
A frontend developer uses API documentation, the browser, browser logs, browser errors, the ability to make API calls, functional requirement documents, developer tools, and his/her visual capability to see the UI and make decisions. Considering these aspects of frontend development, we can create an MCP server that provides context to AI IDEs at the right time. So, I made tools that can access all of these aspects. These tools include: analyzeApiCalls, takeScreenshot, getSelectedElement, analyzeImageFile, ingestFrdDocument, getFrdIngestionStatus, searchApiDocs, and more coming soon.
I plan to make such workflows for backend and QA testers also, but primarily I am a frontend guy, so I chose this first. If you are interested in this project, please let me know, and I will be happy to help you. We can create something big and awesome.
Available Tools
The following tools are available through the Browser Tools MCP server:
analyzeApiCalls
- Description: Analyzes API interactions between the frontend and backend by retrieving filtered network request details. This tool is useful for inspecting API calls to specific endpoints, debugging network errors and status codes, examining request/response payloads, investigating authentication headers, or monitoring AJAX requests. Results include timestamps to help distinguish between identical API calls made at different times.
- Parameters:
  - urlFilter (string, required): A substring or pattern to filter request URLs.
  - details (array of strings, required): Specific details to retrieve for each request. Possible values include: "url", "method", "status", "timestamp", "requestHeaders", "responseHeaders", "requestBody", "responseBody".
  - timeStart (number, optional): A Unix timestamp (in milliseconds) to filter requests that occurred after this time.
  - timeEnd (number, optional): A Unix timestamp (in milliseconds) to filter requests that occurred before this time.
  - orderBy (string, optional, default: "timestamp"): The field to order results by. Possible values: "timestamp", "url".
  - orderDirection (string, optional, default: "desc"): The direction for ordering. Possible values: "asc" (oldest first), "desc" (newest first).
  - limit (number, optional, default: 20): The maximum number of results to return.
- Functionality: This tool constructs a query based on the provided parameters and fetches network request details from the browser-connector server (typically at http://<host>:<port>/network-request-details). It then returns the filtered and ordered list of network interactions.
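To make the query construction concrete, here is a minimal sketch of how the analyzeApiCalls parameters could be turned into a request URL for the browser-connector server. The function name, the query-string key names, and the comma-joined details value are assumptions for illustration; only the endpoint path and the documented defaults come from the description above.

```typescript
// Hypothetical parameter shape mirroring the analyzeApiCalls parameters.
interface AnalyzeApiCallsParams {
  urlFilter: string;
  details: string[];
  timeStart?: number;
  timeEnd?: number;
  orderBy?: "timestamp" | "url";
  orderDirection?: "asc" | "desc";
  limit?: number;
}

// Builds the URL for the network-request-details endpoint.
// Assumption: parameters travel as query-string keys with the same names,
// and `details` is sent as a comma-separated list.
function buildNetworkRequestUrl(
  host: string,
  port: number,
  params: AnalyzeApiCallsParams
): string {
  const query = new URLSearchParams();
  query.set("urlFilter", params.urlFilter);
  query.set("details", params.details.join(","));
  if (params.timeStart !== undefined) query.set("timeStart", String(params.timeStart));
  if (params.timeEnd !== undefined) query.set("timeEnd", String(params.timeEnd));
  // Documented defaults: order by timestamp, newest first, at most 20 results.
  query.set("orderBy", params.orderBy ?? "timestamp");
  query.set("orderDirection", params.orderDirection ?? "desc");
  query.set("limit", String(params.limit ?? 20));
  return `http://${host}:${port}/network-request-details?${query.toString()}`;
}

const exampleUrl = buildNetworkRequestUrl("127.0.0.1", 3025, {
  urlFilter: "/api/users",
  details: ["url", "method", "status", "timestamp"],
});
console.log(exampleUrl);
```

Applying the defaults in one place keeps every call bounded to 20 results unless the caller asks for more.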
takeScreenshot ⭐ ENHANCED
- Description: Take a screenshot of the current browser tab and return the image data for immediate analysis. The screenshot is automatically organized by project and URL structure in a centralized directory system.
- Parameters:
  - filename (string, optional): Optional custom filename for the screenshot (without extension). If not provided, uses timestamp-based naming.
  - returnImageData (boolean, optional, default: true): Whether to return the base64 image data in the response for immediate analysis.
  - projectName (string, optional): Optional project name to override automatic project detection. Screenshots will be organized under this project folder.
- Functionality: Captures a screenshot via the Chrome extension with enhanced connection stability. Features a 15-second timeout for autonomous operation reliability and an organized storage system. Returns both file confirmation and base64 image data (if requested) for immediate analysis workflows.
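As an illustration of "organized by project and URL structure" with timestamp-based naming, the sketch below derives a storage path from a project name and page URL. The folder layout, the character sanitizing, the screenshot- prefix, and the .png extension are all assumptions; the actual extension may organize files differently.

```typescript
// Sketch only: derives a relative storage path for a screenshot.
// Assumed layout: <project>/<host>/<url path segments>/<name>.png
function screenshotRelativePath(
  projectName: string,
  pageUrl: string,
  filename?: string,
  now: Date = new Date()
): string {
  const url = new URL(pageUrl);
  // Replace characters that are awkward in file names (assumed rule).
  const sanitize = (value: string): string => value.replace(/[^a-zA-Z0-9._-]/g, "_");
  const segments = url.pathname
    .split("/")
    .filter((segment) => segment.length > 0)
    .map(sanitize);
  // Fall back to a timestamp-based name when no filename is given.
  const name = filename ?? `screenshot-${now.toISOString().replace(/[:.]/g, "-")}`;
  return [sanitize(projectName), sanitize(url.host), ...segments, `${sanitize(name)}.png`].join("/");
}

console.log(
  screenshotRelativePath("my-app", "http://localhost:5173/dashboard/users", "users-table")
);
// prints "my-app/localhost_5173/dashboard/users/users-table.png"
```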
getSelectedElement
- Description: Retrieves information about the HTML element currently selected by the user in the browser's DevTools (if any).
- Parameters: None.
- Functionality: This tool queries the browser-connector server (at http://<host>:<port>/selected-element) to get details of the element last inspected or selected by the user in the Chrome DevTools. It returns a JSON string containing information about the selected element.
analyzeImageFile
- Description: Load and analyze previously saved images or existing image files. Use this to access historical screenshots taken with takeScreenshot or any other image files in your project.
- Parameters:
  - imagePath (string, required): The path to the image file. This can be an absolute path or a path relative to the project root.
  - projectRoot (string, optional): An optional path to override the default project root directory. If not provided, it uses the PROJECT_ROOT environment variable or the directory of the MCP server.
- Functionality: The tool resolves the absolute path to the image, reads the file, converts its content to a base64 string, and determines its MIME type. It returns an object containing the fileName, mimeType, size (in bytes), and the base64Data of the image.
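The resolve, read, and encode steps described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the helper names, the extension-to-MIME table, and the use of the current working directory as the final fallback (standing in for "the directory of the MCP server") are assumptions.

```typescript
import { readFileSync } from "node:fs";
import * as path from "node:path";

interface ImageFileResult {
  fileName: string;
  mimeType: string;
  size: number; // bytes
  base64Data: string;
}

// Assumed extension-to-MIME table; the real tool may support a different set.
const MIME_BY_EXTENSION: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Absolute paths are used as-is; relative paths resolve against the project root.
function resolveImagePath(imagePath: string, projectRoot?: string): string {
  const root = projectRoot ?? process.env["PROJECT_ROOT"] ?? process.cwd();
  return path.isAbsolute(imagePath) ? imagePath : path.resolve(root, imagePath);
}

// Builds the result object from raw bytes (kept separate so it is easy to test).
function describeImage(absolutePath: string, bytes: Uint8Array): ImageFileResult {
  const extension = path.extname(absolutePath).toLowerCase();
  return {
    fileName: path.basename(absolutePath),
    mimeType: MIME_BY_EXTENSION[extension] ?? "application/octet-stream",
    size: bytes.byteLength,
    base64Data: Buffer.from(bytes).toString("base64"),
  };
}

// Ties the steps together: resolve the path, read the file, describe it.
function analyzeImageFileSketch(imagePath: string, projectRoot?: string): ImageFileResult {
  const absolutePath = resolveImagePath(imagePath, projectRoot);
  return describeImage(absolutePath, readFileSync(absolutePath));
}
```

A call such as analyzeImageFileSketch("screenshots/home.png") would then resolve the path against PROJECT_ROOT when that variable is set.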
ingestFrdDocument
- Description: Takes a path to a Functional Requirements Document (FRD) or similar document (TXT, MD, CSV, PDF), processes it using LlamaIndex, and ingests its content into a Qdrant vector database for semantic search and analysis. This is an asynchronous operation.
- Parameters:
  - documentPath (string, required): The path to the document file.
  - projectRoot (string, optional): Optional override for the project root directory to resolve relative document paths.
  - collectionName (string, optional, default: "frd_documents"): The name of the Qdrant collection to use.
  - qdrantUrl (string, optional): The URL of the Qdrant server. Defaults to process.env.QDRANT_URL or http://localhost:6333.
  - qdrantApiKey (string, optional): The API key for Qdrant Cloud. Defaults to process.env.QDRANT_API_KEY.
  - vectorSize (number, optional, default: 768): The size of the vectors for embeddings (default is for Gemini text-embedding-004).
- Functionality:
  - Generates a unique task ID for tracking the ingestion process.
  - Resolves the absolute path to the document.
  - Asynchronously, it uses LlamaParseReader (from LlamaIndex) to parse the document. For PDF files, it's configured to extract text and describe images within the resulting markdown.
  - It then creates embeddings (using Google's Gemini model, requires GOOGLE_API_KEY) and stores them in the specified Qdrant collection.
  - If the Qdrant collection doesn't exist, it attempts to create it.
  - The tool immediately returns the taskId and the initial status. The actual ingestion happens in the background. You can use getFrdIngestionStatus to check the progress.
getFrdIngestionStatus
- Description: Retrieves the current status of an FRD document ingestion task previously initiated by ingestFrdDocument.
- Parameters:
  - taskId (string, required): The unique ID of the ingestion task.
- Functionality: It checks the internal ingestionTasks store for the status of the task associated with the given taskId. It returns details such as the current status (e.g., "STARTED", "PROCESSING", "COMPLETED", "FAILED"), any message, startTime, endTime, documentPath, and collectionName.
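The start-then-poll pattern shared by ingestFrdDocument and getFrdIngestionStatus can be sketched with an in-memory task store. The Map-based store, the task ID format, and the injected ingest callback (standing in for parsing, embedding, and the Qdrant upload) are assumptions; only the field names and status values come from the descriptions above.

```typescript
type IngestionStatus = "STARTED" | "PROCESSING" | "COMPLETED" | "FAILED";

interface IngestionTask {
  status: IngestionStatus;
  message?: string;
  startTime: number;
  endTime?: number;
  documentPath: string;
  collectionName: string;
}

// In-memory store keyed by task ID (assumed shape of `ingestionTasks`).
const ingestionTasks = new Map<string, IngestionTask>();
let taskCounter = 0;

// Registers a task and returns its ID immediately; the work runs in the background.
// `ingest` is a hypothetical callback standing in for the real pipeline.
function startIngestion(
  documentPath: string,
  collectionName: string,
  ingest: (documentPath: string) => Promise<void>
): string {
  const taskId = `frd-${Date.now()}-${++taskCounter}`; // assumed ID format
  const task: IngestionTask = {
    status: "STARTED",
    startTime: Date.now(),
    documentPath,
    collectionName,
  };
  ingestionTasks.set(taskId, task);
  // Deferred to a microtask so the caller still observes the initial "STARTED" status.
  void Promise.resolve().then(async () => {
    try {
      task.status = "PROCESSING";
      await ingest(documentPath);
      task.status = "COMPLETED";
    } catch (error) {
      task.status = "FAILED";
      task.message = error instanceof Error ? error.message : String(error);
    } finally {
      task.endTime = Date.now();
    }
  });
  return taskId;
}

// Looks up a task; returns undefined for unknown IDs.
function getIngestionStatus(taskId: string): IngestionTask | undefined {
  return ingestionTasks.get(taskId);
}
```

Because the store lives in process memory in this sketch, task statuses would not survive a server restart.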
searchApiDocs
- Description: Searches through an OpenAPI (Swagger) specification to find API endpoints that match a given pattern. This helps in understanding API structures, parameters, and responses.
- Parameters:
  - swaggerSource (string, required): The source of the Swagger/OpenAPI specification. This can be a URL, a local file path, or a JSON string containing the specification. Defaults to the SWAGGER_URL environment variable if not provided.
  - apiPattern (string, required): A regular expression pattern to match against API paths or operationIds.
  - includeSchemas (boolean, optional, default: true): If true, the tool will attempt to resolve and include the full schema definitions for parameters, request bodies, and responses referenced via $ref.
- Functionality:
  - Loads the OpenAPI specification from the swaggerSource.
  - Iterates through all defined paths and operations in the specification.
  - Matches the apiPattern against the endpoint path and its operationId.
  - For matching endpoints, it extracts details like the HTTP method, summary, description, parameters, request body, and responses.
  - If includeSchemas is true, it resolves and embeds any referenced JSON schemas directly into the output for the matching endpoints.
  - Returns a JSON string containing an array of the matching API endpoint details.
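The matching step can be illustrated with a small sketch that walks an already-loaded specification and tests apiPattern against each path and operationId. The type names, function names, and the restriction to local #/components/schemas/ references are simplifying assumptions; the real tool also loads the specification and extracts more detail per endpoint.

```typescript
interface OpenApiOperation {
  operationId?: string;
  summary?: string;
  [key: string]: unknown;
}

interface OpenApiSpec {
  paths: Record<string, Record<string, OpenApiOperation>>;
  components?: { schemas?: Record<string, unknown> };
}

interface EndpointMatch {
  path: string;
  method: string;
  operationId: string | undefined;
  summary: string | undefined;
}

const HTTP_METHODS = ["get", "post", "put", "patch", "delete", "options", "head"];

// Returns every operation whose path or operationId matches the pattern.
function findEndpoints(spec: OpenApiSpec, apiPattern: string): EndpointMatch[] {
  const pattern = new RegExp(apiPattern);
  const matches: EndpointMatch[] = [];
  for (const [path, pathItem] of Object.entries(spec.paths)) {
    for (const method of HTTP_METHODS) {
      const operation = pathItem[method];
      if (!operation) continue;
      const idMatches =
        operation.operationId !== undefined && pattern.test(operation.operationId);
      if (pattern.test(path) || idMatches) {
        matches.push({
          path,
          method: method.toUpperCase(),
          operationId: operation.operationId,
          summary: operation.summary,
        });
      }
    }
  }
  return matches;
}

// Resolves a local schema reference; other reference styles are not handled here.
function resolveSchemaRef(spec: OpenApiSpec, ref: string): unknown {
  const prefix = "#/components/schemas/";
  if (!ref.startsWith(prefix)) return undefined;
  return spec.components?.schemas?.[ref.slice(prefix.length)];
}

const sampleSpec: OpenApiSpec = {
  paths: {
    "/users": { get: { operationId: "listUsers", summary: "List users" } },
    "/orders/{id}": {
      get: { operationId: "getOrder" },
      delete: { operationId: "removeUserOrder" },
    },
  },
  components: { schemas: { User: { type: "object" } } },
};

// Matches GET /users (by path) and DELETE /orders/{id} (by operationId).
console.log(findEndpoints(sampleSpec, "[Uu]ser"));
```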
executeAuthenticatedApiCall (NEW - Unified API Testing Tool)
- Description: Automatically retrieves authentication tokens from the browser session and executes authenticated API calls. This eliminates token retrieval hallucination and ensures consistent API testing with real authentication.
- Parameters:
  - endpoint (string, required): The API endpoint path (e.g., '/api/users', '/auth/profile'). Combined with API_BASE_URL from environment.
  - method (enum, optional, default: "GET"): HTTP method for the API call (GET, POST, PUT, PATCH, DELETE).
  - requestBody (any, optional): Request body for POST/PUT/PATCH requests (automatically JSON stringified).
  - queryParams (object, optional): Query parameters as key-value pairs.
  - additionalHeaders (object, optional): Additional headers to include in the request.
  - includeResponseDetails (boolean, optional, default: true): Whether to include detailed response analysis (status, headers, timing).
- Environment Variables Required:
  - AUTH_ORIGIN: The origin where your app is running (e.g., "http://localhost:5173")
  - AUTH_STORAGE_TYPE: Where the auth token is stored ("cookie", "localStorage", or "sessionStorage")
  - AUTH_TOKEN_KEY: The key name for the auth token (e.g., "authToken", "accessToken")
  - API_BASE_URL: Your API base URL (e.g., "https://api.example.com")
- Functionality:
  - Automatically retrieves the auth token from the browser session using predefined environment configuration
  - Constructs the full API URL and adds query parameters if provided
  - Makes an authenticated API request with the proper Authorization header
  - Returns a structured response with actual API data and optional detailed metrics
  - Eliminates manual token handling and curl command execution
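The URL construction and header steps listed above can be sketched as a pure function that prepares the request without sending it. The function and type names are hypothetical, and the "Bearer" scheme and JSON content type are assumptions, since the text only says a proper Authorization header is added. Token retrieval from the browser session is out of scope here, so the token is passed in.

```typescript
interface ApiCallOptions {
  endpoint: string;
  method?: "GET" | "POST" | "PUT" | "PATCH" | "DELETE";
  requestBody?: unknown;
  queryParams?: Record<string, string>;
  additionalHeaders?: Record<string, string>;
}

interface PreparedRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string | undefined;
}

// Prepares (but does not send) an authenticated request.
// Assumption: the token is sent as "Authorization: Bearer <token>".
function prepareAuthenticatedRequest(
  apiBaseUrl: string,
  token: string,
  options: ApiCallOptions
): PreparedRequest {
  // Join base URL and endpoint without producing a double slash.
  const url = new URL(apiBaseUrl.replace(/\/+$/, "") + options.endpoint);
  for (const [key, value] of Object.entries(options.queryParams ?? {})) {
    url.searchParams.set(key, value);
  }
  const method = options.method ?? "GET";
  // Only POST/PUT/PATCH carry a body, which is JSON stringified.
  const hasBody =
    options.requestBody !== undefined && ["POST", "PUT", "PATCH"].includes(method);
  const headers: Record<string, string> = { Authorization: `Bearer ${token}` };
  if (hasBody) headers["Content-Type"] = "application/json";
  Object.assign(headers, options.additionalHeaders ?? {});
  return {
    url: url.toString(),
    method,
    headers,
    body: hasBody ? JSON.stringify(options.requestBody) : undefined,
  };
}

const prepared = prepareAuthenticatedRequest("https://api.example.com", "token-123", {
  endpoint: "/api/users",
  queryParams: { page: "2" },
});
console.log(prepared.url); // prints "https://api.example.com/api/users?page=2"
```

The prepared object maps directly onto the arguments of fetch(url, { method, headers, body }).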
getAccessToken (DEPRECATED)
- Description: Legacy tool for manual token retrieval. Use executeAuthenticatedApiCall instead for better reliability.
- Status: Kept for backward compatibility but deprecated in favor of the unified approach.
🤖 Autonomous Operation Features
Enhanced Connection Stability
- Intelligent Heartbeat System: 25-second intervals with 60-second timeouts
- Fast Recovery: 3-15 second reconnection times for minimal workflow disruption
- Exponential Backoff: Smart retry logic with up to 10 attempts
- Individual Request Tracking: Prevents callback conflicts during concurrent operations
- Connection Health Monitoring: Real-time status endpoint at /connection-health
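For readers unfamiliar with exponential backoff, the sketch below computes a capped, doubling delay schedule. The 10-attempt limit comes from the list above; the 3-second base and 15-second cap are assumptions loosely chosen to fit the stated 3-15 second reconnection range, and the extension's real schedule may differ.

```typescript
// Sketch of a capped exponential backoff schedule.
// Assumed values: 3 s base delay, doubling each attempt, capped at 15 s.
function reconnectDelays(maxAttempts = 10, baseMs = 3000, capMs = 15000): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    delays.push(Math.min(baseMs * 2 ** attempt, capMs));
  }
  return delays;
}

// The first delays are 3000, 6000, 12000 ms; every later one is capped at 15000 ms.
console.log(reconnectDelays());
```

Capping the delay keeps recovery fast even after several failed attempts, at the cost of more frequent retries while the server is down.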
Autonomous AI Workflow Optimizations
- Extended Screenshot Timeouts: 15-second timeouts for network tolerance
- Enhanced Error Handling: Detailed connection state reporting for debugging
- Streamlined Discovery: Essential IP scanning (300ms timeouts) for faster server detection
- Background Retry Logic: 5 retry attempts with server validation
- Network Tolerance: Increased timeouts for unreliable network conditions
Connection Health API
Access real-time connection status at: http://localhost:3026/connection-health
See SETUP_GUIDE.md for detailed configuration instructions and AUTONOMOUS_OPERATION_TESTING_REPORT.md for testing results.
Environment Variables
The server supports several environment variables for configuration:
API Testing & Authentication
- AUTH_ORIGIN: Origin where your app runs (e.g., "http://localhost:5173")
- AUTH_STORAGE_TYPE: Token storage location ("cookie", "localStorage", "sessionStorage")
- AUTH_TOKEN_KEY: Token key name (e.g., "authToken", "accessToken")
- API_BASE_URL: Your API base URL (e.g., "https://api.example.com")
Document & API Discovery
- SWAGGER_URL: Swagger/OpenAPI JSON URL for API documentation search
- PROJECT_ROOT: Project root directory for file operations and image analysis
Screenshot Management
- SCREENSHOT_STORAGE_PATH: Custom directory for screenshot storage (defaults to Downloads folder)
Vector Database (for FRD document ingestion)
- GOOGLE_API_KEY: Google API key for embeddings
- QDRANT_API_KEY: Qdrant vector database API key
- QDRANT_URL: Qdrant server URL (defaults to http://localhost:6333)
Connection Stability & Autonomous Operation
- BROWSER_TOOLS_HOST: Server host override (defaults to "127.0.0.1")
- BROWSER_TOOLS_PORT: Server port override (defaults to 3025)
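A minimal sketch of reading these variables with their documented defaults is shown below. The ServerConfig shape, the function name, and the validation rules are assumptions for illustration; only the variable names and default values come from the list above.

```typescript
type AuthStorageType = "cookie" | "localStorage" | "sessionStorage";

// Hypothetical config shape covering a subset of the variables listed above.
interface ServerConfig {
  host: string;
  port: number;
  qdrantUrl: string;
  authStorageType: AuthStorageType | undefined;
  apiBaseUrl: string | undefined;
}

// Reads configuration from an environment-like record, applying documented defaults.
function loadServerConfig(env: Record<string, string | undefined>): ServerConfig {
  const storage = env["AUTH_STORAGE_TYPE"];
  if (
    storage !== undefined &&
    storage !== "cookie" &&
    storage !== "localStorage" &&
    storage !== "sessionStorage"
  ) {
    throw new Error(`Unsupported AUTH_STORAGE_TYPE: ${storage}`);
  }
  const port = Number(env["BROWSER_TOOLS_PORT"] ?? "3025");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error("BROWSER_TOOLS_PORT must be a positive integer");
  }
  return {
    host: env["BROWSER_TOOLS_HOST"] ?? "127.0.0.1",
    port,
    qdrantUrl: env["QDRANT_URL"] ?? "http://localhost:6333",
    authStorageType: storage,
    apiBaseUrl: env["API_BASE_URL"],
  };
}

// In a Node.js process this would typically be called as loadServerConfig(process.env).
console.log(loadServerConfig({ BROWSER_TOOLS_PORT: "4000" }));
```

Validating values once at startup surfaces a misconfigured variable before any tool call depends on it.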