Browser Use Heroku

agent-settings.mdx•7.58 KiB

---
title: "Agent Settings"
description: "Learn how to configure the agent"
icon: "gear"
---

## Overview

The `Agent` class is the core component of Browser Use that handles browser automation. The sections below cover the main configuration options available when initializing an agent.

## Basic Settings

```python
from browser_use import Agent
from browser_use.llm import ChatOpenAI

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatOpenAI(model="gpt-4o"),
)
```

### Required Parameters

- `task`: The instruction for the agent to execute
- `llm`: A chat model instance. See <a href="/customize/supported-models">Supported Models</a> for the available options.

## Agent Behavior

Control how the agent operates:

```python
agent = Agent(
    task="your task",
    llm=llm,
    controller=custom_controller,  # For custom tool calling
    use_vision=True,  # Enable vision capabilities
    save_conversation_path="logs/conversation",  # Save chat logs
)
```

### Behavior Parameters

- `controller`: Registry of functions the agent can call. Defaults to base Controller. See <a href="/customize/custom-functions">Custom Functions</a> for details.
- `use_vision`: Enable/disable vision capabilities. Defaults to `True`.
  - When enabled, the model processes visual information from web pages
  - Disable to reduce costs or use models without vision support
  - For GPT-4o, image processing costs approximately 800-1000 tokens (~$0.002 USD) per image, depending on the configured screen size
- `vision_detail_level`: Controls the detail level of screenshots sent to the vision model. Can be `'low'`, `'high'`, or `'auto'` (default). Using `'low'` can significantly reduce token consumption and cost for simpler visual tasks, while `'high'` provides more detail for complex visual analysis.
- `save_conversation_path`: Path to save the complete conversation history. Useful for debugging.
- `override_system_message`: Completely replace the default system prompt with a custom one.
- `extend_system_message`: Append extra instructions to the default system prompt.

<Note>
  Vision capabilities are recommended for better web interaction understanding, but can be disabled to reduce costs or when using models without vision support.
</Note>

### Reuse Existing Browser Context

By default, browser-use launches its own built-in browser using Playwright Chromium. You can also connect to a remote browser or pass any of the following existing objects to the Agent: `page`, `browser_context`, `browser`, `browser_session`, or `browser_profile`.

These all get passed down to create a `BrowserSession` for the `Agent`:

```python
from browser_use import Agent, BrowserSession

agent = Agent(
    task='book a flight to fiji',
    llm=llm,
    browser_profile=browser_profile,  # use this profile to create a BrowserSession
    browser_session=BrowserSession(   # use an existing BrowserSession
        cdp_url=...,                      # remote CDP browser to connect to
        # or
        wss_url=...,                      # remote wss playwright server provider
        # or
        browser_pid=...,                  # pid of a locally running browser process to attach to
        # or
        executable_path=...,              # provide a custom chrome binary path
        # or
        channel=...,                      # specify chrome, chromium, ms-edge, etc.
        # or
        page=page,                        # use an existing playwright Page object
        # or
        browser_context=browser_context,  # use an existing playwright BrowserContext object
        # or
        browser=browser,                  # use an existing playwright Browser object
    ),
)
```

For example, to connect to an existing browser over CDP:

```python
agent = Agent(
    ...
    browser_session=BrowserSession(cdp_url='http://localhost:9222'),
)
```

To attach to a locally running Chrome instance by process ID:

```python
agent = Agent(
    ...
    browser_session=BrowserSession(browser_pid=1234),
)
```

See <a href="/customize/real-browser">Connect to your Browser</a> for more info.

<Note>
  You can reuse the same `BrowserSession` after an agent has completed running. By default, the browser is closed automatically when `run()` completes only if browser-use launched it.
</Note>

## Running the Agent

Execute the agent with the async `run()` method, which accepts:

- `max_steps` (default: `100`)
  Maximum number of steps the agent can take during execution. This prevents infinite loops and helps control execution time.

## Agent History

The `run()` method returns an `AgentHistoryList` object containing the complete execution history. This history is invaluable for debugging, analysis, and creating reproducible scripts.

```python
# Example of accessing history
history = await agent.run()

# Access (some) useful information
history.urls()               # List of visited URLs
history.screenshot_paths()   # List of screenshot paths
history.action_names()       # Names of executed actions
history.extracted_content()  # Content extracted during execution
history.errors()             # Any errors that occurred
history.model_actions()      # All actions with their parameters
```

The `AgentHistoryList` provides many helper methods to analyze the execution:

- `final_result()`: Get the final extracted content
- `is_done()`: Check if the agent completed successfully
- `has_errors()`: Check if any errors occurred
- `model_thoughts()`: Get the agent's reasoning process
- `action_results()`: Get results of all actions

<Note>
  For a complete list of helper methods and detailed history analysis capabilities, refer to the [AgentHistoryList source code](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py#L111).
</Note>

## Run initial actions without LLM

With [this example](https://github.com/browser-use/browser-use/blob/main/examples/features/initial_actions.py) you can run initial actions without the LLM.
Specify each action as a dictionary where the key is the action name and the value is the action parameters.
You can find all available actions in the [Controller](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py) source code.
```python
initial_actions = [
    {'go_to_url': {'url': 'https://www.google.com', 'new_tab': True}},
    {'go_to_url': {'url': 'https://en.wikipedia.org/wiki/Randomness', 'new_tab': True}},
    {'scroll_down': {'amount': 1000}},
]
agent = Agent(
    task='What theories are displayed on the page?',
    initial_actions=initial_actions,
    llm=llm,
)
```

### Optional Parameters

- `initial_actions`: List of initial actions to run before the main task.
- `max_actions_per_step`: Maximum number of actions to run in a step. Defaults to `10`.
- `max_failures`: Maximum number of failures before giving up. Defaults to `3`.
- `retry_delay`: Time to wait between retries in seconds when rate limited. Defaults to `10`.
- `generate_gif`: Enable/disable GIF generation. Defaults to `False`. Set to `True` or a string path to save the GIF.

## Memory

Memory management in browser-use has been significantly improved since version 0.3.2. The agent's context handling and state management are now robust enough that the previous memory system (`mem0`) is no longer needed or supported.

The agent maintains its context and task progress through:

- Detailed history tracking of actions and results
- Structured state management
- Clear goal setting and evaluation at each step

The `enable_memory` parameter has been removed as the new system provides better context management by default.

<Note>
  If you're upgrading from an older version that used `enable_memory`, simply remove this parameter. The agent will automatically use the improved context management system.
</Note>
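
If your agent settings live in a shared dictionary or config file rather than inline, the upgrade step described above can be sketched as a small filter. This is only an illustration: `strip_removed_params` and `REMOVED_AGENT_PARAMS` are hypothetical names, not part of browser-use, and the example assumes you build the `Agent` from keyword arguments.

```python
# Hypothetical helper (not part of browser-use): drop parameters that the
# Agent no longer accepts before passing a legacy config to Agent(...).
REMOVED_AGENT_PARAMS = {"enable_memory"}


def strip_removed_params(agent_kwargs: dict) -> dict:
    """Return a copy of agent_kwargs without removed Agent parameters."""
    return {
        key: value
        for key, value in agent_kwargs.items()
        if key not in REMOVED_AGENT_PARAMS
    }


# A legacy configuration written for an older browser-use version.
legacy_kwargs = {
    "task": "Search for latest news about AI",
    "enable_memory": True,  # removed parameter, must not reach Agent
    "use_vision": True,
}

agent_kwargs = strip_removed_params(legacy_kwargs)
print(sorted(agent_kwargs))  # prints ['task', 'use_vision']

# The cleaned settings can then be passed on, with llm defined as in
# Basic Settings:
# agent = Agent(llm=llm, **agent_kwargs)
```

The helper returns a new dictionary and leaves the original untouched, so the legacy config can still be logged or compared during the migration.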
