navigate_and_scrape
Navigate to a URL and extract content in one operation. Supports text, HTML, markdown, links, images, and screenshots.
Instructions
Navigate to a URL and optionally scrape content in one operation. Automatically creates a session if one is needed.
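The session handling described by `session_id`, `auto_create_session`, and `browser_type` in the Input Schema can be sketched as follows. This is a hypothetical Python helper, not the tool's actual implementation. `resolve_session` and the `create_session` callback are illustrative names. Because the schema lists no default for `auto_create_session`, the sketch makes the caller pass it explicitly.

```python
from typing import Callable, Optional


def resolve_session(
    session_id: Optional[str],
    auto_create_session: bool,
    create_session: Callable[[Optional[str]], str],
    browser_type: Optional[str] = None,
) -> str:
    """Pick the session to navigate in (hypothetical helper, not the tool's code)."""
    if session_id:
        # An explicit session always wins. browser_type is ignored here,
        # because it only applies when a new session is created.
        return session_id
    if auto_create_session:
        # No session was given, so open a fresh one with the requested engine.
        return create_session(browser_type)
    # No session and auto-creation is disabled, so there is nothing to navigate in.
    raise ValueError("session_id is required when auto_create_session is false")
```

In this sketch, `browser_type` only reaches `create_session`. This mirrors the schema's note that it is used only when a session is auto-created.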
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | No | Browser session ID to use. If not provided and auto_create_session is true, a new session will be created automatically | |
| url | Yes | URL to navigate to. Must be a valid HTTP/HTTPS URL | |
| wait_until | No | Navigation wait condition: 'load' waits for all resources to load, 'domcontentloaded' waits only for the DOM to be parsed (faster), and 'networkidle' waits until network activity stops | domcontentloaded |
| timeout | No | Navigation timeout in milliseconds. Increase for slow-loading pages | |
| extract_text | No | Whether to extract text content from the page. Useful for content analysis and AI processing | |
| extract_html | No | Whether to extract raw HTML content. Useful for detailed page analysis or when text extraction isn't sufficient | |
| extract_sanitized_html | No | Whether to extract sanitized HTML content with scripts and styles removed. Safer for AI processing | |
| extract_markdown | No | Whether to convert the page content to Markdown. Provides a clean format for AI analysis | |
| extract_dom_json | No | Whether to extract the DOM structure as navigable JSON with inline styles. Enables AI navigation of the page structure | |
| extract_links | No | Whether to extract all links from the page. Returns an array of {text, href} objects | |
| extract_images | No | Whether to extract all images from the page. Returns an array of {alt, src} objects | |
| capture_screenshot | No | Whether to capture a screenshot of the page and store it as base64. Enables AI visual analysis | |
| screenshot_full_page | No | Whether to capture the full page or only the viewport. Only used if capture_screenshot is true | |
| auto_index_website | No | Whether to automatically index this page in the website database for future reference | |
| selector | No | CSS selector to limit extraction to specific elements. If provided, only content within matching elements will be extracted | |
| wait_for_selector | No | CSS selector to wait for before extracting content. Useful for dynamic content that loads after navigation | |
| auto_create_session | No | Whether to automatically create a new session if session_id is not provided. Convenient for one-off operations | |
| browser_type | No | Browser engine to use when creating a new session (only used if auto_create_session is true) | |
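Below is a minimal sketch of assembling a call. It assumes the tool is exposed over the Model Context Protocol (MCP), where a client invokes a tool with a JSON-RPC `tools/call` request carrying `name` and `arguments`. This page does not state the transport, so treat that as an assumption. `build_call` is a hypothetical helper. It checks only what the table states: an HTTP/HTTPS `url`, the three `wait_until` values, and known parameter names. It does not send anything.

```python
import json
from urllib.parse import urlparse

# Parameter names taken from the Input Schema table.
KNOWN_PARAMS = {
    "session_id", "url", "wait_until", "timeout",
    "extract_text", "extract_html", "extract_sanitized_html",
    "extract_markdown", "extract_dom_json", "extract_links",
    "extract_images", "capture_screenshot", "screenshot_full_page",
    "auto_index_website", "selector", "wait_for_selector",
    "auto_create_session", "browser_type",
}
WAIT_CONDITIONS = {"load", "domcontentloaded", "networkidle"}


def build_call(url: str, request_id: int = 1, **options) -> str:
    """Build an MCP tools/call request for navigate_and_scrape (sketch only)."""
    unknown = set(options) - KNOWN_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # The schema requires a valid HTTP/HTTPS URL.
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("url must be an HTTP/HTTPS URL")
    # Validate wait_until, falling back to the documented default.
    wait_until = options.get("wait_until", "domcontentloaded")
    if wait_until not in WAIT_CONDITIONS:
        raise ValueError(f"invalid wait_until: {wait_until!r}")
    # screenshot_full_page only matters with capture_screenshot, so drop it otherwise.
    if options.get("screenshot_full_page") and not options.get("capture_screenshot"):
        options.pop("screenshot_full_page")
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "navigate_and_scrape", "arguments": {"url": url, **options}},
    }
    return json.dumps(request)
```

For example, `build_call("https://example.com", extract_markdown=True, extract_links=True, auto_create_session=True)` returns a request string that asks for Markdown and links in a newly created session. Dropping `screenshot_full_page` when `capture_screenshot` is not set is a design choice of this sketch. Per the table, the tool would ignore the flag in that case anyway.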