browserbase_stagehand_extract
Extract structured data and text content from the current web page by following a specific natural-language instruction, for scraping, information gathering, or content retrieval.
Instructions
Extracts structured information and text content from the current web page based on a specific instruction. Use this tool for scraping data, gathering information, or pulling specific content from a page, i.e. whenever you need to read text or data rather than interact with elements. For interactive elements such as buttons, forms, or clickable items, use the observe tool instead. Extraction works best when the instruction states clearly and specifically what to extract and what shape the result should have. Note that the tool's input schema accepts only `instruction`. There is no separate JSON schema parameter, so any expected output structure must be described within the instruction itself. The result is returned as a text block that begins with `Extracted content:` and is followed by the JSON-serialized extraction.
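Because the input accepts only `instruction`, one way to pin down the expected output shape is to list the desired fields inside the instruction text. The sketch below assembles such an instruction. `buildExtractInstruction`, `FieldSpec`, and the field list are hypothetical illustrations and are not part of the tool.

```typescript
// Hypothetical description of one field the caller wants back.
interface FieldSpec {
  name: string;
  description: string;
}

// Hypothetical helper: describes the desired output fields inside the
// instruction string, since the tool exposes no separate schema parameter.
function buildExtractInstruction(target: string, fields: FieldSpec[]): string {
  const fieldList = fields
    .map((f) => `"${f.name}" (${f.description})`)
    .join(", ");
  return `Extract ${target}. For each item return an object with the fields ${fieldList}.`;
}

// Arguments object for a browserbase_stagehand_extract call.
const args = {
  instruction: buildExtractInstruction("all products on the listing page", [
    { name: "name", description: "the product title" },
    { name: "price", description: "the listed price including currency" },
  ]),
};

console.log(args.instruction);
```

Naming each field with a short description keeps the instruction specific, which is what the parameter description asks for.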
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| instruction | Yes | The specific instruction for what information to extract from the current page. Be as detailed and specific as possible about what you want to extract. For example: 'Extract all product names and prices from the listing page' or 'Get the article title, author, and publication date from this blog post'. The more specific your instruction, the better the extraction results will be. Avoid vague instructions like 'get everything' or 'extract the data'. Instead, be explicit about the exact elements, text, or information you need. | |
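If you want to check arguments on the client before calling the tool, a minimal sketch follows. It assumes the single required string field shown in the table. `validateExtractInput` and `ValidationResult` are hypothetical; the server itself validates with the zod schema `ExtractInputSchema`, and this hand-written check only mirrors that schema's behavior.

```typescript
type ExtractInput = { instruction: string };

type ValidationResult =
  | { ok: true; value: ExtractInput }
  | { ok: false; error: string };

// Hypothetical client-side check mirroring z.object({ instruction: z.string() }).
function validateExtractInput(raw: unknown): ValidationResult {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "arguments must be an object" };
  }
  const instruction = (raw as Record<string, unknown>).instruction;
  if (typeof instruction !== "string") {
    return { ok: false, error: "instruction is required and must be a string" };
  }
  // Like z.string(), any string is accepted, including an empty one: the advice
  // to avoid vague instructions is guidance, not validation. Unknown keys are
  // dropped, matching zod's default behavior for z.object().
  return { ok: true, value: { instruction } };
}

console.log(validateExtractInput({ instruction: "Extract all product names and prices" }));
console.log(validateExtractInput({}));
```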
Implementation Reference
- src/tools/extract.ts:34-62 (handler): the main handler function, which runs the extraction by passing the provided instruction to `stagehand.page.extract()`.

```typescript
// Context, ToolResult, and ToolActionResult are project types defined outside this excerpt.
async function handleExtract(
  context: Context,
  params: ExtractInput,
): Promise<ToolResult> {
  const action = async (): Promise<ToolActionResult> => {
    try {
      const stagehand = await context.getStagehand();
      // Only the instruction string is passed; no output schema is supplied.
      const extraction = await stagehand.page.extract(params.instruction);
      return {
        content: [
          {
            type: "text",
            text: `Extracted content:\n${JSON.stringify(extraction, null, 2)}`,
          },
        ],
      };
    } catch (error) {
      // Normalize non-Error throwables before wrapping the message.
      const errorMsg = error instanceof Error ? error.message : String(error);
      throw new Error(`Failed to extract content: ${errorMsg}`);
    }
  };

  return {
    action,
    waitForNetwork: false,
  };
}
```
- src/tools/extract.ts:6-32 (schema): defines the input schema (`ExtractInputSchema`) and the tool schema (`extractSchema`), including its name, description, and input schema.

```typescript
import { z } from "zod";
// ToolSchema is a project type defined outside this excerpt.

const ExtractInputSchema = z.object({
  instruction: z
    .string()
    .describe(
      "The specific instruction for what information to extract from the current page. " +
        "Be as detailed and specific as possible about what you want to extract. For example: " +
        "'Extract all product names and prices from the listing page' or 'Get the article title, " +
        "author, and publication date from this blog post'. The more specific your instruction, " +
        "the better the extraction results will be. Avoid vague instructions like 'get everything' " +
        "or 'extract the data'. Instead, be explicit about the exact elements, text, or information you need.",
    ),
});

type ExtractInput = z.infer<typeof ExtractInputSchema>;

const extractSchema: ToolSchema<typeof ExtractInputSchema> = {
  name: "browserbase_stagehand_extract",
  description:
    "Extracts structured information and text content from the current web page based on specific instructions " +
    "and a defined schema. This tool is ideal for scraping data, gathering information, or pulling specific " +
    "content from web pages. Use this tool when you need to get text content, data, or information from a page " +
    "rather than interacting with elements. For interactive elements like buttons, forms, or clickable items, " +
    "use the observe tool instead. The extraction works best when you provide clear, specific instructions " +
    "about what to extract and a well-defined JSON schema for the expected output format. This ensures " +
    "the extracted data is properly structured and usable.",
  inputSchema: ExtractInputSchema,
};
```
- src/tools/extract.ts:64-70 (registration): registers the tool by creating the `extractTool` object from the schema and handler, and exports it as the default export.

```typescript
// Tool is a project type defined outside this excerpt.
const extractTool: Tool<typeof ExtractInputSchema> = {
  capability: "core",
  schema: extractSchema,
  handle: handleExtract,
};

export default extractTool;
```
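To make the handler's success and failure paths concrete, the sketch below reproduces the body of its `action` against a stub page. It is a sketch under stated assumptions, not the server's code: `PageLike`, `runExtract`, `okPage`, and `failingPage` are hypothetical stand-ins, whereas the real handler obtains the page via `context.getStagehand()`, drives a live browser, and returns `{ action, waitForNetwork: false }` instead of running the action itself.

```typescript
interface TextContent {
  type: "text";
  text: string;
}

interface ActionResult {
  content: TextContent[];
}

// Hypothetical stand-in for stagehand.page: anything with an extract(instruction) method.
interface PageLike {
  extract(instruction: string): Promise<unknown>;
}

// Mirrors the handler's action: serialize the extraction, or wrap the error message.
async function runExtract(page: PageLike, instruction: string): Promise<ActionResult> {
  try {
    const extraction = await page.extract(instruction);
    return {
      content: [
        { type: "text", text: `Extracted content:\n${JSON.stringify(extraction, null, 2)}` },
      ],
    };
  } catch (error) {
    const errorMsg = error instanceof Error ? error.message : String(error);
    throw new Error(`Failed to extract content: ${errorMsg}`);
  }
}

// Stub pages for demonstration only.
const okPage: PageLike = { extract: async () => ({ title: "Hello" }) };
const failingPage: PageLike = {
  extract: async () => {
    throw new Error("timeout");
  },
};

async function demo(): Promise<void> {
  const result = await runExtract(okPage, "Get the article title");
  console.log(result.content[0].text);
  try {
    await runExtract(failingPage, "Get the article title");
  } catch (e) {
    console.log((e as Error).message); // Failed to extract content: timeout
  }
}

demo();
```

Wrapping every failure with the same prefix gives callers one recognizable error string regardless of what the underlying browser call threw.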