on_page_content_parsing
Extract and analyze structured content from any webpage, including links, anchors, headings, and text, with JavaScript rendering and custom settings for precise data retrieval.
Instructions
This endpoint parses the content of any page you specify and returns its structured content, including link URLs, anchors, headings, and textual content.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| accept_language | No | Accept-Language header value | |
| custom_js | No | Custom JavaScript code to execute | |
| custom_user_agent | No | Custom User-Agent header | |
| enable_javascript | No | Enable JavaScript rendering | |
| url | Yes | URL of the page to parse | |
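
To make the table concrete, the sketch below types the five parameters and checks a sample argument object by hand. The names `ContentParsingParams` and `checkParams` and the example values are illustrative assumptions, not part of the tool, which defines its input with a Zod schema.

```typescript
// Hypothetical type mirroring the Input Schema table; only `url` is required.
type ContentParsingParams = {
  url: string;
  enable_javascript?: boolean;
  custom_js?: string;
  custom_user_agent?: string;
  accept_language?: string;
};

// Illustrative runtime check: returns a list of problems,
// which is empty when the input matches the table.
function checkParams(input: { [key: string]: unknown }): string[] {
  const problems: string[] = [];
  if (typeof input.url !== "string") {
    problems.push("url is required and must be a string");
  }
  if (input.enable_javascript !== undefined && typeof input.enable_javascript !== "boolean") {
    problems.push("enable_javascript must be a boolean");
  }
  for (const key of ["custom_js", "custom_user_agent", "accept_language"]) {
    if (input[key] !== undefined && typeof input[key] !== "string") {
      problems.push(`${key} must be a string`);
    }
  }
  return problems;
}

// Example values are placeholders.
const exampleArgs: ContentParsingParams = {
  url: "https://example.com/",
  enable_javascript: true,
  accept_language: "en-US",
};

console.log(checkParams({ ...exampleArgs }));
console.log(checkParams({ enable_javascript: "yes" }));
```

Only `url` has to be present; every other field can be omitted entirely.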
Implementation Reference
- The main handler function for the 'on_page_content_parsing' tool. It makes a POST request to the DataForSEO '/v3/on_page/content_parsing/live' endpoint with the provided parameters and processes the response, returning formatted markdown content or error.

  ```typescript
  async handle(params: {
    url: string;
    enable_javascript?: boolean;
    custom_js?: string;
    custom_user_agent?: string;
    accept_language?: string;
  }): Promise<any> {
    try {
      const response = await this.dataForSEOClient.makeRequest('/v3/on_page/content_parsing/live', 'POST', [{
        url: params.url,
        enable_javascript: params.enable_javascript,
        custom_js: params.custom_js,
        custom_user_agent: params.custom_user_agent,
        accept_language: params.accept_language,
        markdown_view: true
      }]);
      console.error(JSON.stringify(response));
      if (defaultGlobalToolConfig.fullResponse || this.supportOnlyFullResponse()) {
        let data = response as DataForSEOFullResponse;
        this.validateResponseFull(data);
        let result = data.tasks[0].result;
        return this.formatResponse(result);
      } else {
        let data = response as DataForSEOResponse;
        this.validateResponse(data);
        let result = data.items[0].page_as_markdown;
        return this.formatResponse(result);
      }
    } catch (error) {
      return this.formatErrorResponse(error);
    }
  }
  ```
- Defines the Zod schema for input parameters of the 'on_page_content_parsing' tool, including url (required), and optional flags for JS rendering, custom JS, UA, and language.

  ```typescript
  getParams(): z.ZodRawShape {
    return {
      url: z.string().describe("URL of the page to parse"),
      enable_javascript: z.boolean().optional().describe("Enable JavaScript rendering"),
      custom_js: z.string().optional().describe("Custom JavaScript code to execute"),
      custom_user_agent: z.string().optional().describe("Custom User-Agent header"),
      accept_language: z.string().optional().describe("Accept-Language header value"),
    };
  }
  ```
- src/core/modules/onpage/onpage-api.module.ts:6-21 (registration)
  Registers the ContentParsingTool (which provides 'on_page_content_parsing') in the OnPageApiModule by instantiating it and including it in the tools record, with its name as the key and its description, params, and a handler wrapper as the value.

  ```typescript
  getTools(): Record<string, ToolDefinition> {
    const tools = [
      new ContentParsingTool(this.dataForSEOClient),
      new InstantPagesTool(this.dataForSEOClient),
      // Add more tools here
    ];
    return tools.reduce((acc, tool) => ({
      ...acc,
      [tool.getName()]: {
        description: tool.getDescription(),
        params: tool.getParams(),
        handler: (params: any) => tool.handle(params),
      },
    }), {});
  }
  ```
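
The three pieces above (request body, response handling, registration) can be tied together in a small self-contained sketch. Everything here is a hypothetical stand-in: `fakeClient`, `FakeContentParsingTool`, `buildTasks`, `pickMarkdown`, and the minimal `SimpleResponse` shape are assumptions made for illustration, since the real client, base class, and response types are not shown in this section. The sketch covers only the default (non-full) response branch and makes no network call.

```typescript
// All names below are hypothetical stand-ins for illustration only.
type Params = {
  url: string;
  enable_javascript?: boolean;
  custom_js?: string;
  custom_user_agent?: string;
  accept_language?: string;
};
type Task = Params & { markdown_view: true };
type SimpleResponse = { items: { page_as_markdown: string }[] };
type ToolEntry = {
  description: string;
  handler: (params: Params) => Promise<string>;
};

// Request body as the handler builds it: a one-element task array
// with markdown_view always set to true.
function buildTasks(params: Params): Task[] {
  return [{
    url: params.url,
    enable_javascript: params.enable_javascript,
    custom_js: params.custom_js,
    custom_user_agent: params.custom_user_agent,
    accept_language: params.accept_language,
    markdown_view: true,
  }];
}

// Default (non-full) branch: keep only the Markdown rendering of the first item.
function pickMarkdown(response: SimpleResponse): string {
  // Defensive guard that the original handler does not include.
  if (response.items.length === 0) throw new Error("response contains no items");
  return response.items[0].page_as_markdown;
}

// Fake client that returns a canned response instead of calling DataForSEO.
const fakeClient = {
  async makeRequest(_path: string, _method: string, tasks: Task[]): Promise<SimpleResponse> {
    return { items: [{ page_as_markdown: `# Parsed ${tasks[0].url}` }] };
  },
};

class FakeContentParsingTool {
  getName(): string { return "on_page_content_parsing"; }
  getDescription(): string { return "Parse page content (illustrative stand-in)"; }
  async handle(params: Params): Promise<string> {
    const response = await fakeClient.makeRequest(
      "/v3/on_page/content_parsing/live", "POST", buildTasks(params));
    return pickMarkdown(response);
  }
}

// Same reduce pattern as getTools(): fold the tool list into a name-keyed record.
const registry = [new FakeContentParsingTool()].reduce<Record<string, ToolEntry>>(
  (acc, tool) => ({
    ...acc,
    [tool.getName()]: {
      description: tool.getDescription(),
      handler: (params: Params) => tool.handle(params),
    },
  }),
  {},
);

// Dispatch by tool name, as a caller of the registry might.
registry["on_page_content_parsing"]
  .handler({ url: "https://example.com/" })
  .then((markdown) => console.log(markdown));
```

Wrapping the call as `(params) => tool.handle(params)` rather than passing `tool.handle` directly keeps `this` bound to the tool instance when the handler is later invoked from the record.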