get_content
Extract rendered HTML content from webpages for web scraping and content analysis. Use this tool to retrieve fully loaded page content with options to wait for specific elements or conditions before extraction.
Instructions
Extract rendered HTML content from a webpage
Input Schema
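| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the page to load (string) | |
| waitForSelector | No | Object with `selector` (string) and `timeout` (number) | |
| waitForFunction | No | Object with `fn` (string) and `timeout` (number) | |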
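To make the input shape concrete, here is a minimal sketch of building an arguments object for `get_content`. The `GetContentArgs` type and the `buildGetContentArgs` helper are hypothetical names introduced for illustration and are not part of the server. The field names follow the registered input schema. The timeout unit is not stated in the source, and milliseconds is assumed here.

```typescript
// Hypothetical types mirroring the registered input schema for 'get_content'.
interface WaitForSelector { selector?: string; timeout?: number; }
interface WaitForFunction { fn?: string; timeout?: number; }
interface GetContentArgs {
  url: string;
  waitForSelector?: WaitForSelector;
  waitForFunction?: WaitForFunction;
}

// Hypothetical helper: builds the arguments object and leaves out
// optional fields that were not supplied.
function buildGetContentArgs(
  url: string,
  opts: { waitForSelector?: WaitForSelector; waitForFunction?: WaitForFunction } = {},
): GetContentArgs {
  if (url.trim() === '') throw new Error('url is required');
  const args: GetContentArgs = { url };
  if (opts.waitForSelector) args.waitForSelector = opts.waitForSelector;
  if (opts.waitForFunction) args.waitForFunction = opts.waitForFunction;
  return args;
}

// Example: wait for '#main' before extracting (timeout assumed to be in ms).
const exampleArgs = buildGetContentArgs('https://example.com', {
  waitForSelector: { selector: '#main', timeout: 5000 },
});
```

Only `url` is required, so a call with no options yields an object containing just that field.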
Implementation Reference
- src/index.ts:345-368 (handler) Primary MCP server handler for the 'get_content' tool. Checks that arguments were supplied, calls BrowserlessClient.getContent, and formats the response as three MCP text content blocks: a status line, the page title, and the extracted HTML.

  ```typescript
  case 'get_content': {
    if (!args) throw new Error('Arguments are required');
    const result = await this.client!.getContent(args as any);
    if (result.success && result.data) {
      return {
        content: [
          {
            type: 'text',
            text: `Content extracted successfully from ${result.data.url}`,
          },
          {
            type: 'text',
            text: `Title: ${result.data.title}`,
          },
          {
            type: 'text',
            text: result.data.html,
          },
        ],
      };
    } else {
      throw new Error(result.error || 'Failed to get content');
    }
  }
  ```
- src/client.ts:113-124 (helper) BrowserlessClient helper method that implements the core logic by making an HTTP POST request to the Browserless server '/content' endpoint to extract webpage content.

  ```typescript
  async getContent(request: ContentRequest): Promise<BrowserlessResponse<ContentResponse>> {
    try {
      const response: AxiosResponse<ContentResponse> = await this.httpClient.post('/content', request);
      return {
        success: true,
        data: response.data,
      };
    } catch (error) {
      return this.handleError(error);
    }
  }
  ```
- src/index.ts:110-134 (registration) Tool registration in the ListTools response, defining the name, description, and input schema for 'get_content'.

  ```typescript
  {
    name: 'get_content',
    description: 'Extract rendered HTML content from a webpage',
    inputSchema: {
      type: 'object',
      properties: {
        url: { type: 'string' },
        waitForSelector: {
          type: 'object',
          properties: {
            selector: { type: 'string' },
            timeout: { type: 'number' },
          },
        },
        waitForFunction: {
          type: 'object',
          properties: {
            fn: { type: 'string' },
            timeout: { type: 'number' },
          },
        },
      },
      required: ['url'],
    },
  },
  ```
- src/types.ts:151-166 (schema) Zod schema definition for the ContentRequest type used in getContent requests, providing detailed input validation.

  ```typescript
  export const ContentRequestSchema = z.object({
    url: z.string(),
    gotoOptions: z.object({
      waitUntil: z.string().optional(),
      timeout: z.number().optional(),
    }).optional(),
    waitForSelector: WaitForSelectorSchema.optional(),
    waitForFunction: WaitForFunctionSchema.optional(),
    waitForTimeout: z.number().optional(),
    addScriptTag: z.array(ScriptTagSchema).optional(),
    headers: z.record(z.string()).optional(),
    cookies: z.array(CookieSchema).optional(),
    viewport: ViewportSchema.optional(),
  });

  export type ContentRequest = z.infer<typeof ContentRequestSchema>;
  ```
- src/simple-server.ts:91-110 (handler) Alternative simple MCP server handler for 'get_content' that calls the Browserless /content endpoint directly with axios. It forwards only `url` and, if present, `waitForSelector`, and it returns two text content blocks (a status line and the response body) with no title block.

  ```typescript
  case 'get_content': {
    if (!args?.url) throw new Error('URL is required');
    const response = await axios.post(`${this.browserlessUrl}/content`, {
      url: args.url,
      ...(args.waitForSelector ? { waitForSelector: args.waitForSelector } : {}),
    }, { timeout: 15000 });
    return {
      content: [
        {
          type: 'text',
          text: `Content extracted successfully from ${args.url}`,
        },
        {
          type: 'text',
          text: response.data,
        },
      ],
    };
  }
  ```
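
The two handlers return differently shaped content: the primary handler emits a status line, a `Title: ...` line, and the HTML, while the simple handler emits only a status line and the response body. The sketch below shows one way a caller might read either shape. `TextBlock`, `ParsedContent`, and `parseGetContentBlocks` are hypothetical names that are not part of the server, and the helper assumes the HTML is always the last block.

```typescript
// Shape of the text content blocks returned by the handlers listed above.
interface TextBlock { type: 'text'; text: string; }

interface ParsedContent { title?: string; html: string; }

// Hypothetical helper: extracts the HTML (assumed to be the last block) and,
// when a "Title: " block sits between the status line and the HTML, the title.
function parseGetContentBlocks(blocks: TextBlock[]): ParsedContent {
  const last = blocks[blocks.length - 1];
  if (blocks.length < 2 || last === undefined) {
    throw new Error('Unexpected get_content response shape');
  }
  const prefix = 'Title: ';
  const titleBlock = blocks.slice(1, -1).find((b) => b.text.startsWith(prefix));
  return titleBlock
    ? { title: titleBlock.text.slice(prefix.length), html: last.text }
    : { html: last.text };
}

// Three blocks, as produced by the primary handler.
const fromPrimary = parseGetContentBlocks([
  { type: 'text', text: 'Content extracted successfully from https://example.com' },
  { type: 'text', text: 'Title: Example Domain' },
  { type: 'text', text: '<html><body>Hello</body></html>' },
]);

// Two blocks, as produced by the simple handler (no title block).
const fromSimple = parseGetContentBlocks([
  { type: 'text', text: 'Content extracted successfully from https://example.com' },
  { type: 'text', text: '<html><body>Hello</body></html>' },
]);
```

Reading the HTML from the last block instead of a fixed index keeps the helper usable with both servers.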