webscraping_ai_text

Scrape text from any web page using configurable options like JavaScript execution, proxy type, and output format.

Input Schema


Name	Required	Description	Default
`url`	Yes	URL of the target page.
`text_format`	No	Format of the text response.	json
`return_links`	No	Return links from the page body text.
`timeout`	No	Maximum web page retrieval time in ms (20000 by default, maximum is 30000).
`js`	No	Execute on-page JavaScript using a headless browser (false by default).
`js_timeout`	No	Maximum JavaScript rendering time in ms (3000 by default).
`wait_for`	No	CSS selector to wait for before returning the page content.
`proxy`	No	Type of proxy: datacenter, residential, or stealth (datacenter by default). Use residential if the site restricts datacenter traffic, or stealth for the most heavily protected sites with advanced anti-bot detection. Residential and stealth requests cost more than datacenter — see the pricing page.	datacenter
`country`	No	Country of the proxy to use (US by default).
`custom_proxy`	No	Your own proxy URL in "http://user:password@host:port" format.
`device`	No	Type of device emulation.
`error_on_404`	No	Return error on 404 HTTP status on the target page (false by default).
`error_on_redirect`	No	Return error on redirect on the target page (false by default).
`js_script`	No	Custom JavaScript code to execute on the target page.
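
As an illustration of how these parameters combine, the objects below are hypothetical argument sets for a call to this tool. The URL and CSS selector are placeholders rather than values from the server's documentation; the numeric values simply restate the defaults listed in the table.

```javascript
// Hypothetical arguments for a webscraping_ai_text call; the URL and CSS
// selector are placeholders, not values taken from the server's docs.
const textArgs = {
  url: 'https://example.com/article',
  text_format: 'json',
  return_links: true,
  js: true,            // render the page in a headless browser
  js_timeout: 3000,    // ms allowed for JavaScript rendering
  wait_for: 'article', // CSS selector to wait for
  proxy: 'residential',
  country: 'US',
  timeout: 20000       // ms; the table gives 30000 as the maximum
};

// Only `url` is required; every other key may be omitted.
const minimalArgs = { url: 'https://example.com/article' };

console.log(JSON.stringify(textArgs, null, 2));
console.log(JSON.stringify(minimalArgs));
```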

Implementation Reference

src/index.js:293-308 (handler)

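The async handler function that executes the webscraping_ai_text tool logic. It calls client.text() with url, text_format, return_links, and the remaining common options, serializes object results to a JSON string, then sanitizes and returns the response. On failure it parses error.message as JSON and returns the result as an error response.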

```
async ({ url, text_format, return_links, ...options }) => {
  try {
    const result = await client.text(url, {
      ...options,
      text_format,
      return_links
    });

    const content = typeof result === 'object' ? JSON.stringify(result) : result;

    return createSanitizedResponse(content, url);
  } catch (error) {
    const errorObj = JSON.parse(error.message);
    return createSanitizedResponse(JSON.stringify(errorObj), url, true);
  }
}
```
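
The catch branch assumes error.message always holds a JSON string. If client.text() ever threw with a plain-text message (a network failure, for example), JSON.parse would itself throw inside the catch. A minimal defensive sketch follows, assuming a hypothetical helper named toErrorPayload that is not part of the server's source:

```javascript
// Hypothetical helper (not in src/index.js): turn any thrown value into a
// JSON string without letting JSON.parse raise a second error.
function toErrorPayload(error) {
  const message = error instanceof Error ? error.message : String(error);
  try {
    // Message is already JSON: re-serialize it to normalize formatting.
    return JSON.stringify(JSON.parse(message));
  } catch {
    // Message is plain text: wrap it so callers still receive valid JSON.
    return JSON.stringify({ message });
  }
}

console.log(toErrorPayload(new Error('{"status":404,"message":"Not Found"}')));
console.log(toErrorPayload(new Error('socket hang up')));
```

With such a helper, the catch body could be reduced to `return createSanitizedResponse(toErrorPayload(error), url, true);`.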

src/index.js:287-292 (schema)

The input schema/validation for webscraping_ai_text using Zod. Defines url (string), text_format (optional enum: plain/xml/json, default json), return_links (optional boolean), plus common options from commonOptionsSchema.

```
{
  url: z.string().describe('URL of the target page.'),
  text_format: z.enum(['plain', 'xml', 'json']).optional().default('json').describe('Format of the text response.'),
  return_links: z.boolean().optional().describe('Return links from the page body text.'),
  ...commonOptionsSchema
},
```
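
To make the effect of the text_format rule concrete, here is a dependency-free sketch of the same behavior. The server itself delegates this to Zod; resolveTextFormat is a hypothetical name used only for illustration.

```javascript
// Dependency-free illustration of the schema's text_format rule; the real
// server relies on Zod. `resolveTextFormat` is a hypothetical name.
const TEXT_FORMATS = ['plain', 'xml', 'json'];

function resolveTextFormat(value) {
  // Mirrors .optional().default('json'): a missing value becomes 'json'.
  if (value === undefined) return 'json';
  // Mirrors z.enum([...]): anything outside the list is rejected.
  if (!TEXT_FORMATS.includes(value)) {
    throw new TypeError(`text_format must be one of: ${TEXT_FORMATS.join(', ')}`);
  }
  return value;
}

console.log(resolveTextFormat(undefined));
console.log(resolveTextFormat('plain'));
```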

src/index.js:285-309 (registration)

Registration of the 'webscraping_ai_text' tool with the MCP server via server.tool().

```
server.tool(
  'webscraping_ai_text',
  {
    url: z.string().describe('URL of the target page.'),
    text_format: z.enum(['plain', 'xml', 'json']).optional().default('json').describe('Format of the text response.'),
    return_links: z.boolean().optional().describe('Return links from the page body text.'),
    ...commonOptionsSchema
  },
  async ({ url, text_format, return_links, ...options }) => {
    try {
      const result = await client.text(url, {
        ...options,
        text_format,
        return_links
      });

      const content = typeof result === 'object' ? JSON.stringify(result) : result;

      return createSanitizedResponse(content, url);
    } catch (error) {
      const errorObj = JSON.parse(error.message);
      return createSanitizedResponse(JSON.stringify(errorObj), url, true);
    }
  }
);
```

src/index.js:97-102 (helper)

The client.text() helper method on the WebScrapingAIClient class, which calls the API's /text endpoint.

```
async text(url, options = {}) {
  return this.request('/text', {
    url,
    ...options
  });
}
```
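
The request() method that text() delegates to is not shown in this excerpt. As a rough sketch of what merging url with the options could amount to, the function below serializes the parameters into a query string. The base URL, the api_key parameter name, and the buildRequestUrl helper are all assumptions made for illustration, not details confirmed by the source.

```javascript
// Hypothetical sketch: how { url, ...options } could be serialized for a
// GET endpoint. The base URL and api_key parameter name are assumptions.
function buildRequestUrl(baseUrl, path, params, apiKey) {
  const target = new URL(path, baseUrl);
  target.searchParams.set('api_key', apiKey);
  for (const [key, value] of Object.entries(params)) {
    // Skip options the caller left unset so server-side defaults apply.
    if (value !== undefined && value !== null) {
      target.searchParams.set(key, String(value));
    }
  }
  return target.toString();
}

const requestUrl = buildRequestUrl(
  'https://api.example.test', // placeholder host, not the real API
  '/text',
  { url: 'https://example.com/', text_format: 'plain', js: true, wait_for: undefined },
  'YOUR_API_KEY'
);
console.log(requestUrl);
```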

src/index.js:192-207 (helper)

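The createSanitizedResponse helper function used to format the tool's output. Error responses are returned unmodified with isError set to true; successful content is passed through sanitizer.sanitize(), which applies content sandboxing if enabled.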

```
function createSanitizedResponse(content, url, isError = false) {
  if (isError) {
    return {
      content: [{ type: 'text', text: content }],
      isError: true
    };
  }

  // Process the content (apply sandboxing if enabled)
  const result = sanitizer.sanitize(content, { url });

  // Create response
  return {
    content: [{ type: 'text', text: result.content }]
  };
}
```
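
To show the two response shapes side by side, the snippet below repeats the helper next to a stand-in sanitizer. The stub is an assumption: the real sanitizer is defined elsewhere in src/index.js and is not shown here.

```javascript
// Stand-in for the real sanitizer (assumption): returns content unchanged.
const sanitizer = {
  sanitize: (content, { url }) => ({ content })
};

function createSanitizedResponse(content, url, isError = false) {
  if (isError) {
    // Errors bypass the sanitizer and carry the isError flag.
    return { content: [{ type: 'text', text: content }], isError: true };
  }
  const result = sanitizer.sanitize(content, { url });
  return { content: [{ type: 'text', text: result.content }] };
}

const ok = createSanitizedResponse('Hello', 'https://example.com/');
const failed = createSanitizedResponse('{"status":404}', 'https://example.com/', true);

console.log(JSON.stringify(ok));
console.log(JSON.stringify(failed));
```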
