Robot Resources Scraper

scraper_crawl_url

Crawl multiple web pages from a starting URL using BFS link discovery and return compressed markdown with 70-90% fewer tokens than raw HTML.

Instructions

Crawl multiple pages from a starting URL using BFS link discovery. Returns compressed markdown for each page with 70-90% fewer tokens than raw HTML.
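
The BFS link discovery described above can be sketched as a queue-driven traversal. This is a minimal sketch, assuming a stubbed link map in place of real HTTP fetching; the actual crawler in `@robot-resources/scraper` fetches each page and converts it to markdown:

```typescript
// BFS crawl bounded by maxPages (total pages visited) and maxDepth
// (how far links are followed from the start URL).
type LinkMap = Record<string, string[]>;

function bfsCrawl(
  start: string,
  links: LinkMap,
  maxPages = 10,
  maxDepth = 2,
): string[] {
  const visited = new Set<string>([start]);
  const queue: Array<{ url: string; depth: number }> = [{ url: start, depth: 0 }];
  const crawled: string[] = [];

  while (queue.length > 0 && crawled.length < maxPages) {
    const { url, depth } = queue.shift()!;
    crawled.push(url); // fetch + markdown conversion would happen here
    if (depth >= maxDepth) continue; // don't follow links past maxDepth
    for (const next of links[url] ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push({ url: next, depth: depth + 1 });
      }
    }
  }
  return crawled;
}

// Pages are visited breadth-first: all depth-1 links before any depth-2 link.
const pages = bfsCrawl("/", { "/": ["/a", "/b"], "/a": ["/c"], "/c": ["/d"] });
// → ["/", "/a", "/b", "/c"]: "/d" sits at depth 3, beyond the default maxDepth of 2
```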

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| `url` | Yes | Starting URL to crawl | |
| `maxPages` | No | Max pages to crawl | `10` |
| `maxDepth` | No | Max link depth | `2` |
| `mode` | No | Fetch mode: `fast` (plain HTTP), `stealth` (TLS fingerprint), `render` (headless browser), `auto` (fast with fallback) | `auto` |
| `include` | No | URL patterns to include (glob) | |
| `exclude` | No | URL patterns to exclude (glob) | |
| `timeout` | No | Per-page timeout in milliseconds | `10000` |
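
The `include` and `exclude` parameters filter discovered URLs by glob pattern. The exact glob dialect is defined by `@robot-resources/scraper`; this is a minimal sketch assuming `*` matches any run of characters and patterns match whole URLs:

```typescript
// Compile a simple glob into a whole-string regular expression.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, ".*"); // then widen each * into .*
  return new RegExp(`^${escaped}$`);
}

// A URL is crawled when it matches at least one include pattern (if any are
// given) and no exclude pattern.
function shouldCrawl(url: string, include?: string[], exclude?: string[]): boolean {
  if (include?.length && !include.some((g) => globToRegExp(g).test(url))) return false;
  if (exclude?.some((g) => globToRegExp(g).test(url))) return false;
  return true;
}

shouldCrawl("https://example.com/docs/intro", ["*/docs/*"]); // → true
shouldCrawl("https://example.com/blog/post", ["*/docs/*"]); // → false
```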

Implementation Reference

  • The handler function for the `scraper_crawl_url` tool, which orchestrates the crawling process using the `crawl` function from `@robot-resources/scraper`.
    export async function crawlUrl({
      url,
      maxPages,
      maxDepth,
      mode,
      include,
      exclude,
      timeout,
    }: {
      url: string;
      maxPages?: number;
      maxDepth?: number;
      mode?: FetchMode;
      include?: string[];
      exclude?: string[];
      timeout?: number;
    }) {
      try {
        const result = await crawl({
          url,
          limit: maxPages ?? 10,
          depth: maxDepth ?? 2,
          mode,
          include,
          exclude,
          timeout,
        });
    
        const host = new URL(url).host;
        const errorSuffix = result.errors.length > 0
          ? ` (${result.errors.length} error${result.errors.length > 1 ? 's' : ''})`
          : '';
        const summary = `Crawled ${result.totalCrawled} pages from ${host}${errorSuffix}`;
    
        const content: Array<{ type: 'text'; text: string }> = [
          { type: 'text' as const, text: summary },
        ];
    
        for (const page of result.pages) {
          const header = page.title ? `## ${page.title}\n\n` : '';
          content.push({
            type: 'text' as const,
            text: `${header}${page.markdown}`,
          });
        }
    
        return {
          content,
          structuredContent: {
            pages: result.pages,
            totalCrawled: result.totalCrawled,
            totalDiscovered: result.totalDiscovered,
            totalSkipped: result.totalSkipped,
            errors: result.errors,
            duration: result.duration,
          },
        };
      } catch (error) {
        return formatError(url, error);
      }
    }
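The handler delegates failures to a `formatError` helper that is referenced but not shown on this page. A hypothetical sketch of its shape, following the MCP content structure the success path already uses (the real implementation may differ):

```typescript
// Hypothetical sketch of the unshown formatError helper: wrap the failure in
// the same { content: [...] } shape the success path returns, flagged with
// isError per the MCP tool-result convention.
function formatError(url: string, error: unknown) {
  const message = error instanceof Error ? error.message : String(error);
  return {
    content: [
      { type: "text" as const, text: `Failed to crawl ${url}: ${message}` },
    ],
    isError: true,
  };
}
```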
  • src/server.ts:50-88 (registration)
    Registration of the `scraper_crawl_url` tool in the MCP server, including schema definition using Zod.
    server.tool(
      'scraper_crawl_url',
      'Crawl multiple pages from a starting URL using BFS link discovery. Returns compressed markdown for each page with 70-90% fewer tokens than raw HTML.',
      {
        url: z.string().url().describe('Starting URL to crawl'),
        maxPages: z
          .number()
          .int()
          .min(1)
          .max(100)
          .optional()
          .describe('Max pages to crawl (default: 10)'),
        maxDepth: z
          .number()
          .int()
          .min(0)
          .max(5)
          .optional()
          .describe('Max link depth (default: 2)'),
        mode: z
          .enum(['fast', 'stealth', 'render', 'auto'])
          .optional()
          .describe("Fetch mode: 'fast' (plain HTTP), 'stealth' (TLS fingerprint), 'render' (headless browser), 'auto' (fast with fallback). Default: 'auto'"),
        include: z
          .array(z.string())
          .optional()
          .describe('URL patterns to include (glob)'),
        exclude: z
          .array(z.string())
          .optional()
          .describe('URL patterns to exclude (glob)'),
        timeout: z
          .number()
          .positive()
          .optional()
          .describe('Per-page timeout in milliseconds (default: 10000)'),
      },
      async (args) => crawlUrl(args),
    );
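The `auto` mode is described above as "fast with fallback". One plausible reading of that strategy, sketched with synchronous stub fetchers in place of the real HTTP, TLS-fingerprint, and headless-browser implementations (the actual escalation order inside `@robot-resources/scraper` may differ):

```typescript
type FetchMode = "fast" | "stealth" | "render" | "auto";
type Fetcher = (url: string) => string; // stub: real fetchers are async HTTP calls

// A concrete mode uses its fetcher directly; 'auto' tries the cheap plain-HTTP
// path first and escalates to heavier strategies only when a fetch throws.
function fetchWithMode(
  url: string,
  mode: FetchMode,
  fetchers: Record<Exclude<FetchMode, "auto">, Fetcher>,
): string {
  if (mode !== "auto") return fetchers[mode](url);
  let lastError: unknown;
  for (const m of ["fast", "stealth", "render"] as const) {
    try {
      return fetchers[m](url);
    } catch (error) {
      lastError = error; // escalate to the next, heavier mode
    }
  }
  throw new Error(`all fetch modes failed for ${url}: ${lastError}`);
}
```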