
crawling_exa

Extract full text content, metadata, and structured information from specific URLs using advanced web crawling capabilities. Ideal for detailed content retrieval from known web pages.

Instructions

Extract and crawl content from specific URLs using Exa AI: retrieves full text content, metadata, and structured information from web pages. Ideal for extracting detailed content from known URLs.

Input Schema

Name          | Required | Description                            | Default
--------------|----------|----------------------------------------|--------
maxCharacters | No       | Maximum characters to extract          | 3000
url           | Yes      | URL to crawl and extract content from  | —

Input Schema (JSON Schema)

{ "$schema": "http://json-schema.org/draft-07/schema#", "additionalProperties": false, "properties": { "maxCharacters": { "description": "Maximum characters to extract (default: 3000)", "type": "number" }, "url": { "description": "URL to crawl and extract content from", "type": "string" } }, "required": [ "url" ], "type": "object" }

Implementation Reference

  • MCP tool handler for 'crawling_exa': validates args with crawlingSchema, calls ExaClient.crawl(), and returns the formatted results (formatCrawlResults is not shown on this page; a sketch appears after this list).

    case 'crawling_exa': {
      const params = crawlingSchema.parse(args);
      const results = await client.crawl(params);
      return {
        content: [{ type: "text", text: formatCrawlResults(results) }]
      };
    }
  • Input schema validation for the crawling_exa tool using Zod:

    const crawlingSchema = z.object({
      urls: z.array(z.string()).describe("List of URLs to crawl and extract content from"),
      include_text: z.boolean().optional().default(true).describe("Include extracted text content"),
      include_highlights: z.boolean().optional().default(false).describe("Include key highlights from the content"),
      include_summary: z.boolean().optional().default(false).describe("Include AI-generated summary"),
      summary_query: z.string().optional().describe("Custom query for generating summaries")
    });
  • Core crawling implementation in ExaClient: calls the Exa JS SDK's getContents with the URL list and extraction options (ExaError is likewise sketched after this list).

    /** Crawl and extract content from URLs */
    async crawl(params: {
      urls: string[];
      include_text?: boolean;
      include_highlights?: boolean;
      include_summary?: boolean;
      summary_query?: string;
    }) {
      try {
        const contents = await this.client.getContents(params.urls, {
          text: params.include_text
            ? { maxCharacters: 5000, includeHtmlTags: false }
            : undefined,
          highlights: params.include_highlights
            ? {
                numSentences: 5,
                highlightsPerUrl: 5,
                query: params.summary_query || "key information"
              }
            : undefined,
          summary: params.include_summary
            ? { query: params.summary_query || "main points and key information" }
            : undefined
        });
        return contents.results;
      } catch (error) {
        throw new ExaError(
          error instanceof Error ? error.message : 'Failed to crawl URLs',
          'CRAWL_ERROR'
        );
      }
    }
  • src/server.ts:30-37 (registration)
    Registration of crawling_exa in the default enabledTools list for the ExaServer:

    this.enabledTools = enabledTools || [
      'web_search_exa', 'company_research_exa', 'crawling_exa',
      'linkedin_search_exa', 'deep_researcher_start', 'deep_researcher_check'
    ];
  • src/server.ts:153-159 (registration)
    Registration of crawling_exa in the enabled tools for the standalone server factory; a hypothetical construction example follows this list.

    // … (snippet truncated at the start)
    'web_search_exa', 'company_research_exa', 'crawling_exa',
    'linkedin_search_exa', 'deep_researcher_start', 'deep_researcher_check'
    ];
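
The handler and crawl snippets above reference two helpers this page does not show: formatCrawlResults and ExaError. Below is a minimal sketch of what they might look like; the field names on the Exa result objects (url, title, text, highlights, summary) are assumptions inferred from the getContents options above, not confirmed from the repository.

    // Sketch of the ExaError thrown by crawl(); the real class may differ.
    class ExaError extends Error {
      constructor(message: string, public readonly code: string) {
        super(message);
        this.name = "ExaError";
      }
    }

    // Assumed shape of one Exa getContents result, based on the options above.
    interface CrawlResult {
      url: string;
      title?: string;
      text?: string;
      highlights?: string[];
      summary?: string;
    }

    // Sketch of formatCrawlResults: renders each result as a plain-text block.
    function formatCrawlResults(results: CrawlResult[]): string {
      return results
        .map((r) => {
          const parts = [`URL: ${r.url}`];
          if (r.title) parts.push(`Title: ${r.title}`);
          if (r.summary) parts.push(`Summary: ${r.summary}`);
          if (r.highlights?.length) parts.push(`Highlights:\n- ${r.highlights.join("\n- ")}`);
          if (r.text) parts.push(`Content:\n${r.text}`);
          return parts.join("\n");
        })
        .join("\n\n---\n\n");
    }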
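
The enabledTools || [...] fallback in the registration snippets suggests the server accepts an optional list of tools to enable. The exact constructor signature is not shown on this page, so the following is a hypothetical usage sketch only:

    // Hypothetical: restrict the server to two tools. Assumes ExaServer's
    // constructor takes an optional enabledTools array, as the
    // `enabledTools || [...]` fallback above suggests.
    const server = new ExaServer(['crawling_exa', 'web_search_exa']);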


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/joerup/exa-mcp'
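
The same request from TypeScript, assuming the endpoint returns JSON:

    // Equivalent of the curl command above; assumes a JSON response body.
    const res = await fetch("https://glama.ai/api/mcp/v1/servers/joerup/exa-mcp");
    if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
    console.log(await res.json());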

If you have feedback or need assistance with the MCP directory API, please join our Discord server.