crawling_exa
Extract full text content, metadata, and structured information from specific URLs using advanced web crawling capabilities. Ideal for detailed content retrieval from known web pages.
Instructions
Extracts and crawls content from specific URLs using Exa AI, retrieving full text content, metadata, and structured information from web pages. Ideal for extracting detailed content from known URLs.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| maxCharacters | No | Maximum characters to extract | 3000 |
| url | Yes | URL to crawl and extract content from | |
Input Schema (JSON Schema)
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "additionalProperties": false,
  "properties": {
    "maxCharacters": {
      "description": "Maximum characters to extract (default: 3000)",
      "type": "number"
    },
    "url": {
      "description": "URL to crawl and extract content from",
      "type": "string"
    }
  },
  "required": [
    "url"
  ],
  "type": "object"
}
```
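For reference, a minimal arguments object that satisfies this schema; the URL is a placeholder, not a value from the repository:

```typescript
// Example arguments for crawling_exa; the URL is illustrative only.
const args = {
  url: "https://example.com/blog/post",
  maxCharacters: 3000 // optional; the tool defaults to 3000 when omitted
};
```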
Implementation Reference
- src/tools/index.ts:228-237 (handler): MCP tool handler for 'crawling_exa': validates args with the schema, calls ExaClient.crawl(), and returns formatted results.

```typescript
case 'crawling_exa': {
  const params = crawlingSchema.parse(args);
  const results = await client.crawl(params);
  return {
    content: [{ type: "text", text: formatCrawlResults(results) }]
  };
}
```
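The handler calls formatCrawlResults, which is not excerpted here. A minimal sketch of what such a formatter could look like, assuming results shaped like the Exa SDK's getContents output; this is an illustration, not the repository's actual implementation:

```typescript
// Hypothetical sketch only; the real formatCrawlResults is not excerpted above.
function formatCrawlResults(
  results: Array<{ url: string; title?: string; text?: string }>
): string {
  return results
    .map(r => `# ${r.title ?? r.url}\n${r.text ?? "(no text extracted)"}`)
    .join("\n\n---\n\n");
}
```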
- src/tools/index.ts:46-52 (schema): Input schema validation for the crawling_exa tool using Zod.

```typescript
const crawlingSchema = z.object({
  urls: z.array(z.string()).describe("List of URLs to crawl and extract content from"),
  include_text: z.boolean().optional().default(true).describe("Include extracted text content"),
  include_highlights: z.boolean().optional().default(false).describe("Include key highlights from the content"),
  include_summary: z.boolean().optional().default(false).describe("Include AI-generated summary"),
  summary_query: z.string().optional().describe("Custom query for generating summaries")
});
```
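Note that this internal Zod schema is broader than the published input schema above: it accepts a `urls` array plus highlight and summary flags rather than a single `url`. As a quick illustration of how the declared defaults behave, Zod fills them in for omitted optional fields:

```typescript
// Zod applies the declared defaults when optional fields are omitted.
const params = crawlingSchema.parse({ urls: ["https://example.com"] });
// params.include_text === true
// params.include_highlights === false
// params.include_summary === false
```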
- src/client.ts:144-169 (helper): Core crawling implementation in ExaClient: calls the Exa JS SDK's getContents with the URL list and extraction options.

```typescript
/**
 * Crawl and extract content from URLs
 */
async crawl(params: {
  urls: string[];
  include_text?: boolean;
  include_highlights?: boolean;
  include_summary?: boolean;
  summary_query?: string;
}) {
  try {
    const contents = await this.client.getContents(
      params.urls,
      {
        text: params.include_text
          ? { maxCharacters: 5000, includeHtmlTags: false }
          : undefined,
        highlights: params.include_highlights
          ? {
              numSentences: 5,
              highlightsPerUrl: 5,
              query: params.summary_query || "key information"
            }
          : undefined,
        summary: params.include_summary
          ? { query: params.summary_query || "main points and key information" }
          : undefined
      }
    );
    return contents.results;
  } catch (error) {
    throw new ExaError(
      error instanceof Error ? error.message : 'Failed to crawl URLs',
      'CRAWL_ERROR'
    );
  }
}
```
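A usage sketch of calling crawl directly on an ExaClient instance; client construction and API-key handling are assumed, as they are not part of the excerpt:

```typescript
// Illustrative call; assumes `client` is an already-configured ExaClient.
const results = await client.crawl({
  urls: ["https://example.com/docs"],
  include_summary: true,
  summary_query: "installation steps"
});
console.log(results.length, "pages crawled");
```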
- src/server.ts:30-37 (registration): Registration of crawling_exa in the default enabledTools list for the ExaServer.

```typescript
this.enabledTools = enabledTools || [
  'web_search_exa',
  'company_research_exa',
  'crawling_exa',
  'linkedin_search_exa',
  'deep_researcher_start',
  'deep_researcher_check'
];
```
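Because this list is only a fallback when enabledTools is not supplied, a caller can narrow the server to the crawling tool alone. A sketch under the assumption that the ExaServer constructor takes the enabledTools array directly; the exact constructor shape is not shown in the excerpt:

```typescript
// Hypothetical: the ExaServer constructor signature is assumed here.
const server = new ExaServer(['crawling_exa']); // only crawling_exa enabled
```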
- src/server.ts:153-159 (registration): Registration of crawling_exa in the enabled tools list for the standalone server factory (excerpt begins mid-array).

```typescript
  'web_search_exa',
  'company_research_exa',
  'crawling_exa',
  'linkedin_search_exa',
  'deep_researcher_start',
  'deep_researcher_check'
];
```