url_context_extract
Extract structured content from web URLs using Gemini AI, returning JSON with page text, summaries, and metadata for analysis or processing.
Instructions
Extract content from URLs using Gemini AI and return structured JSON with pages, answer, and metadata
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| maxCharsPerPage | No | Maximum characters per page | 8000 |
| model | No | Gemini model name to use | gemini-2.0-flash-exp |
| query | No | Optional query to guide content extraction and summary | |
| urls | Yes | Array of URLs to extract content from (must be non-empty) | |
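A caller can mirror these parameters on the client side before invoking the tool. The following is a minimal sketch, assuming a TypeScript client. The names `UrlContextExtractArgs`, `isUrlContextExtractArgs` and `withDefaults` are hypothetical and are not part of the server. The sketch checks the documented types and fills in the documented defaults.

```typescript
// Hypothetical client-side mirror of the url_context_extract input schema.
// None of these names exist in the server; they are illustrative only.
interface UrlContextExtractArgs {
  urls: string[];
  query?: string;
  model?: string;
  maxCharsPerPage?: number;
}

interface ResolvedArgs {
  urls: string[];
  query?: string;
  model: string;
  maxCharsPerPage: number;
}

const DEFAULT_MODEL = "gemini-2.0-flash-exp";
const DEFAULT_MAX_CHARS_PER_PAGE = 8000;

// Runtime check that an unknown value matches the documented parameter types.
function isUrlContextExtractArgs(value: unknown): value is UrlContextExtractArgs {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const optionalOfType = (key: string, type: "string" | "number"): boolean =>
    v[key] === undefined || typeof v[key] === type;
  return (
    Array.isArray(v.urls) &&
    v.urls.every((u) => typeof u === "string") &&
    optionalOfType("query", "string") &&
    optionalOfType("model", "string") &&
    optionalOfType("maxCharsPerPage", "number")
  );
}

// Fill in the documented defaults. Like the server handler, this uses `||`,
// so falsy values ("" or 0) also fall back to the defaults.
function withDefaults(args: UrlContextExtractArgs): ResolvedArgs {
  return {
    urls: args.urls,
    query: args.query,
    model: args.model || DEFAULT_MODEL,
    maxCharsPerPage: args.maxCharsPerPage || DEFAULT_MAX_CHARS_PER_PAGE,
  };
}

const raw: unknown = { urls: ["https://example.com"], query: "Summarize this page" };
if (isUrlContextExtractArgs(raw)) {
  const resolved = withDefaults(raw);
  console.log(resolved.model, resolved.maxCharsPerPage); // gemini-2.0-flash-exp 8000
}
```

The type guard accepts an empty `urls` array. This matches the input schema, which has no `minItems` constraint. The server's handler rejects an empty array separately.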
Implementation Reference
- Core implementation of the tool logic: processes URLs, invokes the GenAI adapter to extract context, and constructs `Page` objects and an `ExtractContentResult`.

  ```typescript
  async execute(
    urls: Url[],
    query?: string,
    model?: ModelName,
    maxCharsPerPage: number = 8000
  ): Promise<ExtractContentResult> {
    // Fall back to the default model when none is supplied
    const modelName = model || ModelName.create();

    // Ask the GenAI adapter to fetch the URLs and extract their content
    const response = await this.genAI.generateUrlContextJson({
      urls: urls.map(url => url.toString()),
      query,
      model: modelName.toString(),
      maxCharsPerPage
    });

    // Wrap each returned page in a Page domain object
    const pages = response.pages.map(pageData =>
      Page.create(
        Url.create(pageData.url),
        pageData.title,
        pageData.text,
        pageData.images,
        maxCharsPerPage
      )
    );

    return ExtractContentResult.create(
      pages,
      response.answer,
      response.url_context_metadata
    );
  }
  ```
- src/index.ts:101-153 (handler): MCP `CallToolRequest` handler for `url_context_extract`. It validates the parameters, creates domain URLs, invokes `ExtractContentUseCase`, and serializes the result to JSON.

  ```typescript
  if (request.params.name === 'url_context_extract') {
    try {
      const { urls, query, model, maxCharsPerPage } = request.params.arguments as {
        urls: string[];
        query?: string;
        model?: string;
        maxCharsPerPage?: number;
      };

      if (!Array.isArray(urls) || urls.length === 0) {
        throw new McpError(
          ErrorCode.InvalidParams,
          'urls must be a non-empty array of strings'
        );
      }

      // Validate and create domain objects
      const urlObjects = urls.map(url => {
        try {
          return Url.create(url);
        } catch (error) {
          throw new McpError(
            ErrorCode.InvalidParams,
            `Invalid URL: ${url}. ${(error as Error).message}`
          );
        }
      });

      const modelName = model ? ModelName.create(model) : ModelName.create();
      const maxChars = maxCharsPerPage || 8000;

      // Execute use case
      const result = await this.useCase.execute(urlObjects, query, modelName, maxChars);

      return {
        content: [
          {
            type: 'text',
            text: JSON.stringify(result.toJSON(), null, 2),
          },
        ],
      };
    } catch (error) {
      if (error instanceof McpError) {
        throw error;
      }
      throw new McpError(
        ErrorCode.InternalError,
        `Failed to extract URL content: ${(error as Error).message}`
      );
    }
  }
  ```
- src/index.ts:49-71 (schema): Input schema definition for the `url_context_extract` tool.

  ```typescript
  inputSchema: {
    type: 'object',
    properties: {
      urls: {
        type: 'array',
        items: { type: 'string' },
        description: 'Array of URLs to extract content from',
      },
      query: {
        type: 'string',
        description: 'Optional query to guide content extraction and summary',
      },
      model: {
        type: 'string',
        description: 'Gemini model name to use (optional, defaults to gemini-2.0-flash-exp)',
      },
      maxCharsPerPage: {
        type: 'number',
        description: 'Maximum characters per page (optional, defaults to 8000)',
      },
    },
    required: ['urls'],
  },
  ```
- src/index.ts:45-72 (registration): Tool registration in the `ListToolsRequest` handler, defining the name, description, and input schema.

  ```typescript
  {
    name: 'url_context_extract',
    description:
      'Extract content from URLs using Gemini AI and return structured JSON with pages, answer, and metadata',
    inputSchema: {
      type: 'object',
      properties: {
        urls: {
          type: 'array',
          items: { type: 'string' },
          description: 'Array of URLs to extract content from',
        },
        query: {
          type: 'string',
          description: 'Optional query to guide content extraction and summary',
        },
        model: {
          type: 'string',
          description: 'Gemini model name to use (optional, defaults to gemini-2.0-flash-exp)',
        },
        maxCharsPerPage: {
          type: 'number',
          description: 'Maximum characters per page (optional, defaults to 8000)',
        },
      },
      required: ['urls'],
    },
  },
  ```
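The handler and `execute()` excerpts above together form a request pipeline. A request is validated, passed to the GenAI adapter, wrapped as pages, and returned as a single pretty-printed text item. The following is a self-contained sketch of that flow under stated assumptions. It is not the server's code. The WHATWG `URL` constructor stands in for `Url.create`, and plain objects stand in for `Page` and `ExtractContentResult`. Page text is assumed to be truncated to `maxCharsPerPage` with `slice`; the original only shows that the limit is passed to `Page.create`. `GenAIAdapter`, `validateUrls`, `extractSketch` and `stubAdapter` are hypothetical names.

```typescript
// Sketch of the pipeline: validate URLs -> call adapter -> build pages -> text content.
// Assumptions: WHATWG URL replaces Url.create; page text is truncated with slice().

interface RawPage { url: string; title: string; text: string; images: string[] }

interface GenAIResponse {
  pages: RawPage[];
  answer: string;
  url_context_metadata: unknown;
}

// Hypothetical adapter interface modeled on genAI.generateUrlContextJson.
interface GenAIAdapter {
  generateUrlContextJson(req: {
    urls: string[];
    query?: string;
    model: string;
    maxCharsPerPage: number;
  }): Promise<GenAIResponse>;
}

// Reject empty input and unparseable URLs, mirroring the handler's messages.
function validateUrls(urls: unknown): URL[] {
  if (!Array.isArray(urls) || urls.length === 0) {
    throw new Error("urls must be a non-empty array of strings");
  }
  return urls.map((u) => {
    try {
      return new URL(String(u));
    } catch (error) {
      throw new Error(`Invalid URL: ${String(u)}. ${(error as Error).message}`);
    }
  });
}

async function extractSketch(
  adapter: GenAIAdapter,
  urls: unknown,
  query?: string,
  model = "gemini-2.0-flash-exp",
  maxCharsPerPage = 8000
) {
  const urlObjects = validateUrls(urls);
  const response = await adapter.generateUrlContextJson({
    urls: urlObjects.map((u) => u.toString()),
    query,
    model,
    maxCharsPerPage,
  });
  const pages = response.pages.map((p) => ({
    ...p,
    text: p.text.slice(0, maxCharsPerPage), // assumed truncation rule
  }));
  const result = {
    pages,
    answer: response.answer,
    url_context_metadata: response.url_context_metadata,
  };
  // Same serialization shape as the handler: one pretty-printed text item.
  return { content: [{ type: "text" as const, text: JSON.stringify(result, null, 2) }] };
}

// Stub adapter so the sketch runs without network access or an API key.
const stubAdapter: GenAIAdapter = {
  async generateUrlContextJson(req) {
    return {
      pages: req.urls.map((url) => ({ url, title: "Stub", text: "a".repeat(50), images: [] })),
      answer: `stub answer for ${req.model}`,
      url_context_metadata: {},
    };
  },
};

extractSketch(stubAdapter, ["https://example.com"], undefined, undefined, 10).then((res) => {
  const parsed = JSON.parse(res.content[0].text);
  console.log(parsed.pages[0].text.length); // 10
});
```

Injecting the adapter as an interface lets the pipeline run without a Gemini API key. The sketch's default parameters apply only when an argument is `undefined`. The real handler instead applies `maxCharsPerPage || 8000`, so an explicit `0` also falls back to 8000 there.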