read_url
Extract and convert web content from URLs into structured, LLM-readable text for analysis and processing.
Instructions
Convert any URL to LLM-friendly text using Jina.ai Reader
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to process | |
| no_cache | No | Bypass cache for fresh results | false |
| format | No | Response format (json or stream) | json |
| timeout | No | Maximum time in seconds to wait for webpage load | |
| target_selector | No | CSS selector to focus on specific elements | |
| wait_for_selector | No | CSS selector to wait for specific elements | |
| remove_selector | No | CSS selector to exclude specific elements | |
| with_links_summary | No | Gather all links at the end of response | |
| with_images_summary | No | Gather all images at the end of response | |
| with_generated_alt | No | Add alt text to images lacking captions | |
| with_iframe | No | Include iframe content in response | |
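As a rough illustration of how the parameters in the table combine, the sketch below types the arguments and wraps them in a `tools/call` request. The `ReadUrlArgs` interface, the `make_read_url_request` helper, and the example URL are hypothetical names introduced here, not part of the server. The field names come from the table, and the envelope assumes MCP's JSON-RPC `tools/call` shape.

```typescript
// Hypothetical type mirroring the input schema table; the server does not export it.
interface ReadUrlArgs {
  url: string;
  no_cache?: boolean;
  format?: 'json' | 'stream';
  timeout?: number; // seconds
  target_selector?: string;
  wait_for_selector?: string;
  remove_selector?: string;
  with_links_summary?: boolean;
  with_images_summary?: boolean;
  with_generated_alt?: boolean;
  with_iframe?: boolean;
}

// Hypothetical helper: wraps the arguments in a JSON-RPC `tools/call` request
// (envelope shape assumed from the MCP specification).
function make_read_url_request(id: number, args: ReadUrlArgs) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'read_url', arguments: args },
  };
}

// Example: read only the <article> element, drop navigation and footer,
// and append a summary of all links. Only `url` is required.
const example_request = make_read_url_request(1, {
  url: 'https://example.com/article',
  target_selector: 'article',
  remove_selector: 'nav, footer',
  with_links_summary: true,
  timeout: 30,
});
```

Optional fields that are left out are simply absent from the request, so the server falls back to its own defaults.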
Implementation Reference
- src/index.ts:133-238 (handler) CallToolRequest handler that implements the core logic for the 'read_url' tool: validates input, constructs headers with optional parameters, fetches from the Jina.ai Reader API, and returns the processed text content.

  ```typescript
  this.server.setRequestHandler(
    CallToolRequestSchema,
    async (request) => {
      if (request.params.name !== 'read_url') {
        throw new McpError(
          ErrorCode.MethodNotFound,
          `Unknown tool: ${request.params.name}`,
        );
      }

      const args = request.params.arguments as Record<
        string,
        unknown
      >;

      if (
        !args ||
        typeof args.url !== 'string' ||
        !is_valid_url(args.url)
      ) {
        throw new McpError(
          ErrorCode.InvalidParams,
          'Invalid or missing URL parameter',
        );
      }

      try {
        const headers: Record<string, string> = {
          Accept:
            typeof args.format === 'string' &&
            args.format === 'stream'
              ? 'text/event-stream'
              : 'application/json',
          'Content-Type': 'application/json',
          Authorization: `Bearer ${JINAAI_API_KEY}`,
        };

        // Optional headers from documentation
        if (typeof args.no_cache === 'boolean' && args.no_cache) {
          headers['X-No-Cache'] = 'true';
        }
        if (typeof args.timeout === 'number') {
          headers['X-Timeout'] = args.timeout.toString();
        }
        if (typeof args.target_selector === 'string') {
          headers['X-Target-Selector'] = args.target_selector;
        }
        if (typeof args.wait_for_selector === 'string') {
          headers['X-Wait-For-Selector'] = args.wait_for_selector;
        }
        if (typeof args.remove_selector === 'string') {
          headers['X-Remove-Selector'] = args.remove_selector;
        }
        if (
          typeof args.with_links_summary === 'boolean' &&
          args.with_links_summary
        ) {
          headers['X-With-Links-Summary'] = 'true';
        }
        if (
          typeof args.with_images_summary === 'boolean' &&
          args.with_images_summary
        ) {
          headers['X-With-Images-Summary'] = 'true';
        }
        if (
          typeof args.with_generated_alt === 'boolean' &&
          args.with_generated_alt
        ) {
          headers['X-With-Generated-Alt'] = 'true';
        }
        if (
          typeof args.with_iframe === 'boolean' &&
          args.with_iframe
        ) {
          headers['X-With-Iframe'] = 'true';
        }

        const response = await fetch(this.base_url + args.url, {
          headers,
        });

        if (!response.ok) {
          throw new Error(`HTTP error! status: ${response.status}`);
        }

        const result = await response.text();

        return {
          content: [
            {
              type: 'text',
              text: result,
            },
          ],
        };
      } catch (error) {
        const message =
          error instanceof Error ? error.message : String(error);
        throw new McpError(
          ErrorCode.InternalError,
          `Failed to process URL: ${message}`,
        );
      }
    },
  );
  ```
- src/index.ts:68-127 (schema) Input schema defining parameters for the 'read_url' tool, including the required 'url' and various optional Jina.ai Reader options.

  ```typescript
  inputSchema: {
    type: 'object',
    properties: {
      url: {
        type: 'string',
        description: 'URL to process',
      },
      no_cache: {
        type: 'boolean',
        description: 'Bypass cache for fresh results',
        default: false,
      },
      format: {
        type: 'string',
        description: 'Response format (json or stream)',
        enum: ['json', 'stream'],
        default: 'json',
      },
      timeout: {
        type: 'number',
        description:
          'Maximum time in seconds to wait for webpage load',
      },
      target_selector: {
        type: 'string',
        description: 'CSS selector to focus on specific elements',
      },
      wait_for_selector: {
        type: 'string',
        description: 'CSS selector to wait for specific elements',
      },
      remove_selector: {
        type: 'string',
        description: 'CSS selector to exclude specific elements',
      },
      with_links_summary: {
        type: 'boolean',
        description: 'Gather all links at the end of response',
      },
      with_images_summary: {
        type: 'boolean',
        description: 'Gather all images at the end of response',
      },
      with_generated_alt: {
        type: 'boolean',
        description: 'Add alt text to images lacking captions',
      },
      with_iframe: {
        type: 'boolean',
        description: 'Include iframe content in response',
      },
    },
    required: ['url'],
  },
  ```
- src/index.ts:61-131 (registration) Registers the 'read_url' tool in the ListToolsRequest handler, providing its name, description, and input schema.

  ```typescript
  this.server.setRequestHandler(
    ListToolsRequestSchema,
    async () => ({
      tools: [
        {
          name: 'read_url',
          description:
            'Convert any URL to LLM-friendly text using Jina.ai Reader',
          inputSchema: {
            type: 'object',
            properties: {
              url: {
                type: 'string',
                description: 'URL to process',
              },
              no_cache: {
                type: 'boolean',
                description: 'Bypass cache for fresh results',
                default: false,
              },
              format: {
                type: 'string',
                description: 'Response format (json or stream)',
                enum: ['json', 'stream'],
                default: 'json',
              },
              timeout: {
                type: 'number',
                description:
                  'Maximum time in seconds to wait for webpage load',
              },
              target_selector: {
                type: 'string',
                description:
                  'CSS selector to focus on specific elements',
              },
              wait_for_selector: {
                type: 'string',
                description:
                  'CSS selector to wait for specific elements',
              },
              remove_selector: {
                type: 'string',
                description:
                  'CSS selector to exclude specific elements',
              },
              with_links_summary: {
                type: 'boolean',
                description: 'Gather all links at the end of response',
              },
              with_images_summary: {
                type: 'boolean',
                description: 'Gather all images at the end of response',
              },
              with_generated_alt: {
                type: 'boolean',
                description: 'Add alt text to images lacking captions',
              },
              with_iframe: {
                type: 'boolean',
                description: 'Include iframe content in response',
              },
            },
            required: ['url'],
          },
        },
      ],
    }),
  );
  ```
- src/index.ts:27-34 (helper) Utility function that checks whether the provided URL string is valid; the read_url handler uses it for input validation.

  ```typescript
  const is_valid_url = (url: string): boolean => {
    try {
      new URL(url);
      return true;
    } catch {
      return false;
    }
  };
  ```
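The header mapping and URL validation described above can be exercised without a network call if they are pulled out into pure functions. The following is a minimal sketch under that assumption: `build_reader_headers` and `is_http_url` are hypothetical names that do not appear in the source, and the `api_key` parameter stands in for the `JINAAI_API_KEY` constant. The header names and conditions are copied from the handler. The protocol check is an optional tightening, not something the source does: `new URL()` also accepts schemes such as `mailto:`, so `is_valid_url` alone lets them through.

```typescript
// Hypothetical refactor of the handler's header construction into a pure function.
function build_reader_headers(
  args: Record<string, unknown>,
  api_key: string,
): Record<string, string> {
  const headers: Record<string, string> = {
    Accept:
      args['format'] === 'stream' ? 'text/event-stream' : 'application/json',
    'Content-Type': 'application/json',
    Authorization: `Bearer ${api_key}`,
  };

  // Boolean flags are forwarded only when they are explicitly `true`.
  const flags: Array<[string, string]> = [
    ['no_cache', 'X-No-Cache'],
    ['with_links_summary', 'X-With-Links-Summary'],
    ['with_images_summary', 'X-With-Images-Summary'],
    ['with_generated_alt', 'X-With-Generated-Alt'],
    ['with_iframe', 'X-With-Iframe'],
  ];
  flags.forEach(([key, header]) => {
    if (args[key] === true) headers[header] = 'true';
  });

  // CSS selectors are forwarded verbatim when they are strings.
  const selectors: Array<[string, string]> = [
    ['target_selector', 'X-Target-Selector'],
    ['wait_for_selector', 'X-Wait-For-Selector'],
    ['remove_selector', 'X-Remove-Selector'],
  ];
  selectors.forEach(([key, header]) => {
    const value = args[key];
    if (typeof value === 'string') headers[header] = value;
  });

  // The timeout (in seconds) is sent as a string header value.
  const timeout = args['timeout'];
  if (typeof timeout === 'number') headers['X-Timeout'] = timeout.toString();

  return headers;
}

// Hypothetical stricter variant of is_valid_url: accepts only http(s) URLs.
const is_http_url = (url: string): boolean => {
  try {
    const { protocol } = new URL(url);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false;
  }
};

const sample_headers = build_reader_headers(
  { format: 'stream', no_cache: true, timeout: 15, target_selector: 'main' },
  'test-key',
);
```

Keeping the mapping in one table-driven function means a new Reader option only needs one more entry in `flags` or `selectors`, at the cost of slightly less explicit code than the handler's chain of `if` statements.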