read_webpage
Extract webpage content in formats optimized for LLM processing, including text, markdown, HTML, and screenshots with configurable options.
Instructions
Extract content from a webpage in a format optimized for LLMs
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
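| url | Yes | URL of the webpage to read (string). | |
| format | No | Output format: one of `Default`, `Markdown`, `HTML`, `Text`, `Screenshot`, `Pageshot`. | `Default` |
| with_links | No | Boolean. When true, the handler sends the `X-With-Links-Summary: true` header. | |
| with_images | No | Boolean. When true, the handler sends the `X-With-Images-Summary: true` header. | |
| with_generated_alt | No | Boolean. When true, the handler sends the `X-With-Generated-Alt: true` header. | |
| no_cache | No | Boolean. When true, the handler sends the `X-No-Cache: true` header. | |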
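As a rough illustration of how these parameters travel over the wire, the sketch below wraps them in an MCP `tools/call` JSON-RPC request. The `ReadWebpageArgs` type and the `buildToolCall` helper are hypothetical names introduced here for illustration; they are not part of the server's source.

```typescript
// Accepted values for `format`, copied from the input schema.
type ReadWebpageFormat =
  | "Default" | "Markdown" | "HTML" | "Text" | "Screenshot" | "Pageshot";

// Hypothetical type mirroring the parameter table: only `url` is required.
interface ReadWebpageArgs {
  url: string;
  format?: ReadWebpageFormat;
  with_links?: boolean;
  with_images?: boolean;
  with_generated_alt?: boolean;
  no_cache?: boolean;
}

// Hypothetical helper: wraps the arguments in an MCP `tools/call` request.
function buildToolCall(id: number, args: ReadWebpageArgs) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_webpage", arguments: args },
  };
}

const toolCall = buildToolCall(1, {
  url: "https://example.com",
  format: "Markdown",
  with_links: true,
});

console.log(JSON.stringify(toolCall, null, 2));
```

Omitted optional fields are simply absent from `arguments`, which lets the server apply its own fallbacks.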
Input Schema (JSON Schema)

    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "additionalProperties": false,
      "properties": {
        "format": {
          "enum": [
            "Default",
            "Markdown",
            "HTML",
            "Text",
            "Screenshot",
            "Pageshot"
          ],
          "type": "string"
        },
        "no_cache": {
          "type": "boolean"
        },
        "url": {
          "type": "string"
        },
        "with_generated_alt": {
          "type": "boolean"
        },
        "with_images": {
          "type": "boolean"
        },
        "with_links": {
          "type": "boolean"
        }
      },
      "required": [
        "url"
      ],
      "type": "object"
    }
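To make the constraints in this schema concrete, here is a minimal, dependency-free sketch of a runtime check that enforces the same rules (required `url`, the `format` enum, boolean flags, and no additional properties). The function name `validateReadWebpageArgs` is an assumption for illustration; the server itself relies on a Zod schema rather than hand-written checks.

```typescript
const FORMATS = ["Default", "Markdown", "HTML", "Text", "Screenshot", "Pageshot"];
const BOOLEAN_FLAGS = ["with_links", "with_images", "with_generated_alt", "no_cache"];

// Hypothetical helper: returns a list of problems. An empty list means the
// input satisfies the constraints stated in the JSON Schema.
function validateReadWebpageArgs(input: unknown): string[] {
  // "type": "object"
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return ["arguments must be an object"];
  }
  const args = input as Record<string, unknown>;
  const errors: string[] = [];

  // "required": ["url"], with "type": "string"
  if (typeof args["url"] !== "string") {
    errors.push("url is required and must be a string");
  }

  // Optional "format" restricted by "enum"
  const format = args["format"];
  if (format !== undefined &&
      (typeof format !== "string" || FORMATS.indexOf(format) === -1)) {
    errors.push("format must be one of: " + FORMATS.join(", "));
  }

  // Optional boolean flags
  for (const flag of BOOLEAN_FLAGS) {
    if (args[flag] !== undefined && typeof args[flag] !== "boolean") {
      errors.push(flag + " must be a boolean");
    }
  }

  // "additionalProperties": false
  const known = ["url", "format"].concat(BOOLEAN_FLAGS);
  for (const key of Object.keys(args)) {
    if (known.indexOf(key) === -1) errors.push("unknown property: " + key);
  }
  return errors;
}

console.log(validateReadWebpageArgs({ url: "https://example.com", format: "Markdown" }));
console.log(validateReadWebpageArgs({ format: "PDF" }));
```

Returning a list of messages instead of throwing on the first failure is a design choice of this sketch; it lets a caller report every problem at once.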
Implementation Reference
- index.ts:37-63 (handler): The handler function that implements the read_webpage tool. It makes a POST request to Jina AI's reader API (https://r.jina.ai/) with the provided URL and optional parameters, handles headers for additional features, and parses the response using ReaderResponseSchema.

      async function readWebPage(params: z.infer<typeof ReadWebPageSchema>) {
        const headers: Record<string, string> = {
          'Authorization': `Bearer ${JINA_API_KEY}`,
          'Content-Type': 'application/json',
          'Accept': 'application/json'
        };

        if (params.with_links) headers['X-With-Links-Summary'] = 'true';
        if (params.with_images) headers['X-With-Images-Summary'] = 'true';
        if (params.with_generated_alt) headers['X-With-Generated-Alt'] = 'true';
        if (params.no_cache) headers['X-No-Cache'] = 'true';

        const response = await fetch('https://r.jina.ai/', {
          method: 'POST',
          headers,
          body: JSON.stringify({
            url: params.url,
            options: params.format || 'Default'
          })
        });

        if (!response.ok) {
          throw new Error(`Jina AI API error: ${response.statusText}`);
        }

        return ReaderResponseSchema.parse(await response.json());
      }

- schemas.ts:35-42 (schema): Zod schema defining the input parameters for the read_webpage tool: required URL and optional flags for format, links, images, alt text generation, and cache.

      export const ReadWebPageSchema = z.object({
        url: z.string(),
        format: z.enum(['Default', 'Markdown', 'HTML', 'Text', 'Screenshot', 'Pageshot']).optional(),
        with_links: z.boolean().optional(),
        with_images: z.boolean().optional(),
        with_generated_alt: z.boolean().optional(),
        no_cache: z.boolean().optional()
      });

- index.ts:113-117 (registration): Registration of the read_webpage tool in the MCP server's list tools handler. Specifies the tool name, description, and converts the Zod schema to JSON schema for the protocol.

      {
        name: "read_webpage",
        description: "Extract content from a webpage in a format optimized for LLMs",
        inputSchema: zodToJsonSchema(ReadWebPageSchema)
      },
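
The handler's mapping from tool parameters to an HTTP request can be examined without touching the network. The sketch below is one possible way to isolate that mapping in a pure function; `buildReaderRequest`, `ReadWebpageParams`, and the `apiKey` argument are hypothetical names that do not appear in the source, and the body shape simply copies what the handler sends.

```typescript
// Hypothetical type mirroring the fields of ReadWebPageSchema.
interface ReadWebpageParams {
  url: string;
  format?: string;
  with_links?: boolean;
  with_images?: boolean;
  with_generated_alt?: boolean;
  no_cache?: boolean;
}

// Hypothetical helper (not in the source): builds the request the handler
// would send, without performing any network I/O.
function buildReaderRequest(params: ReadWebpageParams, apiKey: string) {
  const headers: Record<string, string> = {
    "Authorization": "Bearer " + apiKey,
    "Content-Type": "application/json",
    "Accept": "application/json",
  };

  // Each enabled flag becomes one extra request header, as in the handler.
  if (params.with_links) headers["X-With-Links-Summary"] = "true";
  if (params.with_images) headers["X-With-Images-Summary"] = "true";
  if (params.with_generated_alt) headers["X-With-Generated-Alt"] = "true";
  if (params.no_cache) headers["X-No-Cache"] = "true";

  return {
    endpoint: "https://r.jina.ai/",
    init: {
      method: "POST",
      headers,
      // A missing format falls back to "Default", matching the handler.
      body: JSON.stringify({ url: params.url, options: params.format || "Default" }),
    },
  };
}

const readerRequest = buildReaderRequest(
  { url: "https://example.com", with_links: true, no_cache: true },
  "test-key" // placeholder value, not a real credential
);

console.log(readerRequest.init.headers);
console.log(readerRequest.init.body);
```

Separating request construction from `fetch` in this way makes the header logic unit-testable; the result could then be passed to `fetch(readerRequest.endpoint, readerRequest.init)`.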