Skip to main content
Glama
acchuang

Jina AI Remote MCP Server

by acchuang

read_url

Extract web page content and convert it to clean markdown format for reading articles, documentation, or analyzing text from websites.

Instructions

Extract and convert web page content to clean, readable markdown format. Perfect for reading articles, documentation, blog posts, or any web content. Use this when you need to analyze text content from websites, bypass paywalls, or get structured data.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesThe complete URL of the webpage or PDF file to read and convert (e.g., 'https://example.com/article'). Can be a single URL string or an array of URLs for parallel reading.
withAllLinksNoSet to true to extract and return all hyperlinks found on the page as structured data
withAllImagesNoSet to true to extract and return all images found on the page as structured data

Implementation Reference

  • The function that executes the read_url tool logic: normalizes the input URL, constructs headers based on options, fetches content from https://r.jina.ai/, handles errors, extracts structured data (title, links, images if requested), and returns markdown content and YAML-structured metadata.
    async ({ url, withAllLinks, withAllImages }: { url: string; withAllLinks?: boolean; withAllImages?: boolean }) => { try { const props = getProps(); // Normalize the URL first const normalizedUrl = normalizeUrl(url); if (!normalizedUrl) { return { content: [ { type: "text" as const, text: "Error: Invalid or unsupported URL", }, ], isError: true, }; } const headers: Record<string, string> = { 'Accept': 'application/json', 'Content-Type': 'application/json', 'X-Md-Link-Style': 'discarded', }; // Add Authorization header if bearer token is available if (props.bearerToken) { headers['Authorization'] = `Bearer ${props.bearerToken}`; } if (withAllLinks) { headers['X-With-Links-Summary'] = 'all'; } if (withAllImages) { headers['X-With-Images-Summary'] = 'true'; } else { headers['X-Retain-Images'] = 'none'; } const response = await fetch('https://r.jina.ai/', { method: 'POST', headers, body: JSON.stringify({ url: normalizedUrl }), }); if (!response.ok) { return handleApiError(response, "URL conversion"); } const data = await response.json() as any; if (!data.data) { return { content: [ { type: "text" as const, text: "Error: Invalid response data from r.jina.ai", }, ], isError: true, }; } const responseContent = []; // Add structured data as JSON if requested via parameters const structuredData: any = {}; if (data.data.url) { structuredData.url = data.data.url; } if (data.data.title) { structuredData.title = data.data.title; } if (withAllLinks && data.data.links) { // Transform links from [anchorText, url] arrays to {anchorText, url} objects structuredData.links = data.data.links.map((link: [string, string]) => ({ anchorText: link[0], url: link[1] })); } if (withAllImages && data.data.images) { structuredData.images = data.data.images; } // Add structured data if any exists if (Object.keys(structuredData).length > 0) { responseContent.push({ type: "text" as const, text: yamlStringify(structuredData), }); } // Add main content if (data.data.content) { responseContent.push({ type: "text" as const, text: String(data.data.content), }); } return { content: responseContent.length > 0 ? responseContent : [ { type: "text" as const, text: "No content available", }, ], }; } catch (error) { return { content: [ { type: "text" as const, text: `Error: ${error instanceof Error ? error.message : String(error)}`, }, ], isError: true, }; }
  • Input schema using Zod for validating tool parameters: url (required string URL), optional booleans for including all links and images.
    { url: z.string().url().describe("The complete URL of the webpage or PDF file to read and convert (e.g., 'https://example.com/article')"), withAllLinks: z.boolean().optional().describe("Set to true to extract and return all hyperlinks found on the page as structured data"), withAllImages: z.boolean().optional().describe("Set to true to extract and return all images found on the page as structured data") },
  • src/index.ts:19-22 (registration)
    In the MCP agent's init method, calls registerJinaTools to register all tools, including read_url, on the MCP server.
    async init() { // Register all Jina AI tools registerJinaTools(this.server, () => this.props); }
  • Supporting utility normalizeUrl cleans and standardizes input URLs by removing tracking params, utm, sessions, anchors, www prefix, etc., before sending to the API.
    export function normalizeUrl(urlString: string, options = { removeAnchors: true, removeSessionIDs: true, removeUTMParams: true, removeTrackingParams: true, removeXAnalytics: true }) { try { urlString = urlString.replace(/\s+/g, '').trim(); if (!urlString?.trim()) { throw new Error('Empty URL'); } // Handle x.com and twitter.com URLs with /analytics if (options.removeXAnalytics) { const xComPattern = /^(https?:\/\/(www\.)?(x\.com|twitter\.com)\/([^/]+)\/status\/(\d+))\/analytics(\/)?(\?.*)?(#.*)?$/i; const xMatch = urlString.match(xComPattern); if (xMatch) { let cleanUrl = xMatch[1]; if (xMatch[7]) cleanUrl += xMatch[7]; if (xMatch[8]) cleanUrl += xMatch[8]; urlString = cleanUrl; } } const url = new URL(urlString); if (url.protocol !== 'http:' && url.protocol !== 'https:') { throw new Error('Unsupported protocol'); } url.hostname = url.hostname.toLowerCase(); if (url.hostname.startsWith('www.')) { url.hostname = url.hostname.slice(4); } if ((url.protocol === 'http:' && url.port === '80') || (url.protocol === 'https:' && url.port === '443')) { url.port = ''; } // Query parameter filtering const searchParams = new URLSearchParams(url.search); const filteredParams = Array.from(searchParams.entries()) .filter(([key]) => { if (key === '') return false; if (options.removeSessionIDs && /^(s|session|sid|sessionid|phpsessid|jsessionid|aspsessionid|asp\.net_sessionid)$/i.test(key)) { return false; } if (options.removeUTMParams && /^utm_/i.test(key)) { return false; } if (options.removeTrackingParams && /^(ref|referrer|fbclid|gclid|cid|mcid|source|medium|campaign|term|content|sc_rid|mc_[a-z]+)$/i.test(key)) { return false; } return true; }) .sort(([keyA], [keyB]) => keyA.localeCompare(keyB)); url.search = new URLSearchParams(filteredParams).toString(); if (options.removeAnchors) { url.hash = ''; } return url.toString(); } catch (error) { return undefined; } }
  • Supporting utility handleApiError returns standardized MCP error responses for common HTTP status codes (401,402,429) from the Jina API.
    export function handleApiError(response: Response, context: string = "API request") { if (response.status === 401) { return { content: [ { type: "text" as const, text: "Authentication failed. Please set your API key in the Jina AI MCP settings. You can get a free API key by visiting https://jina.ai and signing up for an account.", }, ], isError: true, }; } if (response.status === 402) { return { content: [ { type: "text" as const, text: "This key is out of quota. Please top up this key at https://jina.ai", }, ], isError: true, }; } if (response.status === 429) { return { content: [ { type: "text" as const, text: "Rate limit exceeded. Please upgrade your API key to get higher rate limits. Visit https://jina.ai to manage your subscription and increase your usage limits.", }, ], isError: true, }; } // Default error message for other HTTP errors return { content: [ { type: "text" as const, text: `Error: ${context} failed - ${response.status} ${response.statusText}`, }, ], isError: true, }; }

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/acchuang/jina-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server