Skip to main content
Glama
acchuang

Jina AI Remote MCP Server

by acchuang

read_url

Extract web page content and convert it to clean markdown format for reading articles, documentation, or analyzing text from websites.

Instructions

Extract and convert web page content to clean, readable markdown format. Perfect for reading articles, documentation, blog posts, or any web content. Use this when you need to analyze text content from websites, bypass paywalls, or get structured data.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesThe complete URL of the webpage or PDF file to read and convert (e.g., 'https://example.com/article'). Can be a single URL string or an array of URLs for parallel reading.
withAllLinksNoSet to true to extract and return all hyperlinks found on the page as structured data
withAllImagesNoSet to true to extract and return all images found on the page as structured data

Implementation Reference

  • The function that executes the read_url tool logic: normalizes the input URL, constructs headers based on options, fetches content from https://r.jina.ai/, handles errors, extracts structured data (title, links, images if requested), and returns markdown content and YAML-structured metadata.
    async ({ url, withAllLinks, withAllImages }: { url: string; withAllLinks?: boolean; withAllImages?: boolean }) => {
    	try {
    		const props = getProps();
    		// Normalize the URL first
    		const normalizedUrl = normalizeUrl(url);
    		if (!normalizedUrl) {
    			return {
    				content: [
    					{
    						type: "text" as const,
    						text: "Error: Invalid or unsupported URL",
    					},
    				],
    				isError: true,
    			};
    		}
    
    		const headers: Record<string, string> = {
    			'Accept': 'application/json',
    			'Content-Type': 'application/json',
    			'X-Md-Link-Style': 'discarded',
    		};
    
    		// Add Authorization header if bearer token is available
    		if (props.bearerToken) {
    			headers['Authorization'] = `Bearer ${props.bearerToken}`;
    		}
    
    		if (withAllLinks) {
    			headers['X-With-Links-Summary'] = 'all';
    		}
    
    		if (withAllImages) {
    			headers['X-With-Images-Summary'] = 'true';
    		} else {
    			headers['X-Retain-Images'] = 'none';
    		}
    
    		const response = await fetch('https://r.jina.ai/', {
    			method: 'POST',
    			headers,
    			body: JSON.stringify({ url: normalizedUrl }),
    		});
    
    		if (!response.ok) {
    			return handleApiError(response, "URL conversion");
    		}
    
    		const data = await response.json() as any;
    
    		if (!data.data) {
    			return {
    				content: [
    					{
    						type: "text" as const,
    						text: "Error: Invalid response data from r.jina.ai",
    					},
    				],
    				isError: true,
    			};
    		}
    
    		const responseContent = [];
    
    
    		// Add structured data as JSON if requested via parameters
    		const structuredData: any = {};
    
    		if (data.data.url) {
    			structuredData.url = data.data.url;
    		}
    
    		if (data.data.title) {
    			structuredData.title = data.data.title;
    		}
    
    		if (withAllLinks && data.data.links) {
    			// Transform links from [anchorText, url] arrays to {anchorText, url} objects
    			structuredData.links = data.data.links.map((link: [string, string]) => ({
    				anchorText: link[0],
    				url: link[1]
    			}));
    		}
    
    		if (withAllImages && data.data.images) {
    			structuredData.images = data.data.images;
    		}
    
    		// Add structured data if any exists
    		if (Object.keys(structuredData).length > 0) {
    			responseContent.push({
    				type: "text" as const,
    				text: yamlStringify(structuredData),
    			});
    		}
    
    		// Add main content
    		if (data.data.content) {
    			responseContent.push({
    				type: "text" as const,
    				text: String(data.data.content),
    			});
    		}
    
    		return {
    			content: responseContent.length > 0 ? responseContent : [
    				{
    					type: "text" as const,
    					text: "No content available",
    				},
    			],
    		};
    	} catch (error) {
    		return {
    			content: [
    				{
    					type: "text" as const,
    					text: `Error: ${error instanceof Error ? error.message : String(error)}`,
    				},
    			],
    			isError: true,
    		};
    	}
  • Input schema using Zod for validating tool parameters: url (required string URL), optional booleans for including all links and images.
    {
    	url: z.string().url().describe("The complete URL of the webpage or PDF file to read and convert (e.g., 'https://example.com/article')"),
    	withAllLinks: z.boolean().optional().describe("Set to true to extract and return all hyperlinks found on the page as structured data"),
    	withAllImages: z.boolean().optional().describe("Set to true to extract and return all images found on the page as structured data")
    },
  • src/index.ts:19-22 (registration)
    In the MCP agent's init method, calls registerJinaTools to register all tools, including read_url, on the MCP server.
    async init() {
    	// Register all Jina AI tools
    	registerJinaTools(this.server, () => this.props);
    }
  • Supporting utility normalizeUrl cleans and standardizes input URLs by removing tracking params, utm, sessions, anchors, www prefix, etc., before sending to the API.
    export function normalizeUrl(urlString: string, options = {
    	removeAnchors: true,
    	removeSessionIDs: true,
    	removeUTMParams: true,
    	removeTrackingParams: true,
    	removeXAnalytics: true
    }) {
    	try {
    		urlString = urlString.replace(/\s+/g, '').trim();
    
    		if (!urlString?.trim()) {
    			throw new Error('Empty URL');
    		}
    
    		// Handle x.com and twitter.com URLs with /analytics
    		if (options.removeXAnalytics) {
    			const xComPattern = /^(https?:\/\/(www\.)?(x\.com|twitter\.com)\/([^/]+)\/status\/(\d+))\/analytics(\/)?(\?.*)?(#.*)?$/i;
    			const xMatch = urlString.match(xComPattern);
    			if (xMatch) {
    				let cleanUrl = xMatch[1];
    				if (xMatch[7]) cleanUrl += xMatch[7];
    				if (xMatch[8]) cleanUrl += xMatch[8];
    				urlString = cleanUrl;
    			}
    		}
    
    		const url = new URL(urlString);
    		if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    			throw new Error('Unsupported protocol');
    		}
    
    		url.hostname = url.hostname.toLowerCase();
    		if (url.hostname.startsWith('www.')) {
    			url.hostname = url.hostname.slice(4);
    		}
    
    		if ((url.protocol === 'http:' && url.port === '80') ||
    			(url.protocol === 'https:' && url.port === '443')) {
    			url.port = '';
    		}
    
    		// Query parameter filtering
    		const searchParams = new URLSearchParams(url.search);
    		const filteredParams = Array.from(searchParams.entries())
    			.filter(([key]) => {
    				if (key === '') return false;
    				if (options.removeSessionIDs && /^(s|session|sid|sessionid|phpsessid|jsessionid|aspsessionid|asp\.net_sessionid)$/i.test(key)) {
    					return false;
    				}
    				if (options.removeUTMParams && /^utm_/i.test(key)) {
    					return false;
    				}
    				if (options.removeTrackingParams && /^(ref|referrer|fbclid|gclid|cid|mcid|source|medium|campaign|term|content|sc_rid|mc_[a-z]+)$/i.test(key)) {
    					return false;
    				}
    				return true;
    			})
    			.sort(([keyA], [keyB]) => keyA.localeCompare(keyB));
    
    		url.search = new URLSearchParams(filteredParams).toString();
    
    		if (options.removeAnchors) {
    			url.hash = '';
    		}
    
    		return url.toString();
    	} catch (error) {
    		return undefined;
    	}
    }
  • Supporting utility handleApiError returns standardized MCP error responses for common HTTP status codes (401,402,429) from the Jina API.
    export function handleApiError(response: Response, context: string = "API request") {
    	if (response.status === 401) {
    		return {
    			content: [
    				{
    					type: "text" as const,
    					text: "Authentication failed. Please set your API key in the Jina AI MCP settings. You can get a free API key by visiting https://jina.ai and signing up for an account.",
    				},
    			],
    			isError: true,
    		};
    	}
    	if (response.status === 402) {
    		return {
    			content: [
    				{
    					type: "text" as const,
    					text: "This key is out of quota. Please top up this key at https://jina.ai",
    				},
    			],
    			isError: true,
    		};
    	}
    	
    	if (response.status === 429) {
    		return {
    			content: [
    				{
    					type: "text" as const,
    					text: "Rate limit exceeded. Please upgrade your API key to get higher rate limits. Visit https://jina.ai to manage your subscription and increase your usage limits.",
    				},
    			],
    			isError: true,
    		};
    	}
    	
    	// Default error message for other HTTP errors
    	return {
    		content: [
    			{
    				type: "text" as const,
    				text: `Error: ${context} failed - ${response.status} ${response.statusText}`,
    			},
    		],
    		isError: true,
    	};
    }
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'bypass paywalls' which implies potential access to restricted content, but doesn't address rate limits, authentication needs, error handling, or what happens with malformed URLs. For a tool that interacts with external websites, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized with three sentences that each serve distinct purposes: stating the core function, providing use cases, and giving application scenarios. It's front-loaded with the main purpose. Minor improvement could come from more specific differentiation from sibling tools.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (web scraping/conversion), no annotations, and no output schema, the description is somewhat incomplete. It covers the 'what' and 'why' well but lacks details about return format, error conditions, performance characteristics, or how the markdown conversion handles different page structures. The absence of output schema means the description should ideally address what the tool returns.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, providing complete parameter documentation. The description doesn't add any parameter-specific information beyond what's in the schema, so it meets the baseline score of 3. It doesn't explain how 'withAllLinks' or 'withAllImages' affect the output format or provide examples of the structured data returned.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('extract and convert') and resources ('web page content to clean, readable markdown format'). It distinguishes itself from siblings like capture_screenshot_url (visual capture) and extract_pdf (PDF-specific extraction) by focusing on HTML-to-markdown conversion for general web content.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context with examples ('articles, documentation, blog posts') and scenarios ('bypass paywalls, get structured data'). However, it doesn't explicitly state when NOT to use this tool or name specific alternatives among siblings, such as when to use parallel_read_url for multiple URLs versus this tool's array capability.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/acchuang/jina-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server