fetch_fulltext
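Retrieve the full text of an academic paper from a provided URL and return it as plain text. If the source is paywalled or otherwise restricted, the output may be partial content, metadata, or an access notice instead of the complete text.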
Instructions
Retrieves the contents of a paper or work from its preferred full-text URL and returns the response body as plain text.
Note: In some cases, the target content may be paywalled, require authentication, or otherwise restrict access. In such situations, the returned output may consist of partial content, metadata, or an access notice rather than the complete text.
Args: preferred_fulltext_url: Preferred full-text URL of the paper or work.
Returns: Plaintext representation of the retrieved content. This may be the complete text, or a limited excerpt if access to the full resource is restricted.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| preferred_fulltext_url | Yes | Preferred full-text URL of the paper or work. | |
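
As a rough illustration of how a client might pass this single argument, the sketch below builds an MCP `tools/call` JSON-RPC request for the tool. The helper name `build_tool_call` and the example URL are assumptions made for illustration; they do not come from this server's code, and real clients would normally use an MCP SDK rather than hand-built JSON.

```python
import json


def build_tool_call(url: str, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 'tools/call' request for fetch_fulltext.

    Hypothetical helper: only the tool name and argument name are taken
    from the schema above.
    """
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "fetch_fulltext",
            # The tool takes exactly one required argument.
            "arguments": {"preferred_fulltext_url": url},
        },
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Placeholder URL, not a real paper.
    print(build_tool_call("https://example.org/paper.pdf"))
```

The transport (stdio, HTTP) and session setup are left out here, since they depend on how the server is deployed.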
Implementation Reference
- src/server.py:456-510 (handler) The core handler function for the 'fetch_fulltext' tool. Decorated with @mcp.tool for automatic registration. Strips the Jina prefix if present, validates the URL with an SSRF guard, fetches the content through the Jina.ai proxy for readable text extraction, and converts HTTP status and network failures into ResourceError.

  ```python
  @mcp.tool(annotations={"readOnlyHint": True, "idempotentHint": True})
  async def fetch_fulltext(preferred_fulltext_url: str, ctx: Context) -> str:
      """
      Retrieves the contents of a paper or work from its preferred full-text URL and returns the response body as plain text.

      Note: In some cases, the target content may be paywalled, require authentication, or otherwise restrict access. In such situations, the returned output may consist of partial content, metadata, or an access notice rather than the complete text.

      Args:
          preferred_fulltext_url: Preferred full-text URL of the paper or work.

      Returns:
          Plaintext representation of the retrieved content. This may be the complete text, or a limited excerpt if access to the full resource is restricted.
      """
      # Strip Jina prefix if already present
      jina_prefix = "https://r.jina.ai/"
      if preferred_fulltext_url.startswith(jina_prefix):
          preferred_fulltext_url = preferred_fulltext_url[len(jina_prefix):]
          logger.debug(f"Removed Jina prefix, normalized URL: {preferred_fulltext_url}")

      # Validates to prevent malicious/malformed urls
      if not validate_url_with_ssrf_guard(preferred_fulltext_url):
          error_message = f"Invalid or Disallowed URL: {preferred_fulltext_url}"
          logger.error(error_message)
          await ctx.error(error_message)
          raise ResourceError(error_message)

      async with RequestAPI(jina_prefix[:-1]) as api:
          logger.info(f"Fetching page: url={preferred_fulltext_url}")
          await ctx.info(f"Fetching full-text from the url...")
          try:
              # Fetch contents of the page (wrapped with jina for easy LLM reading)
              result = await api.aget(f"/{preferred_fulltext_url}")
              if result is None:
                  error_message = "Response is empty content. Try again later."
                  logger.info(error_message)
                  await ctx.error(error_message)
                  raise ToolError(error_message)
              return result
          except httpx.HTTPStatusError as e:
              error_message = f"Request failed with status: {e.response.status_code}"
              logger.error(error_message)
              await ctx.error(error_message)
              raise ResourceError(error_message)
          except httpx.RequestError as e:
              error_message = f"Network error: {str(e)}"
              logger.error(error_message)
              await ctx.error(error_message)
              raise ResourceError(error_message)
  ```
- src/server.py:456-456 (registration) The @mcp.tool decorator registers the fetch_fulltext function as an MCP tool with read-only and idempotent hints.

  ```python
  @mcp.tool(annotations={"readOnlyHint": True, "idempotentHint": True})
  ```
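
The handler's first two steps, prefix normalization and URL validation, can be sketched in isolation. The prefix logic below mirrors what the handler description states. The body of `validate_url_with_ssrf_guard` is not shown in this reference, so `is_url_allowed` is a hypothetical stand-in that only illustrates the kind of checks such a guard might perform; the actual rules in `src/server.py` may differ.

```python
import ipaddress
from urllib.parse import urlsplit

JINA_PREFIX = "https://r.jina.ai/"


def strip_jina_prefix(url: str) -> str:
    """Remove a leading Jina reader prefix so it is not applied twice."""
    if url.startswith(JINA_PREFIX):
        return url[len(JINA_PREFIX):]
    return url


def build_reader_url(url: str) -> str:
    """Return the proxied URL: the Jina prefix followed by the target URL."""
    return JINA_PREFIX + strip_jina_prefix(url)


def is_url_allowed(url: str) -> bool:
    """Hypothetical SSRF check (assumption, not the server's real guard).

    Rejects non-HTTP(S) schemes, localhost names, and IP literals that
    are not globally routable.
    """
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unterminated IPv6 bracket
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. A production guard would typically also
        # resolve the hostname and check the resolved addresses.
        return True
    return ip.is_global


if __name__ == "__main__":
    print(build_reader_url("https://r.jina.ai/https://example.org/paper"))
    print(is_url_allowed("http://169.254.169.254/latest/meta-data"))
```

Stripping the prefix before validation means the guard inspects the real target host rather than `r.jina.ai`, which matches the order of operations in the handler.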