load_article_to_context

Load arXiv article text into context for analysis using title or ID, with options for partial extraction and preview validation.

Instructions

Load the article text into context. Supports title or arXiv ID resolution and partial extraction.

Args:
    title: Article title.
    arxiv_id: arXiv ID.
    start_page: 1-based start page (inclusive).
    end_page: 1-based end page (inclusive).
    max_pages: hard cap on number of pages to extract.
    max_chars: hard cap on number of characters to extract.
    preview: if True, only validate availability and return minimal info.

Returns: Article text or structured error JSON.
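
Because the tool returns a plain string in every case, a caller has to tell JSON payloads apart from article text. The sketch below is one way to do that for preview mode, assuming the response carries the `status`, `reachable`, `arxiv_id`, and `url` keys that the handler in the Implementation Reference emits. The helper name `parse_preview` and the sample values (including the URL) are illustrative assumptions, not part of the server.

```python
import json
from typing import Optional


def parse_preview(response: str) -> Optional[dict]:
    """Return the payload as a dict, or None if the response is not a JSON object."""
    try:
        payload = json.loads(response)
    except json.JSONDecodeError:
        # Plain article text is not valid JSON, so it ends up here.
        return None
    return payload if isinstance(payload, dict) else None


# Sample shaped like the handler's preview output; the values are made up.
sample = json.dumps({
    "status": "ok",
    "reachable": True,
    "arxiv_id": "1706.03762",
    "url": "https://arxiv.org/pdf/1706.03762",
})
info = parse_preview(sample)
not_json = parse_preview("Plain article text, not JSON.")
```

A caller could check `info["reachable"]` before requesting the full text, which avoids downloading and parsing a PDF that is not available.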

Input Schema

Name        Required  Description  Default
title       No
arxiv_id    No
start_page  No
end_page    No
max_pages   No
max_chars   No
preview     No
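
Since every parameter is optional, a client may want to send only the values it actually sets. The following is a minimal sketch of building such an argument payload. The helper `build_arguments` is hypothetical, and its check that at least one of `title` or `arxiv_id` is given is a client-side assumption; the schema itself does not state this requirement.

```python
from typing import Any, Dict, Optional


def build_arguments(
    title: Optional[str] = None,
    arxiv_id: Optional[str] = None,
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
    max_pages: Optional[int] = None,
    max_chars: Optional[int] = None,
    preview: bool = False,
) -> Dict[str, Any]:
    """Build a tool-call argument dict, omitting parameters left unset."""
    # Assumption: the article cannot be resolved without a title or an ID.
    if title is None and arxiv_id is None:
        raise ValueError("provide at least one of title or arxiv_id")
    candidates = {
        "title": title,
        "arxiv_id": arxiv_id,
        "start_page": start_page,
        "end_page": end_page,
        "max_pages": max_pages,
        "max_chars": max_chars,
    }
    args: Dict[str, Any] = {k: v for k, v in candidates.items() if v is not None}
    if preview:
        args["preview"] = True
    return args


# Request the first three pages of an article identified by its arXiv ID.
args = build_arguments(arxiv_id="1706.03762", start_page=1, end_page=3)
```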

Implementation Reference

  • The primary handler function for the 'load_article_to_context' tool, decorated with @mcp.tool(). It resolves the arXiv article by title or ID, fetches the PDF, extracts text from optional page ranges with character and page limits, handles preview mode, and returns the extracted text or error JSON.
    @mcp.tool()
    async def load_article_to_context(
        title: Optional[str] = None,
        arxiv_id: Optional[str] = None,
        start_page: Optional[int] = None,
        end_page: Optional[int] = None,
        max_pages: Optional[int] = None,
        max_chars: Optional[int] = None,
        preview: bool = False,
    ) -> str:
        """
        Load the article text into context. Supports title or arXiv ID resolution and partial extraction.

        Args:
            title: Article title.
            arxiv_id: arXiv ID.
            start_page: 1-based start page (inclusive).
            end_page: 1-based end page (inclusive).
            max_pages: hard cap on number of pages to extract.
            max_chars: hard cap on number of characters to extract.
            preview: if True, only validate availability and return minimal info.

        Returns:
            Article text or structured error JSON.
        """
        result = await resolve_article(title=title, arxiv_id=arxiv_id)
        if isinstance(result, str):
            return result
        article_url, resolved_id = result

        if preview:
            # Lightweight availability check
            try:
                async with httpx.AsyncClient(timeout=DEFAULT_TIMEOUT, limits=HTTP_LIMITS) as client:
                    head = await client.head(article_url, headers={"User-Agent": USER_AGENT})
                    ok = head.status_code < 400
            except Exception:
                ok = False
            return json.dumps({"status": "ok" if ok else "error", "reachable": ok, "arxiv_id": resolved_id, "url": article_url})

        pdf_bytes = await get_pdf(article_url)
        if pdf_bytes is None:
            return _error("FETCH_FAILED", "Unable to retrieve the article from arXiv.org.")

        try:
            doc = fitz.open(stream=pdf_bytes, filetype="pdf")
        except Exception as e:
            return _error("PDF_OPEN_FAILED", f"Unable to open PDF: {e}")

        total_pages = doc.page_count

        # Normalize page bounds (1-based inputs)
        s = max(1, start_page) if start_page else 1
        e = min(end_page, total_pages) if end_page else total_pages
        if s > e or s < 1:
            return _error("BAD_RANGE", f"Invalid page range [{s}, {e}] for total_pages={total_pages}")

        # Apply max_pages cap
        if max_pages is not None:
            e = min(e, s + max_pages - 1)

        parts = []
        chars = 0
        for p in range(s - 1, e):
            page_text = doc.load_page(p).get_text()
            if not page_text:
                continue
            if max_chars is not None and chars + len(page_text) > max_chars:
                remain = max_chars - chars
                if remain > 0:
                    parts.append(page_text[:remain])
                    chars += remain
                break
            parts.append(page_text)
            chars += len(page_text)

        return "".join(parts)
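
The page-range and character-cap logic in the handler can be tried out without a PDF. The sketch below lifts those two steps into standalone functions that mirror the handler's arithmetic. The names `normalize_range` and `cap_chars` are hypothetical, and a list of strings stands in for the per-page text that the handler obtains from the PDF.

```python
from typing import List, Optional, Tuple


def normalize_range(
    total_pages: int,
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
    max_pages: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    """Return 1-based inclusive (start, end), or None for an invalid range."""
    s = max(1, start_page) if start_page else 1
    e = min(end_page, total_pages) if end_page else total_pages
    if s > e or s < 1:
        return None  # the handler reports BAD_RANGE in this case
    if max_pages is not None:
        e = min(e, s + max_pages - 1)
    return s, e


def cap_chars(pages: List[str], max_chars: Optional[int] = None) -> str:
    """Concatenate page texts, truncating once max_chars would be exceeded."""
    parts: List[str] = []
    chars = 0
    for text in pages:
        if not text:
            continue  # skip pages with no extractable text
        if max_chars is not None and chars + len(text) > max_chars:
            remain = max_chars - chars
            if remain > 0:
                parts.append(text[:remain])
            break
        parts.append(text)
        chars += len(text)
    return "".join(parts)


# Pages 3..20 of a 10-page document, capped at 4 pages.
page_range = normalize_range(10, start_page=3, end_page=20, max_pages=4)  # (3, 6)
# Three pages of text (one empty), capped at 5 characters.
text = cap_chars(["abc", "", "defgh"], max_chars=5)  # "abcde"
```

Note that `end_page` is clamped to the document length, while a `start_page` beyond the last page is rejected rather than clamped.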

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/lecigarevolant/arxiv-mcp-server-gpt'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.