Paper Search MCP

by openags

read_arxiv_paper

Extracts the text content of an arXiv paper's PDF, identified by its paper ID, so the paper can be processed and analyzed.

Instructions

Read and extract text content from an arXiv paper PDF.

Args:
    paper_id: arXiv paper ID (e.g., '2106.12345').
    save_path: Directory where the PDF is/will be saved (default: './downloads').

Returns:
    str: The extracted text content of the paper.

Input Schema

Name       Required  Description  Default
paper_id   Yes       -            -
save_path  No        -            ./downloads
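
As an illustration of how the schema above maps onto a request, the following is a minimal sketch of the JSON-RPC message an MCP client would send to invoke this tool. The `tools/call` method and the `name`/`arguments` layout follow the MCP specification; the helper name `build_tool_call` and the `request_id` parameter are hypothetical and not part of this server.

```python
import json


def build_tool_call(paper_id: str, save_path: str = "./downloads", request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request for read_arxiv_paper.

    build_tool_call is a hypothetical helper for illustration only.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "read_arxiv_paper",
            # paper_id is required; save_path falls back to the documented default.
            "arguments": {"paper_id": paper_id, "save_path": save_path},
        },
    }


# Print the request for the example ID used in the tool's docstring.
print(json.dumps(build_tool_call("2106.12345"), indent=2))
```

In practice an MCP client library would normally build and send this message for you; the sketch only makes the argument names and default explicit.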

Implementation Reference

  • The handler for the read_arxiv_paper tool in the MCP server, which delegates to the ArxivSearcher.
    async def read_arxiv_paper(paper_id: str, save_path: str = "./downloads") -> str:
        """Read and extract text content from an arXiv paper PDF.
    
        Args:
            paper_id: arXiv paper ID (e.g., '2106.12345').
            save_path: Directory where the PDF is/will be saved (default: './downloads').
        Returns:
            str: The extracted text content of the paper.
        """
        try:
            return arxiv_searcher.read_paper(paper_id, save_path)
        except Exception as e:
            print(f"Error reading paper {paper_id}: {e}")
            return ""
  • The actual implementation logic that reads a PDF file and extracts its text content.
    def read_paper(self, paper_id: str, save_path: str = "./downloads") -> str:
        """Read a paper and convert it to text format.
        
        Args:
            paper_id: arXiv paper ID
            save_path: Directory where the PDF is/will be saved
            
        Returns:
            str: The extracted text content of the paper
        """
        # First ensure we have the PDF
        pdf_path = f"{save_path}/{paper_id}.pdf"
        if not os.path.exists(pdf_path):
            pdf_path = self.download_pdf(paper_id, save_path)
        
        # Read the PDF
        try:
            reader = PdfReader(pdf_path)
            text = ""
            
            # Extract text from each page
            for page in reader.pages:
                text += page.extract_text() + "\n"
            
            return text.strip()
        except Exception as e:
            print(f"Error reading PDF for paper {paper_id}: {e}")
            return ""
  • The MCP tool registration decorator for the read_arxiv_paper function.
    @mcp.tool()
    async def read_arxiv_paper(paper_id: str, save_path: str = "./downloads") -> str:
        ...  # body as shown in the handler above
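
The read path described in the excerpts above (reuse a previously saved PDF if one exists, otherwise download it, then concatenate the text of each page) can be sketched without a PDF library. This is a minimal sketch under stated assumptions, not the server's implementation: `resolve_pdf_path`, `join_page_text`, and the `download` callback are hypothetical names, and the sketch assumes a page with no extractable text may surface as `None` or an empty string.

```python
import os
from typing import Callable, Iterable, Optional


def resolve_pdf_path(
    paper_id: str,
    save_path: str,
    download: Callable[[str, str], str],
) -> str:
    """Return the path of the paper's PDF, downloading only on a cache miss.

    `download` is a hypothetical stand-in for a download_pdf-style method:
    it takes (paper_id, save_path) and returns the saved file's path.
    """
    pdf_path = os.path.join(save_path, f"{paper_id}.pdf")
    if not os.path.exists(pdf_path):
        pdf_path = download(paper_id, save_path)
    return pdf_path


def join_page_text(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text with newlines, treating missing text as empty."""
    return "\n".join(text or "" for text in page_texts).strip()
```

With a real PDF reader, `join_page_text` would be fed the result of calling `extract_text()` on each page. Substituting an empty string for a missing result avoids a `TypeError` when concatenating page text.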

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/openags/paper-search-mcp'
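
The same request can be made from Python with the standard library. This is a minimal sketch that assumes the endpoint returns a JSON body; the response fields are not documented here, so the sketch returns the parsed JSON without interpreting it. The helper names `server_url` and `fetch_server_info` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl command above.
API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for a given server."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server_info(owner: str, name: str, timeout: float = 10.0) -> dict:
    """GET the server entry and parse it, assuming the body is JSON."""
    request = urllib.request.Request(
        server_url(owner, name),
        headers={"Accept": "application/json"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server_info("openags", "paper-search-mcp")` performs the network request shown in the curl command.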

If you have feedback or need assistance with the MCP directory API, please join our Discord server.