Glama

get_paper_content

Extract and retrieve full text content from arXiv paper PDFs using the paper ID to access research documents.

Instructions

Get the full text content of a paper by downloading and extracting text from its PDF

Input Schema

Name | Required | Description | Default
paper_id | Yes | arXiv paper ID (e.g., 2104.13478 or cs/0001001) | (none)
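
The schema only requires that paper_id be a string. A caller or server may want to reject obviously malformed IDs before a URL is built from them. The following is a minimal sketch, not part of the server's code: the function name `parsePaperId`, the result type, and both regular expressions are assumptions, modelled on the two ID styles shown in the description (2104.13478 and cs/0001001) plus an optional version suffix such as v2.

```typescript
// Hypothetical helper, not taken from the server's source.
// New-style IDs: YYMM.NNNN or YYMM.NNNNN, optionally followed by a version (v2).
const NEW_STYLE_ID = /^\d{4}\.\d{4,5}(v\d+)?$/;
// Old-style IDs: archive[.SUBJECT]/YYMMNNN, optionally followed by a version.
const OLD_STYLE_ID = /^[a-z-]+(\.[A-Z]{2})?\/\d{7}(v\d+)?$/;

type ParseResult =
  | { ok: true; paperId: string }
  | { ok: false; reason: string };

function parsePaperId(args: unknown): ParseResult {
  // Mirror the schema: arguments must be an object with a string paper_id.
  if (typeof args !== 'object' || args === null) {
    return { ok: false, reason: 'arguments must be an object' };
  }
  const value = (args as Record<string, unknown>).paper_id;
  if (typeof value !== 'string') {
    return { ok: false, reason: 'paper_id must be a string' };
  }
  // Assumed extra check: the ID must look like one of the two arXiv styles.
  const id = value.trim();
  if (NEW_STYLE_ID.test(id) || OLD_STYLE_ID.test(id)) {
    return { ok: true, paperId: id };
  }
  return { ok: false, reason: `unrecognized arXiv ID format: ${value}` };
}
```

A check like this also keeps strings such as `../etc/passwd` out of the URL and cache path, at the cost of rejecting any ID format the patterns do not anticipate.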

Implementation Reference

  • The main handler function for the 'get_paper_content' tool. It constructs the arXiv PDF URL, downloads the PDF to a temporary directory, extracts the text with pdf-parse, cleans the text, and returns it in the MCP response format. On failure it returns the error message as text content with isError set to true; for Axios errors the message includes the response body when one is available.
    private async getPaperContent(args: GetPaperContentArgs) {
      try {
        // Construct the PDF URL directly
        // arXiv PDF URLs follow the pattern: https://arxiv.org/pdf/{paper_id}.pdf
        const pdfUrl = `https://arxiv.org/pdf/${args.paper_id}.pdf`;

        // Download the PDF
        const pdfPath = await this.downloadPdf(pdfUrl, args.paper_id);

        // Extract text from the PDF
        const textContent = await this.extractTextFromPdf(pdfPath);

        // Clean up the text (remove excessive whitespace, normalize line breaks)
        const cleanedText = textContent
          .replace(/\s+/g, ' ')
          .replace(/(\r\n|\n|\r)/gm, '\n')
          .trim();

        // Return the extracted text
        return {
          content: [
            {
              type: 'text',
              text: cleanedText,
            },
          ],
        };
      } catch (error) {
        console.error('Error in getPaperContent:', error);
        if (axios.isAxiosError(error)) {
          return {
            content: [
              {
                type: 'text',
                text: `Error retrieving paper content: ${error.response?.data || error.message}`,
              },
            ],
            isError: true,
          };
        }
        return {
          content: [
            {
              type: 'text',
              text: `Error processing paper content: ${error instanceof Error ? error.message : String(error)}`,
            },
          ],
          isError: true,
        };
      }
    }
  • TypeScript interface defining the input arguments for the get_paper_content tool: requires a paper_id string.
    interface GetPaperContentArgs { paper_id: string; }
  • src/index.ts:201-214 (registration)
    Tool registration in the ListToolsRequest handler. Defines the tool name, description, and JSON input schema matching the TypeScript interface.
    {
      name: 'get_paper_content',
      description: 'Get the full text content of a paper by downloading and extracting text from its PDF',
      inputSchema: {
        type: 'object',
        properties: {
          paper_id: {
            type: 'string',
            description: 'arXiv paper ID (e.g., 2104.13478 or cs/0001001)',
          },
        },
        required: ['paper_id'],
      },
    },
  • src/index.ts:239-246 (registration)
    Dispatch logic in the CallToolRequest handler. Validates the paper_id argument and invokes the getPaperContent method.
    case 'get_paper_content':
      if (!request.params.arguments || typeof request.params.arguments.paper_id !== 'string') {
        throw new McpError(
          ErrorCode.InvalidParams,
          'Missing or invalid paper_id parameter'
        );
      }
      return await this.getPaperContent(request.params.arguments as unknown as GetPaperContentArgs);
  • Helper function to download the arXiv PDF to a temporary cached location, with User-Agent and timeout.
    private async downloadPdf(url: string, paperId: string): Promise<string> {
      try {
        // Ensure temp directory exists
        await fs.ensureDir(TEMP_PDF_DIR);

        // Create a unique filename based on the paper ID
        const sanitizedPaperId = paperId.replace(/\//g, '_');
        const pdfPath = path.join(TEMP_PDF_DIR, `${sanitizedPaperId}.pdf`);

        // Check if we already have this PDF cached
        if (await fs.pathExists(pdfPath)) {
          console.error(`Using cached PDF for ${paperId}`);
          return pdfPath;
        }

        console.error(`Downloading PDF for ${paperId} from ${url}`);

        // Download the PDF with proper headers
        // Note: Using responseType 'arraybuffer' to handle binary data
        const response = await axios.get(url, {
          responseType: 'arraybuffer',
          headers: {
            'User-Agent': 'arXiv-MCP-Server/0.1.0 (https://github.com/your-username/arxiv-mcp-server)',
          },
          // Add a timeout to prevent hanging on large files
          timeout: 30000,
        });

        // Save the PDF to disk
        await fs.outputFile(pdfPath, response.data);
        return pdfPath;
      } catch (error) {
        console.error('Error downloading PDF:', error);
        throw new Error(`Failed to download PDF: ${error instanceof Error ? error.message : String(error)}`);
      }
    }
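
One detail of the text cleanup in the getPaperContent handler is worth noting. Its first replace collapses every whitespace run, line breaks included, into a single space, so the line-break normalization that follows has nothing left to match and the returned text is a single line. If paragraph breaks should survive, one possible ordering is sketched below. The function name `cleanExtractedText` and the decision to cap consecutive blank lines at one are assumptions for illustration, not the server's behavior.

```typescript
// Hypothetical alternative cleanup that preserves line breaks.
function cleanExtractedText(raw: string): string {
  return raw
    .replace(/\r\n?/g, '\n')     // normalize CRLF and lone CR to LF first
    .replace(/[^\S\n]+/g, ' ')   // collapse runs of whitespace other than LF
    .replace(/ *\n */g, '\n')    // drop spaces on either side of a line break
    .replace(/\n{3,}/g, '\n\n')  // allow at most one blank line in a row
    .trim();
}
```

The order matters here: line endings are normalized before any whitespace is collapsed, and the collapsing step explicitly excludes the newline character.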

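The download helper derives its cache filename by replacing slashes in the paper ID with underscores. A slightly stricter variant, sketched below, maps every character outside a small allowlist to an underscore, so the ID cannot introduce a path separator on any platform. The name `cacheFileName` and the allowlist are assumptions for illustration and do not come from the server's source.

```typescript
// Hypothetical helper: builds a flat cache filename from an arXiv paper ID.
function cacheFileName(paperId: string): string {
  // Keep letters, digits, dot, underscore, and hyphen; replace everything else.
  const safe = paperId.replace(/[^A-Za-z0-9._-]/g, '_');
  return `${safe}.pdf`;
}
```

For an ID such as cs/0001001 this yields cs_0001001.pdf, the same name the original helper produces, so existing cached files for well-formed IDs would still be found.
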
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Mnehmos/mnehmos.arxiv.mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.