arxiv:get
Retrieve an arXiv paper by ID. By default it returns metadata and the abstract; specify section or save_path to fetch the full HTML text (available for papers from ~2024 onward). Older papers without HTML return a PDF link instead.
Instructions
Retrieve an arXiv paper by ID (e.g., "2501.02725"). Default: metadata + abstract. With section or save_path: fetches HTML full text (available for papers from ~2024+). Older papers without HTML return metadata + abstract + PDF link.
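The call modes described above can be sketched as plain argument objects. This is a minimal illustration, not the server's own code: the `ArxivGetArgs` type, the `triggersFullText` helper, and the example file path are hypothetical names introduced here. The check mirrors the rule that either section or save_path switches the tool to a full-text fetch.

```typescript
// Hypothetical argument type mirroring the tool's three parameters.
interface ArxivGetArgs {
  id: string;
  section?: string;
  save_path?: string;
}

// Hypothetical helper: true when the call would trigger the full-text
// (HTML) fetch, i.e. when either `section` or `save_path` is set.
function triggersFullText(args: ArxivGetArgs): boolean {
  return Boolean(args.section) || Boolean(args.save_path);
}

// Default mode: metadata + abstract only.
const metadataOnly: ArxivGetArgs = { id: "2501.02725" };

// Full text, narrowed to one section by heading (versioned ID).
const oneSection: ArxivGetArgs = { id: "2501.02725v5", section: "Introduction" };

// Full text, narrowed to a line range.
const lineRange: ArxivGetArgs = { id: "2501.02725", section: "lines:100-200" };

// Full text written to a file (the path is an example value).
const saveToFile: ArxivGetArgs = { id: "2501.02725", save_path: "./papers/2501.02725.md" };
```

Only the first object stays on the metadata path; the other three request the HTML full text, which may fall back to metadata + abstract + PDF link for older papers.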
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| id | Yes | arXiv ID (e.g., "2501.02725", "2501.02725v5") | |
| section | No | Section heading or "lines:100-200". Triggers full text fetch. | |
| save_path | No | Save full text to file. Triggers full text fetch. | |
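The section parameter accepts two forms: a line range such as "lines:100-200", or free text matched against the document. The sketch below shows one way to tell them apart, reusing the same regular expression as the extractSection helper in the Implementation Reference. The `SectionSpec` type and the `parseSectionSpec` function are hypothetical names for illustration and are not part of the server.

```typescript
// Hypothetical tagged union describing the two accepted forms of `section`.
type SectionSpec =
  | { kind: "lines"; start: number; end: number }
  | { kind: "heading"; needle: string };

// Classifies a `section` value. The pattern is case-insensitive and
// accepts both "line:" and "lines:" prefixes, as in extractSection.
function parseSectionSpec(section: string): SectionSpec {
  const m = section.match(/^lines?:(\d+)-(\d+)$/i);
  if (m) {
    // Line numbers are 1-indexed and the range is inclusive.
    return { kind: "lines", start: Number(m[1]), end: Number(m[2]) };
  }
  // Anything else is treated as text to search for, compared in lower case.
  return { kind: "heading", needle: section.toLowerCase() };
}
```

A value that does not match the range pattern exactly, such as "lines:abc", falls through to the text-match form rather than raising an error.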
Implementation Reference
- src/providers/arxiv/tools/get.ts:8-45 (handler) The handler function that executes the 'arxiv:get' tool logic. It fetches an arXiv paper by ID and returns metadata + abstract (default), or the full HTML text converted to Markdown (when section or save_path is provided).
```typescript
export async function handleGet(client: ArxivClient, args: Record<string, unknown>): Promise<ToolResult> {
  const { id, section, save_path } = args as { id: string; section?: string; save_path?: string };

  // Always fetch metadata from Atom API
  const { entries } = await client.search({ id_list: id, max_results: 1 });
  if (!entries.length) return { content: [{ type: 'text', text: `Paper ${id} not found.` }], isError: true };

  const entry = entries[0];
  // Header labels are German in the source (Autoren = authors, Datum = date, Kategorien = categories)
  const header = [
    `# ${entry.title}`,
    `\n**Autoren:** ${entry.authors.join(', ')}`,
    `**Datum:** ${entry.published} | **Kategorien:** ${entry.categories.join(', ')}`,
    entry.doi ? `**DOI:** ${entry.doi}` : '',
    entry.journalRef ? `**Journal:** ${entry.journalRef}` : '',
    `**PDF:** ${entry.pdfUrl}`,
  ].filter(Boolean).join('\n');

  // Full text only when section or save_path requested
  if (!section && !save_path) {
    return { content: [{ type: 'text', text: `${header}\n\n## Abstract\n\n${entry.summary}` }] };
  }

  const html = await client.getHtml(entry.id);
  if (!html) {
    const msg = `${header}\n\n## Abstract\n\n${entry.summary}\n\n---\n*Full HTML text not available for this paper (pre-2024). Use the PDF link above.*`;
    return { content: [{ type: 'text', text: msg }] };
  }

  const markdown = `${header}\n\n---\n\n${htmlToMarkdown(html)}`;

  // save_path takes precedence over section: the whole document is written to disk
  if (save_path) {
    mkdirSync(dirname(save_path), { recursive: true });
    writeFileSync(save_path, markdown, 'utf-8');
    return { content: [{ type: 'text', text: `Saved to ${save_path} (${markdown.length} chars)` }] };
  }

  return { content: [{ type: 'text', text: extractSection(markdown, section!) }] };
}
```
- The tool definition with input schema for 'arxiv:get'. Defines the name, description, and Zod validation schema (id required, section and save_path optional).
```typescript
{
  name: 'arxiv:get',
  description:
    'Retrieve an arXiv paper by ID (e.g., "2501.02725"). ' +
    'Default: metadata + abstract. With `section` or `save_path`: fetches HTML full text (available for papers from ~2024+). ' +
    'Older papers without HTML return metadata + abstract + PDF link.',
  inputSchema: z.object({
    id: z.string().describe('arXiv ID (e.g., "2501.02725", "2501.02725v5")'),
    section: z.string().optional().describe('Section heading or "lines:100-200". Triggers full text fetch.'),
    save_path: z.string().optional().describe('Save full text to file. Triggers full text fetch.'),
  }),
},
```
- src/providers/arxiv/provider.ts:13-16 (registration) The provider registration that routes the 'arxiv:get' tool call to handleGet(). The ArxivProvider registers the tool and dispatches calls via a switch statement.
```typescript
async handleToolCall(name: string, args: Record<string, unknown>): Promise<ToolResult> {
  switch (name) {
    case 'arxiv:search': return handleSearch(this.client, args);
    case 'arxiv:get': return handleGet(this.client, args);
```
- src/shared/extract-section.ts:5-27 (helper) A generic helper used by handleGet to extract a section by heading or line range from the markdown text.
```typescript
export function extractSection(text: string, section: string): string {
  // Line range: "lines:100-200"
  const lineMatch = section.match(/^lines?:(\d+)-(\d+)$/i);
  if (lineMatch) {
    const lines = text.split('\n');
    return lines.slice(Number(lineMatch[1]) - 1, Number(lineMatch[2])).join('\n');
  }

  // Heading match: find section by text, end at next heading of same/higher level
  const lines = text.split('\n');
  const needle = section.toLowerCase();
  const startIdx = lines.findIndex(l => l.toLowerCase().includes(needle));
  if (startIdx === -1) return `Section "${section}" not found.`;

  const headingMatch = lines[startIdx].match(/^(#{1,6})\s/);
  const level = headingMatch ? headingMatch[1].length : 99;

  let endIdx = lines.length;
  for (let i = startIdx + 1; i < lines.length; i++) {
    const m = lines[i].match(/^(#{1,6})\s/);
    if (m && m[1].length <= level) { endIdx = i; break; }
  }
  return lines.slice(startIdx, endIdx).join('\n');
}
```
- Helper that converts arXiv HTML to Markdown using TurndownService, used by handleGet for full text extraction.
```typescript
export function htmlToMarkdown(html: string): string {
  const $ = load(html);
  // Strip nav, TOC, header, footer, scripts, styles
  $('nav, header, footer, script, style, .ltx_page_navbar, .ltx_page_header, .ltx_page_footer, .package-alerts').remove();
  const body = $('.ltx_page_content').html() || $('body').html() || '';
  return turndown.turndown(body).trim();
}
```
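The heading branch of the extractSection helper listed earlier ends a section at the next heading of the same or higher level, so subsections stay inside their parent. It also starts at the first line that contains the search text, whether or not that line is a heading. The sketch below restates that branch under the hypothetical name `sectionByHeading`, only so the example runs on its own, and applies it to a made-up sample document.

```typescript
// Restatement of the heading branch of extractSection, copied here
// under a hypothetical name so the example is self-contained.
function sectionByHeading(text: string, section: string): string {
  const lines = text.split("\n");
  const needle = section.toLowerCase();
  // First line containing the needle wins, heading or not.
  const startIdx = lines.findIndex((l) => l.toLowerCase().includes(needle));
  if (startIdx === -1) return `Section "${section}" not found.`;
  const headingMatch = lines[startIdx].match(/^(#{1,6})\s/);
  // Non-heading matches get level 99, so any following heading ends them.
  const level = headingMatch ? headingMatch[1].length : 99;
  let endIdx = lines.length;
  for (let i = startIdx + 1; i < lines.length; i++) {
    const m = lines[i].match(/^(#{1,6})\s/);
    if (m && m[1].length <= level) { endIdx = i; break; }
  }
  return lines.slice(startIdx, endIdx).join("\n");
}

// Made-up sample standing in for a converted paper.
const sample = [
  "# Paper Title",
  "## 1 Introduction",
  "Intro text.",
  "### 1.1 Background",
  "Background text.",
  "## 2 Method",
  "Method text.",
].join("\n");

// Level-2 match: includes the nested level-3 subsection, stops at "## 2 Method".
const intro = sectionByHeading(sample, "introduction");

// Level-3 match: stops at the next heading of level 3 or higher.
const background = sectionByHeading(sample, "background");

// Non-heading match: starts at "Intro text." and stops at the very next heading.
const bodyLine = sectionByHeading(sample, "text");
```

Because matching is by substring on any line, a search term that also appears in the title or abstract will match there first; a more specific string such as "1 Introduction" or a lines: range avoids that.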