arxiv:get

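Retrieves an arXiv paper by ID. By default it returns metadata and the abstract; pass `section` or `save_path` to fetch the full text from the paper's HTML version (generally available for papers from about 2024 onward). Papers without an HTML version return metadata, the abstract, and a note pointing to the PDF link.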

Instructions

Retrieve an arXiv paper by ID (e.g., "2501.02725"). Default: metadata + abstract. With section or save_path: fetches HTML full text (available for papers from ~2024+). Older papers without HTML return metadata + abstract + PDF link.
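
The rule above can be sketched as a small predicate over the arguments. This is an illustration only: `fetchMode` and the example `save_path` value are hypothetical and not part of the server; the argument names come from the input schema.

```typescript
// Argument shape as documented in the input schema.
interface ArxivGetArgs {
  id: string;
  section?: string;
  save_path?: string;
}

type FetchMode = 'metadata' | 'full-text';

// Hypothetical helper: either optional argument switches the call to a full-text fetch.
function fetchMode(args: ArxivGetArgs): FetchMode {
  return args.section || args.save_path ? 'full-text' : 'metadata';
}

// Example argument objects (the save path is an illustrative value).
const metadataOnly: ArxivGetArgs = { id: '2501.02725' };
const oneSection: ArxivGetArgs = { id: '2501.02725v5', section: 'Introduction' };
const lineRange: ArxivGetArgs = { id: '2501.02725', section: 'lines:100-200' };
const saved: ArxivGetArgs = { id: '2501.02725', save_path: './papers/2501.02725.md' };

console.log(fetchMode(metadataOnly), fetchMode(oneSection), fetchMode(lineRange), fetchMode(saved));
```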

Input Schema


Name	Required	Description
`id`	Yes	arXiv ID (e.g., "2501.02725", "2501.02725v5")
`section`	No	Section heading or "lines:100-200". Triggers full text fetch.
`save_path`	No	Save full text to file. Triggers full text fetch.
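
The two example IDs show that an `id` may carry a version suffix. Below is a minimal sketch of splitting a new-style ID into its base and version. `parseArxivId` is a hypothetical helper, not part of the server, and nothing here implies the server validates IDs before querying. Old-style IDs such as `hep-th/9901001` are deliberately not handled.

```typescript
interface ParsedArxivId {
  base: string;      // e.g. "2501.02725"
  version?: number;  // e.g. 5 for "2501.02725v5"; absent when no suffix is given
}

// Hypothetical helper: accepts new-style IDs (YYMM.NNNN or YYMM.NNNNN) with an optional vN suffix.
function parseArxivId(id: string): ParsedArxivId | null {
  const m = id.trim().match(/^(\d{4}\.\d{4,5})(?:v(\d+))?$/);
  if (!m) return null;
  const base = m[1];
  if (base === undefined) return null;
  return m[2] ? { base, version: Number(m[2]) } : { base };
}

console.log(parseArxivId('2501.02725'));
console.log(parseArxivId('2501.02725v5'));
console.log(parseArxivId('not-an-id'));
```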

Implementation Reference

src/providers/arxiv/tools/get.ts:8-45 (handler)

The handler that implements the 'arxiv:get' tool. It always fetches the paper's metadata by ID first, then returns either metadata plus abstract (the default) or the full HTML text converted to Markdown (when section or save_path is provided).

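import { mkdirSync, writeFileSync } from 'node:fs';
import { dirname } from 'node:path';
// ArxivClient, ToolResult, htmlToMarkdown, and extractSection come from elsewhere in the project
// (their import paths are not shown in this excerpt).

export async function handleGet(client: ArxivClient, args: Record<string, unknown>): Promise<ToolResult> {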
  const { id, section, save_path } = args as { id: string; section?: string; save_path?: string };

  // Always fetch metadata from Atom API
  const { entries } = await client.search({ id_list: id, max_results: 1 });
  if (!entries.length) return { content: [{ type: 'text', text: `Paper ${id} not found.` }], isError: true };

  const entry = entries[0];
  // Header labels are emitted in German: Autoren = authors, Datum = date, Kategorien = categories.
  const header = [
    `# ${entry.title}`,
    `\n**Autoren:** ${entry.authors.join(', ')}`,
    `**Datum:** ${entry.published} | **Kategorien:** ${entry.categories.join(', ')}`,
    entry.doi ? `**DOI:** ${entry.doi}` : '',
    entry.journalRef ? `**Journal:** ${entry.journalRef}` : '',
    `**PDF:** ${entry.pdfUrl}`,
  ].filter(Boolean).join('\n');

  // Full text only when section or save_path requested
  if (!section && !save_path) {
    return { content: [{ type: 'text', text: `${header}\n\n## Abstract\n\n${entry.summary}` }] };
  }

  const html = await client.getHtml(entry.id);
  if (!html) {
    const msg = `${header}\n\n## Abstract\n\n${entry.summary}\n\n---\n*Full HTML text not available for this paper (pre-2024). Use the PDF link above.*`;
    return { content: [{ type: 'text', text: msg }] };
  }

  const markdown = `${header}\n\n---\n\n${htmlToMarkdown(html)}`;

  if (save_path) {
    mkdirSync(dirname(save_path), { recursive: true });
    writeFileSync(save_path, markdown, 'utf-8');
    return { content: [{ type: 'text', text: `Saved to ${save_path} (${markdown.length} chars)` }] };
  }

  return { content: [{ type: 'text', text: extractSection(markdown, section!) }] };
}
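
The branches of the handler above can be summarised as a pure decision function. This is a sketch for illustration: `outcomeFor`, the `Outcome` labels, and the two boolean inputs are hypothetical stand-ins for the metadata lookup and the HTML fetch, not code from the server.

```typescript
type Outcome =
  | 'not-found'               // no entry returned for the ID
  | 'abstract'                // default: metadata + abstract
  | 'abstract-plus-pdf-note'  // full text requested but no HTML version exists
  | 'saved-to-file'           // full text written to save_path
  | 'section-text';           // requested section or line range returned

interface Args { id: string; section?: string; save_path?: string; }

// Hypothetical summary of the handler's branch order.
function outcomeFor(args: Args, entryFound: boolean, htmlAvailable: boolean): Outcome {
  if (!entryFound) return 'not-found';
  if (!args.section && !args.save_path) return 'abstract';
  if (!htmlAvailable) return 'abstract-plus-pdf-note';
  // save_path is checked before section, so it wins when both are given.
  return args.save_path ? 'saved-to-file' : 'section-text';
}

console.log(outcomeFor({ id: '2501.02725', section: 'Method', save_path: 'out.md' }, true, true));
```

One consequence of the branch order: when both section and save_path are passed, the whole document is saved and the section argument is not applied.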

src/providers/arxiv/tools/index.ts:18-29 (schema)

The tool definition for 'arxiv:get'. It declares the name, the description, and a Zod input schema in which id is required and section and save_path are optional.

{
  name: 'arxiv:get',
  description:
    'Retrieve an arXiv paper by ID (e.g., "2501.02725"). ' +
    'Default: metadata + abstract. With `section` or `save_path`: fetches HTML full text (available for papers from ~2024+). ' +
    'Older papers without HTML return metadata + abstract + PDF link.',
  inputSchema: z.object({
    id: z.string().describe('arXiv ID (e.g., "2501.02725", "2501.02725v5")'),
    section: z.string().optional().describe('Section heading or "lines:100-200". Triggers full text fetch.'),
    save_path: z.string().optional().describe('Save full text to file. Triggers full text fetch.'),
  }),
},

src/providers/arxiv/provider.ts:13-16 (registration)

The provider registration that routes 'arxiv:get' calls to handleGet(). ArxivProvider dispatches tool calls by name via a switch statement; the excerpt below is cut off after the 'arxiv:get' case.

async handleToolCall(name: string, args: Record<string, unknown>): Promise<ToolResult> {
  switch (name) {
    case 'arxiv:search': return handleSearch(this.client, args);
    case 'arxiv:get': return handleGet(this.client, args);
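
Because the excerpt stops mid-switch, here is a self-contained sketch of the same name-based dispatch. The stub handlers, the simplified `ToolResult` type, and the `default` branch are assumptions for illustration; how the real provider treats unknown tool names is not shown here.

```typescript
// Simplified result type modelled on the objects returned in the handler excerpt.
type ToolResult = { content: { type: 'text'; text: string }[]; isError?: boolean };
type Handler = (args: Record<string, unknown>) => ToolResult;

// Hypothetical stand-ins for the real handlers.
const handleSearch: Handler = () => ({ content: [{ type: 'text', text: 'search result' }] });
const handleGet: Handler = (args) => ({ content: [{ type: 'text', text: `paper ${String(args['id'])}` }] });

function handleToolCall(name: string, args: Record<string, unknown>): ToolResult {
  switch (name) {
    case 'arxiv:search': return handleSearch(args);
    case 'arxiv:get': return handleGet(args);
    // Assumption: report unknown names as an error result.
    default:
      return { content: [{ type: 'text', text: `Unknown tool: ${name}` }], isError: true };
  }
}

console.log(handleToolCall('arxiv:get', { id: '2501.02725' }).content[0]?.text);
```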

src/shared/extract-section.ts:5-27 (helper)

A generic helper used by handleGet to extract a section by heading or line range from the markdown text.

export function extractSection(text: string, section: string): string {
  // Line range: "lines:100-200"
  const lineMatch = section.match(/^lines?:(\d+)-(\d+)$/i);
  if (lineMatch) {
    const lines = text.split('\n');
    return lines.slice(Number(lineMatch[1]) - 1, Number(lineMatch[2])).join('\n');
  }

  // Heading match: start at the first line containing the text (case-insensitive; any line, not only headings),
  // end at the next heading of the same or higher level
  const lines = text.split('\n');
  const needle = section.toLowerCase();
  const startIdx = lines.findIndex(l => l.toLowerCase().includes(needle));
  if (startIdx === -1) return `Section "${section}" not found.`;

  const headingMatch = lines[startIdx].match(/^(#{1,6})\s/);
  const level = headingMatch ? headingMatch[1].length : 99;
  let endIdx = lines.length;
  for (let i = startIdx + 1; i < lines.length; i++) {
    const m = lines[i].match(/^(#{1,6})\s/);
    if (m && m[1].length <= level) { endIdx = i; break; }
  }
  return lines.slice(startIdx, endIdx).join('\n');
}
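
A short usage sketch may help show how the two selector forms behave. The function body is copied from the excerpt above so the block runs on its own; the sample document is made up for illustration.

```typescript
function extractSection(text: string, section: string): string {
  const lineMatch = section.match(/^lines?:(\d+)-(\d+)$/i);
  if (lineMatch) {
    const lines = text.split('\n');
    return lines.slice(Number(lineMatch[1]) - 1, Number(lineMatch[2])).join('\n');
  }
  const lines = text.split('\n');
  const needle = section.toLowerCase();
  const startIdx = lines.findIndex(l => l.toLowerCase().includes(needle));
  if (startIdx === -1) return `Section "${section}" not found.`;
  const headingMatch = lines[startIdx].match(/^(#{1,6})\s/);
  const level = headingMatch ? headingMatch[1].length : 99;
  let endIdx = lines.length;
  for (let i = startIdx + 1; i < lines.length; i++) {
    const m = lines[i].match(/^(#{1,6})\s/);
    if (m && m[1].length <= level) { endIdx = i; break; }
  }
  return lines.slice(startIdx, endIdx).join('\n');
}

// Made-up sample document (line numbers are 1-based in the "lines:" selector).
const sample = [
  '# Sample Paper',      // line 1
  '',                    // line 2
  '## 1 Introduction',   // line 3
  'Intro text.',         // line 4
  '### 1.1 Motivation',  // line 5
  'Why it matters.',     // line 6
  '## 2 Method',         // line 7
  'Method text.',        // line 8
].join('\n');

const byRange = extractSection(sample, 'lines:3-4');      // lines 3 and 4, inclusive
const byHeading = extractSection(sample, 'introduction'); // lines 3 to 6: subsection 1.1 is included
const byBodyText = extractSection(sample, 'matters');     // matches a body line, returns only line 6
const missing = extractSection(sample, 'Results');        // 'Section "Results" not found.'

console.log(byRange, byHeading, byBodyText, missing);
```

Note that the match is a case-insensitive substring test on every line, so a query that also appears in body text earlier in the document will start there instead of at the heading.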

src/providers/arxiv/converter.ts:12-18 (helper)

Helper that converts arXiv HTML to Markdown using TurndownService, used by handleGet for full text extraction.

export function htmlToMarkdown(html: string): string {
  const $ = load(html);
  // Strip nav, TOC, header, footer, scripts, styles
  $('nav, header, footer, script, style, .ltx_page_navbar, .ltx_page_header, .ltx_page_footer, .package-alerts').remove();
  const body = $('.ltx_page_content').html() || $('body').html() || '';
  return turndown.turndown(body).trim();
}

German Legal MCP Server