
German Legal MCP Server

by metaneutrons

arxiv:get

Retrieve an arXiv paper by ID. By default, returns metadata and the abstract; specify section or save_path to fetch the full HTML text (available for papers from ~2024 onward). Older papers without HTML return metadata, the abstract, and a PDF link.

Instructions

Retrieve an arXiv paper by ID (e.g., "2501.02725"). Default: metadata + abstract. With section or save_path: fetches HTML full text (available for papers from ~2024+). Older papers without HTML return metadata + abstract + PDF link.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| id | Yes | arXiv ID (e.g., "2501.02725", "2501.02725v5") | |
| section | No | Section heading or "lines:100-200". Triggers full text fetch. | |
| save_path | No | Save full text to file. Triggers full text fetch. | |
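
The schema implies three ways of calling the tool. The sketch below classifies a few example argument objects accordingly. `ArxivGetArgs`, `FetchMode`, and `fetchMode` are hypothetical names introduced for illustration and are not part of the server, and the save path is a made-up example.

```typescript
// Hypothetical type mirroring the input schema above; not taken from the server source.
interface ArxivGetArgs {
  id: string;
  section?: string;
  save_path?: string;
}

type FetchMode = 'metadata' | 'section' | 'save';

// Classifies a call as the description states: either optional parameter
// triggers a full text fetch; with neither, only metadata and the abstract
// are returned. save_path is checked first, matching the order of the
// checks in the handler under Implementation Reference.
function fetchMode(args: ArxivGetArgs): FetchMode {
  if (args.save_path) return 'save';
  if (args.section) return 'section';
  return 'metadata';
}

const examples: ArxivGetArgs[] = [
  { id: '2501.02725' },                                      // metadata + abstract
  { id: '2501.02725v5', section: 'Introduction' },           // section by heading
  { id: '2501.02725', section: 'lines:100-200' },            // section by line range
  { id: '2501.02725', save_path: './papers/2501.02725.md' }, // full text to file
];

for (const args of examples) {
  console.log(JSON.stringify(args), '->', fetchMode(args));
}
```

When both `section` and `save_path` are supplied, the handler saves the whole converted text and does not apply `section`, which is why the sketch reports `save` in that case.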

Implementation Reference

  • The handler function that executes the 'arxiv:get' tool logic. Fetches an arXiv paper by ID, returns metadata + abstract (default), or full HTML text converted to markdown (when section/save_path provided).
    export async function handleGet(client: ArxivClient, args: Record<string, unknown>): Promise<ToolResult> {
      const { id, section, save_path } = args as { id: string; section?: string; save_path?: string };
    
      // Always fetch metadata from Atom API
      const { entries } = await client.search({ id_list: id, max_results: 1 });
      if (!entries.length) return { content: [{ type: 'text', text: `Paper ${id} not found.` }], isError: true };
    
      const entry = entries[0];
      const header = [
        `# ${entry.title}`,
        `\n**Autoren:** ${entry.authors.join(', ')}`,
        `**Datum:** ${entry.published} | **Kategorien:** ${entry.categories.join(', ')}`,
        entry.doi ? `**DOI:** ${entry.doi}` : '',
        entry.journalRef ? `**Journal:** ${entry.journalRef}` : '',
        `**PDF:** ${entry.pdfUrl}`,
      ].filter(Boolean).join('\n');
    
      // Full text only when section or save_path requested
      if (!section && !save_path) {
        return { content: [{ type: 'text', text: `${header}\n\n## Abstract\n\n${entry.summary}` }] };
      }
    
      const html = await client.getHtml(entry.id);
      if (!html) {
        const msg = `${header}\n\n## Abstract\n\n${entry.summary}\n\n---\n*Full HTML text not available for this paper (pre-2024). Use the PDF link above.*`;
        return { content: [{ type: 'text', text: msg }] };
      }
    
      const markdown = `${header}\n\n---\n\n${htmlToMarkdown(html)}`;
    
      if (save_path) {
        mkdirSync(dirname(save_path), { recursive: true });
        writeFileSync(save_path, markdown, 'utf-8');
        return { content: [{ type: 'text', text: `Saved to ${save_path} (${markdown.length} chars)` }] };
      }
    
      return { content: [{ type: 'text', text: extractSection(markdown, section!) }] };
    }
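
The handler's control flow can be exercised without network access by swapping in a stub client. The following is a simplified sketch, not the server's code: `Entry`, `ToolResult`, and `ClientLike` are minimal stand-ins inferred from the fields the handler uses, `getSketch` omits markdown conversion and file writes, and the paper data is invented.

```typescript
// Minimal stand-ins inferred from the handler above; the real ArxivClient
// and ToolResult types may differ.
interface Entry { id: string; title: string; summary: string; pdfUrl: string }
interface ToolResult { content: { type: 'text'; text: string }[]; isError?: boolean }
interface ClientLike {
  search(query: { id_list: string; max_results: number }): Promise<{ entries: Entry[] }>;
  getHtml(id: string): Promise<string | null>;
}

const textResult = (text: string, isError = false): ToolResult =>
  isError ? { content: [{ type: 'text', text }], isError: true } : { content: [{ type: 'text', text }] };

// Simplified flow: look up metadata, stop early unless full text is wanted,
// and fall back to the abstract plus PDF link when no HTML is available.
async function getSketch(client: ClientLike, id: string, wantFullText: boolean): Promise<ToolResult> {
  const { entries } = await client.search({ id_list: id, max_results: 1 });
  if (!entries.length) return textResult(`Paper ${id} not found.`, true);

  const entry = entries[0];
  const summary = `# ${entry.title}\n\n## Abstract\n\n${entry.summary}`;
  if (!wantFullText) return textResult(summary);

  const html = await client.getHtml(entry.id);
  if (!html) return textResult(`${summary}\n\nFull HTML text not available. PDF: ${entry.pdfUrl}`);
  return textResult(`${summary}\n\n---\n\n${html}`);
}

// Stub client: one known (invented) paper, and no HTML rendering for it.
const stub: ClientLike = {
  async search({ id_list }) {
    const known: Entry = {
      id: '0000.00000',
      title: 'Example Paper',
      summary: 'Example abstract.',
      pdfUrl: 'https://example.org/paper.pdf',
    };
    return { entries: id_list === known.id ? [known] : [] };
  },
  async getHtml() {
    return null;
  },
};

async function demo(): Promise<void> {
  const cases = [['0000.00000', false], ['0000.00000', true], ['9999.99999', false]] as const;
  for (const [id, full] of cases) {
    const result = await getSketch(stub, id, full);
    console.log(id, full, result.isError === true, result.content[0].text.split('\n')[0]);
  }
}
demo().catch(console.error);
```

Only the unknown ID produces `isError: true`. A paper without HTML is treated as a successful call that returns the abstract and a pointer to the PDF.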
  • The tool definition with input schema for 'arxiv:get'. Defines the name, description, and Zod validation schema (id required, section and save_path optional).
    {
      name: 'arxiv:get',
      description:
        'Retrieve an arXiv paper by ID (e.g., "2501.02725"). ' +
        'Default: metadata + abstract. With `section` or `save_path`: fetches HTML full text (available for papers from ~2024+). ' +
        'Older papers without HTML return metadata + abstract + PDF link.',
      inputSchema: z.object({
        id: z.string().describe('arXiv ID (e.g., "2501.02725", "2501.02725v5")'),
        section: z.string().optional().describe('Section heading or "lines:100-200". Triggers full text fetch.'),
        save_path: z.string().optional().describe('Save full text to file. Triggers full text fetch.'),
      }),
    },
  • The provider registration that routes the 'arxiv:get' tool call to handleGet(). The ArxivProvider registers the tool and dispatches calls via a switch statement.
    async handleToolCall(name: string, args: Record<string, unknown>): Promise<ToolResult> {
      switch (name) {
        case 'arxiv:search': return handleSearch(this.client, args);
        case 'arxiv:get': return handleGet(this.client, args);
  • A generic helper used by handleGet to extract a section by heading or line range from the markdown text.
    export function extractSection(text: string, section: string): string {
      // Line range: "lines:100-200"
      const lineMatch = section.match(/^lines?:(\d+)-(\d+)$/i);
      if (lineMatch) {
        const lines = text.split('\n');
        return lines.slice(Number(lineMatch[1]) - 1, Number(lineMatch[2])).join('\n');
      }
    
      // Heading match: find section by text, end at next heading of same/higher level
      const lines = text.split('\n');
      const needle = section.toLowerCase();
      const startIdx = lines.findIndex(l => l.toLowerCase().includes(needle));
      if (startIdx === -1) return `Section "${section}" not found.`;
    
      const headingMatch = lines[startIdx].match(/^(#{1,6})\s/);
      const level = headingMatch ? headingMatch[1].length : 99;
      let endIdx = lines.length;
      for (let i = startIdx + 1; i < lines.length; i++) {
        const m = lines[i].match(/^(#{1,6})\s/);
        if (m && m[1].length <= level) { endIdx = i; break; }
      }
      return lines.slice(startIdx, endIdx).join('\n');
    }
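
A short worked example may help show how the two selector forms behave. The function is reproduced from the excerpt above so the block runs on its own; the `sample` document is invented for illustration.

```typescript
// Copied from the excerpt above so the example is self-contained.
function extractSection(text: string, section: string): string {
  const lineMatch = section.match(/^lines?:(\d+)-(\d+)$/i);
  if (lineMatch) {
    const lines = text.split('\n');
    return lines.slice(Number(lineMatch[1]) - 1, Number(lineMatch[2])).join('\n');
  }
  const lines = text.split('\n');
  const needle = section.toLowerCase();
  const startIdx = lines.findIndex(l => l.toLowerCase().includes(needle));
  if (startIdx === -1) return `Section "${section}" not found.`;
  const headingMatch = lines[startIdx].match(/^(#{1,6})\s/);
  const level = headingMatch ? headingMatch[1].length : 99;
  let endIdx = lines.length;
  for (let i = startIdx + 1; i < lines.length; i++) {
    const m = lines[i].match(/^(#{1,6})\s/);
    if (m && m[1].length <= level) { endIdx = i; break; }
  }
  return lines.slice(startIdx, endIdx).join('\n');
}

// Invented sample document (8 lines).
const sample = [
  '# Example Paper',
  '',
  '## 1 Introduction',
  'This section mentions methods in passing.',
  '### 1.1 Scope',
  'Scope text.',
  '## 2 Methods',
  'Methods text.',
].join('\n');

// Line range is 1-based and inclusive: returns lines 3 and 4.
console.log(extractSection(sample, 'lines:3-4'));

// Heading match keeps nested subsections and stops at the next "##" heading.
console.log(extractSection(sample, 'Introduction'));

// The first line containing the needle wins, heading or not: this returns
// the body sentence on line 4, not the "## 2 Methods" section.
console.log(extractSection(sample, 'methods'));

// A more specific needle reaches the intended heading.
console.log(extractSection(sample, '## 2 Methods'));

// No match yields a message rather than an error.
console.log(extractSection(sample, 'Conclusion'));
```

Because matching is a case-insensitive substring search over every line, a needle that also occurs earlier in the text (for instance in the paper title that handleGet places on the first line) matches there first. Passing more of the heading text, or a line range, avoids this.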
  • Helper that converts arXiv HTML to Markdown using TurndownService, used by handleGet for full text extraction.
    export function htmlToMarkdown(html: string): string {
      const $ = load(html);
      // Strip nav, TOC, header, footer, scripts, styles
      $('nav, header, footer, script, style, .ltx_page_navbar, .ltx_page_header, .ltx_page_footer, .package-alerts').remove();
      const body = $('.ltx_page_content').html() || $('body').html() || '';
      return turndown.turndown(body).trim();
    }
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden, and it discloses the key behaviors: default vs. full text, the availability constraint for older papers, and the fallback to a PDF link. It does not mention rate limits or auth, but neither is expected for this tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no redundancy. First sentence states core purpose, second explains conditional behavior. Extremely concise and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

There is no output schema, so the description should cover the return format. It specifies the default return (metadata + abstract) and the fallback for older papers, but it does not detail the full text return format (e.g., returned string vs. saved file) or say what is returned for an invalid ID.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds meaning by explaining that section and save_path trigger full text fetch and noting the age limitation. This goes beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves an arXiv paper by ID, specifying the resource and action. It distinguishes between default behavior and full text retrieval, differentiating it from siblings like arxiv:search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the default (metadata + abstract) vs. full text (with section or save_path). It notes the limitation for older papers without HTML. It lacks explicit guidance on when not to use the tool and on alternatives for finding papers.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/metaneutrons/german-legal-mcp'