Glama

Paper Search MCP

by openags

read_doaj_paper

Extract text content from DOAJ papers by providing the paper identifier. Downloads and processes PDFs to retrieve readable text for research and analysis.

Instructions

Read and extract text content from a DOAJ paper.

Args:
    paper_id: DOAJ paper identifier.
    save_path: Directory where the PDF is/will be saved (default: './downloads').
Returns:
    str: Extracted text content.
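For illustration, here is a minimal sketch of how an MCP client might frame a call to this tool as a JSON-RPC 2.0 `tools/call` request. The helper name `build_read_doaj_request`, the request ID, and the example identifier are placeholders (not real DOAJ IDs or part of this server), and the transport (stdio or HTTP) is left out.

```python
import json
from typing import Optional


def build_read_doaj_request(request_id: int, paper_id: str, save_path: Optional[str] = None) -> str:
    """Serialize an MCP `tools/call` request for the read_doaj_paper tool (hypothetical helper)."""
    arguments = {"paper_id": paper_id}
    # Only send save_path when overriding the server-side default ('./downloads').
    if save_path is not None:
        arguments["save_path"] = save_path
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "read_doaj_paper", "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # "example-doaj-id" is a placeholder, not a real DOAJ article ID.
    print(build_read_doaj_request(1, "example-doaj-id"))
```

Omitting `save_path` leaves the choice of directory to the server's `./downloads` default.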

Input Schema

Name       Required  Description  Default
paper_id   Yes       (none)       (none)
save_path  No        (none)       ./downloads
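The table above corresponds to a JSON Schema. The sketch below shows roughly what that schema could look like and a small stdlib-only check that enforces the required `paper_id` and fills in the `save_path` default. The exact schema the server emits (for example, any generated `title` fields) is an assumption, as is the helper name `apply_input_schema`; this is not a full JSON Schema validator.

```python
from typing import Any, Dict

# Assumed shape of the input schema; no per-property descriptions,
# matching the empty Description column in the table.
READ_DOAJ_INPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "paper_id": {"type": "string"},
        "save_path": {"type": "string", "default": "./downloads"},
    },
    "required": ["paper_id"],
}


def apply_input_schema(arguments: Dict[str, Any], schema: Dict[str, Any] = READ_DOAJ_INPUT_SCHEMA) -> Dict[str, Any]:
    """Check required keys and string types, then fill defaults. Unknown keys are dropped."""
    missing = [name for name in schema["required"] if name not in arguments]
    if missing:
        raise ValueError(f"missing required argument(s): {', '.join(missing)}")
    result: Dict[str, Any] = {}
    for name, spec in schema["properties"].items():
        if name in arguments:
            value = arguments[name]
            if spec.get("type") == "string" and not isinstance(value, str):
                raise TypeError(f"{name} must be a string")
            result[name] = value
        elif "default" in spec:
            # Fill in the documented default, e.g. save_path -> './downloads'.
            result[name] = spec["default"]
    return result
```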

Output Schema

Name    Required  Description  Default
result  Yes       (none)       (none)
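The output schema has a single required `result` field. Assuming the server follows the MCP convention of returning structured output in `structuredContent` (with the plain string wrapped as `{"result": ...}`) alongside ordinary text `content` blocks, a client could recover the extracted text as sketched below. The helper name is hypothetical, and the fallback to `content` covers servers or clients that do not use structured output.

```python
from typing import Any, Dict


def extract_result_text(call_result: Dict[str, Any]) -> str:
    """Return the extracted paper text from a CallToolResult-shaped dict (hypothetical helper)."""
    structured = call_result.get("structuredContent")
    if isinstance(structured, dict) and isinstance(structured.get("result"), str):
        return structured["result"]
    # Fallback: concatenate the text content blocks, ignoring other block types.
    texts = [
        block.get("text", "")
        for block in call_result.get("content", [])
        if block.get("type") == "text"
    ]
    return "".join(texts)
```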

Implementation Reference

  • The tool registration and handler function 'read_doaj_paper' in the MCP server, which delegates to 'doaj_searcher.read_paper'.
    @mcp.tool()
    async def read_doaj_paper(paper_id: str, save_path: str = "./downloads") -> str:
        """Read and extract text content from a DOAJ paper.
    
        Args:
            paper_id: DOAJ paper identifier.
            save_path: Directory where the PDF is/will be saved (default: './downloads').
        Returns:
            str: Extracted text content.
        """
        return doaj_searcher.read_paper(paper_id, save_path)
  • The actual implementation of 'read_paper' within the 'DOAJSearcher' class, which handles the PDF download and text extraction.
    def read_paper(self, paper_id: str, save_path: str = "./downloads") -> str:
        """Read paper text from PDF.
    
        Args:
            paper_id: Paper identifier
            save_path: Directory where PDF is/will be saved
    
        Returns:
            Extracted text content
    
        Raises:
            NotImplementedError: If PDF cannot be read
        """
        try:
            # Try to download PDF first
            pdf_path = self.download_pdf(paper_id, save_path)
    
            # Extract text from PDF
            from PyPDF2 import PdfReader
            reader = PdfReader(pdf_path)
            text = ""
            for page in reader.pages:
                page_text = page.extract_text()
                if page_text:
                    text += page_text + "\n"
            return text.strip()
        except Exception as e:
            logger.error(f"Error reading DOAJ paper {paper_id}: {e}")
            raise NotImplementedError(...)  # remaining arguments truncated in the source listing
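The page-text loop in `read_paper` can be isolated as a pure function, which makes it testable without a real PDF. The sketch below accepts any objects with an `extract_text()` method (as PyPDF2 and pypdf page objects have) and collects parts in a list joined with `"\n"` instead of repeated `+=`; after `strip()` the result matches the original loop. `join_page_texts` and `FakePage` are hypothetical names used only for illustration.

```python
from typing import Iterable, Optional, Protocol


class SupportsExtractText(Protocol):
    def extract_text(self) -> Optional[str]: ...


def join_page_texts(pages: Iterable[SupportsExtractText]) -> str:
    """Concatenate per-page text, skipping pages that yield no text."""
    parts = []
    for page in pages:
        page_text = page.extract_text()
        if page_text:  # skip None and empty strings, as read_paper does
            parts.append(page_text)
    return "\n".join(parts).strip()


class FakePage:
    """Stand-in for a PDF page object, for illustration only."""

    def __init__(self, text: Optional[str]) -> None:
        self._text = text

    def extract_text(self) -> Optional[str]:
        return self._text


if __name__ == "__main__":
    pages = [FakePage("Abstract"), FakePage(None), FakePage("Introduction ")]
    print(join_page_texts(pages))
```

With a real file, the same function would take `PdfReader(pdf_path).pages`; pypdf is the maintained successor to PyPDF2 and keeps these names.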
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions that the tool extracts text content and saves a PDF to a directory, but lacks critical behavioral details: whether it downloads the paper first (implied but not explicit), potential rate limits, error handling (e.g., if the paper_id is invalid), or required permissions. For a tool with no annotations, this leaves significant gaps in understanding its operation.
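To make the error-handling gap concrete: in MCP, a failure inside a tool is normally reported as a successful JSON-RPC response whose `result` has `isError: true`, while protocol-level problems (for example, an unknown tool name) come back as a JSON-RPC `error` object. The sketch below distinguishes the two. How this particular server surfaces the `NotImplementedError` raised by `read_paper` is not documented on this page, and `ToolCallFailed` / `unwrap_tool_response` are hypothetical client-side names.

```python
from typing import Any, Dict


class ToolCallFailed(Exception):
    """Hypothetical client-side exception for a failed tool call."""


def unwrap_tool_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the CallToolResult from a JSON-RPC response, raising on either error kind."""
    if "error" in response:
        # Protocol-level error: a JSON-RPC error object with code and message.
        err = response["error"]
        raise ToolCallFailed(f"protocol error {err.get('code')}: {err.get('message')}")
    result = response["result"]
    if result.get("isError"):
        # Tool-level error: details are usually carried in text content blocks.
        detail = " ".join(
            block.get("text", "") for block in result.get("content", []) if block.get("type") == "text"
        )
        raise ToolCallFailed(f"tool error: {detail}")
    return result
```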

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear purpose statement followed by Args and Returns sections. It's front-loaded and wastes no words, though the 'Args' and 'Returns' labels are slightly redundant given the structured schema. Every sentence earns its place by adding value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that there are no annotations and 0% schema description coverage, but that an output schema exists (so return values are documented), the description is moderately complete. It covers the core purpose and parameters but misses behavioral aspects like error handling, dependencies (e.g., network access), or performance implications. For a tool that likely involves downloading and processing, more context would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It provides clear semantics for both parameters: 'paper_id' as a DOAJ paper identifier and 'save_path' as the directory where the PDF is saved, with a default. This adds meaningful context beyond the bare schema, though it doesn't specify format details (e.g., paper_id structure or save_path validation).
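The 0% figure refers to how many input properties carry a `description`. One plausible way to compute such a coverage ratio is sketched below; the exact formula used for this rating is not stated on the page, so `description_coverage` and its treatment of an empty schema are assumptions. The "documented" variant simply copies the docstring's wording into each property.

```python
from typing import Any, Dict


def description_coverage(schema: Dict[str, Any]) -> float:
    """Fraction of top-level properties that have a non-empty description."""
    properties = schema.get("properties", {})
    if not properties:
        return 1.0  # design choice: no parameters means nothing is undocumented
    described = sum(1 for spec in properties.values() if str(spec.get("description", "")).strip())
    return described / len(properties)


# Schema as listed in the table: no per-parameter descriptions.
bare = {
    "type": "object",
    "properties": {
        "paper_id": {"type": "string"},
        "save_path": {"type": "string", "default": "./downloads"},
    },
    "required": ["paper_id"],
}

# Same schema with the docstring's wording copied into each property.
documented = {
    "type": "object",
    "properties": {
        "paper_id": {"type": "string", "description": "DOAJ paper identifier."},
        "save_path": {
            "type": "string",
            "default": "./downloads",
            "description": "Directory where the PDF is/will be saved.",
        },
    },
    "required": ["paper_id"],
}

print(description_coverage(bare), description_coverage(documented))  # prints "0.0 1.0"
```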

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Read and extract text content') and resource ('from a DOAJ paper'), distinguishing it from sibling tools like 'download_doaj' (which likely downloads files) and 'search_doaj' (which searches metadata). The verb+resource combination is precise and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., whether the paper must be accessible or if authentication is needed), nor does it differentiate from similar tools like 'read_arxiv_paper' or 'download_doaj' in terms of use cases. The description only states what it does, not when to choose it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/openags/paper-search-mcp'
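The same request can be made from Python with only the standard library. This sketch builds the GET request for the URL shown in the curl command and decodes the JSON body; the response fields are not documented on this page, so the code does not assume any particular keys. `build_server_request` and `fetch_server` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner: str, name: str) -> urllib.request.Request:
    """Build the GET request for one server entry, e.g. openags/paper-search-mcp."""
    return urllib.request.Request(f"{API_BASE}/{owner}/{name}", method="GET")


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> Any:
    """Fetch and decode the JSON description of a server (requires network access)."""
    with urllib.request.urlopen(build_server_request(owner, name), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_server("openags", "paper-search-mcp")` performs the same request as the curl command.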

If you have feedback or need assistance with the MCP directory API, please join our Discord server.