Glama
flyanima

Open Search MCP

by flyanima

pdf_discovery

Search and list PDF documents from academic, technical, and web sources without full processing, enabling fast discovery of research papers, reports, and manuals.

Instructions

Discover PDF documents without full processing - fast PDF search and listing

Input Schema

Name          Required  Description                                                      Default
query         Yes       Search query for PDF document discovery                          (none)
maxResults    No        Maximum number of results to return                              20
sources       No        Sources to search (arxiv, pubmed, web, all)                      ["all"]
documentType  No        Type of documents to search for (academic, report, manual, any)  "any"
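The handler in the Implementation Reference fills in defaults for the optional parameters (maxResults = 20, sources = ['all'], documentType = 'any') and wraps a single `sources` string in an array. A minimal sketch of that normalization step on its own, assuming hypothetical `PdfDiscoveryArgs` and `NormalizedArgs` types and a hypothetical `normalizeArgs` helper (the project's own `ToolInput` type is not shown on this page):

```typescript
// Hypothetical types; the project's real ToolInput type is not shown on this page.
type DocumentType = 'academic' | 'report' | 'manual' | 'any';

interface PdfDiscoveryArgs {
  query: string;
  maxResults?: number;
  sources?: string | string[];
  documentType?: DocumentType;
}

interface NormalizedArgs {
  query: string;
  maxResults: number;
  sources: string[];
  documentType: DocumentType;
}

// Applies the same defaults as the pdfDiscovery handler.
function normalizeArgs(args: PdfDiscoveryArgs): NormalizedArgs {
  if (!args.query || typeof args.query !== 'string') {
    throw new Error('Query parameter is required and must be a string');
  }
  const { query, maxResults = 20, sources = ['all'], documentType = 'any' } = args;
  return {
    query,
    maxResults,
    // A single source string is wrapped in an array, as the handler does.
    sources: Array.isArray(sources) ? sources : [sources],
    documentType,
  };
}

// Example: only query and a single source are supplied; the rest fall back to defaults.
const example = normalizeArgs({ query: 'transformer attention survey', sources: 'arxiv' });
console.log(example);
```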

Implementation Reference

  • Main handler function for the pdf_discovery tool. It searches for PDFs with PDFProcessor.searchPDFs and returns a list of the discovered documents with their metadata.
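    // ToolInput, ToolOutput, PDFProcessor, PDFSearchOptions and logger come from the
    // project's own modules; their import paths are not shown on this page.
    async function pdfDiscovery(args: ToolInput): Promise<ToolOutput> {
      const {
        query,
        maxResults = 20,
        sources = ['all'],
        documentType = 'any'
      } = args;

      try {
        // Validate first so an invalid call never logs "Starting PDF discovery for: undefined".
        if (!query || typeof query !== 'string') {
          throw new Error('Query parameter is required and must be a string');
        }

        logger.info(`Starting PDF discovery for: ${query}`);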
        
        const pdfProcessor = new PDFProcessor();
        
        const searchOptions: PDFSearchOptions = {
          query,
          maxDocuments: maxResults,
          documentType: documentType as PDFSearchOptions['documentType'],
          includeOCR: false, // Discovery doesn't need OCR
          sources: Array.isArray(sources) ? sources : [sources]
        };
        
        const pdfDocuments = await pdfProcessor.searchPDFs(searchOptions);
        
        const result: ToolOutput = {
          success: true,
          data: {
            query,
            documents: pdfDocuments.map(doc => ({
              id: doc.id,
              title: doc.title,
              url: doc.url,
              source: doc.source,
              relevanceScore: doc.relevanceScore,
              downloadUrl: doc.downloadUrl,
              fileSize: doc.fileSize
            })),
            totalFound: pdfDocuments.length,
            searchOptions: {
              documentType,
              sources: searchOptions.sources
            },
            searchedAt: new Date().toISOString()
          },
          metadata: {
            sources: ['pdf-discovery'],
            cached: false
          }
        };
    
        logger.info(`PDF discovery completed: ${pdfDocuments.length} documents found for ${query}`);
        return result;
    
      } catch (error) {
        logger.error(`Failed PDF discovery for ${query}:`, error);
        
        return {
          success: false,
          error: `Failed to discover PDFs: ${error instanceof Error ? error.message : 'Unknown error'}`,
          data: null,
          metadata: {
            sources: ['pdf-discovery'],
            cached: false
          }
        };
      }
    }
  • Tool registration: creates the pdf_discovery tool using createTool, assigns inputSchema, and registers it with the registry.
    const pdfDiscoveryTool = createTool(
      'pdf_discovery',
      'Discover PDF documents without full processing - fast PDF search and listing',
      'pdf',
      'pdf-discovery',
      pdfDiscovery,
      {
        cacheTTL: 1800, // 30 minutes cache
        rateLimit: 15,  // 15 requests per minute
        requiredParams: ['query'],
        optionalParams: ['maxResults', 'sources', 'documentType']
      }
    );
    
    pdfDiscoveryTool.inputSchema = {
      type: 'object',
      properties: {
        query: {
          type: 'string',
          description: 'Search query for PDF document discovery'
        },
        maxResults: {
          type: 'number',
          description: 'Maximum number of results to return (default: 20)'
        },
        sources: {
          type: 'array',
          items: { type: 'string' },
          description: 'Sources to search (arxiv, pubmed, web, all)'
        },
        documentType: {
          type: 'string',
          description: 'Type of documents to search for',
          enum: ['academic', 'report', 'manual', 'any']
        }
      },
      required: ['query']
    };
    
    registry.registerTool(pdfDiscoveryTool);
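To see the handler's control flow without the project's PDFProcessor, here is a self-contained sketch with the search step injected as a function. `discover`, `SearchFn`, `Listing` and the `PdfDocument` shape are hypothetical, and the sketch leaves out `sources`, `documentType`, `searchOptions` and `metadata` for brevity. It shows the success and failure envelopes the handler produces and how the mapping keeps only listing fields (a hypothetical `extractedText` field stands in for heavier content that discovery does not return).

```typescript
// Hypothetical document shape; the real PDFProcessor result type is not shown on this page.
interface PdfDocument {
  id: string;
  title: string;
  url: string;
  source: string;
  relevanceScore: number;
  downloadUrl?: string;
  fileSize?: number;
  extractedText?: string; // stands in for heavier content that discovery does not return
}

type Listing = Omit<PdfDocument, 'extractedText'>;

// Hypothetical search signature, injected so the sketch runs without PDFProcessor.
type SearchFn = (query: string, maxDocuments: number) => Promise<PdfDocument[]>;

interface DiscoveryResult {
  success: boolean;
  data: { query: string; documents: Listing[]; totalFound: number } | null;
  error?: string;
}

// Mirrors the handler: validate, search, keep only listing fields, wrap in an envelope.
async function discover(query: unknown, search: SearchFn, maxResults = 20): Promise<DiscoveryResult> {
  try {
    if (!query || typeof query !== 'string') {
      throw new Error('Query parameter is required and must be a string');
    }
    const docs = await search(query, maxResults);
    const documents: Listing[] = docs.map(
      ({ id, title, url, source, relevanceScore, downloadUrl, fileSize }) => ({
        id, title, url, source, relevanceScore, downloadUrl, fileSize,
      }),
    );
    return { success: true, data: { query, documents, totalFound: documents.length } };
  } catch (error) {
    // Errors become a failure envelope instead of propagating, as in the handler.
    const message = error instanceof Error ? error.message : 'Unknown error';
    return { success: false, error: `Failed to discover PDFs: ${message}`, data: null };
  }
}

// Usage with a stub search that returns one fake record.
const fakeSearch: SearchFn = async () => [
  {
    id: '1',
    title: 'Sample report',
    url: 'https://example.org/report.pdf',
    source: 'web',
    relevanceScore: 0.9,
    extractedText: '...',
  },
];

discover('climate report', fakeSearch).then((result) => console.log(JSON.stringify(result, null, 2)));
```

Injecting the search function keeps the envelope logic testable without network access; the real handler constructs a PDFProcessor directly instead.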
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It adds some value by stating 'without full processing' and 'fast,' which hint at performance and scope limitations. However, it doesn't cover critical aspects like rate limits (the registration code sets 15 requests per minute), authentication needs, error handling, or what 'discovery' returns (e.g., metadata vs. content). For a tool with 4 parameters and no annotations, this is a moderate but incomplete effort.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
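The registration options `cacheTTL: 1800` and `rateLimit: 15` describe exactly the kind of behavior an agent would want disclosed. createTool's enforcement code is not shown on this page, so the following is only a sketch of one common way such limits work: a time-to-live cache and a sliding-window rate limiter. The classes `TtlCache` and `SlidingWindowLimiter` are hypothetical.

```typescript
// Illustrative only: createTool's real caching and rate-limiting code is not shown on this page.

// Cache whose entries expire ttlMs milliseconds after being stored.
class TtlCache<V> {
  private readonly ttlMs: number;
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(key); // expired: evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

// Allows at most `limit` calls within any rolling window of windowMs milliseconds.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private timestamps: number[] = [];

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  tryAcquire(now: number = Date.now()): boolean {
    // Forget calls that have slid out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}

// Values from the registration options: cacheTTL 1800 s, rateLimit 15 per minute.
const resultCache = new TtlCache<string[]>(1800 * 1000);
const limiter = new SlidingWindowLimiter(15, 60 * 1000);
```

A wrapper would typically key the cache on the serialized tool arguments; how createTool actually combines caching and rate limiting is not shown on this page.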

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise and front-loaded: a single sentence with zero waste. Every word earns its place by conveying purpose and key traits ('fast,' 'without full processing'), making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, no output schema, no annotations), the description is minimally adequate. It covers the basic purpose and hints at behavior but lacks details on usage context, return values, or error handling. For a search tool with multiple parameters, this leaves gaps that could hinder effective agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
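Because the tool publishes no output schema, an agent has to infer the result shape. A hedged sketch of an output schema reconstructed from the objects the handler returns; `pdfDiscoveryOutputSchema` is a hypothetical name, and the types of the per-document fields are assumptions, since the PDFProcessor document type is not shown on this page.

```typescript
// Hypothetical output schema reconstructed from the handler's return statements.
// Types of the per-document properties (id ... fileSize) are assumptions.
const pdfDiscoveryOutputSchema = {
  type: 'object',
  properties: {
    success: { type: 'boolean' },
    error: { type: 'string', description: 'Present only when success is false' },
    data: {
      type: ['object', 'null'], // null on failure
      properties: {
        query: { type: 'string' },
        documents: {
          type: 'array',
          items: {
            type: 'object',
            properties: {
              id: { type: 'string' },
              title: { type: 'string' },
              url: { type: 'string' },
              source: { type: 'string' },
              relevanceScore: { type: 'number' },
              downloadUrl: { type: 'string' },
              fileSize: { type: 'number' },
            },
          },
        },
        totalFound: { type: 'integer', description: 'Equal to documents.length' },
        searchOptions: {
          type: 'object',
          properties: {
            documentType: { type: 'string' },
            sources: { type: 'array', items: { type: 'string' } },
          },
        },
        searchedAt: { type: 'string', format: 'date-time' }, // from Date.prototype.toISOString()
      },
    },
    metadata: {
      type: 'object',
      properties: {
        sources: { type: 'array', items: { type: 'string' } },
        cached: { type: 'boolean' },
      },
    },
  },
  required: ['success', 'data', 'metadata'],
};

console.log(JSON.stringify(pdfDiscoveryOutputSchema.required));
```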

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description doesn't add any parameter-specific details beyond what's in the schema (e.g., it doesn't explain 'query' syntax or 'sources' implications). Baseline 3 is appropriate as the schema handles the heavy lifting, but no extra semantic value is provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
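As an illustration of that guidance, a fuller description could state the defaults, accepted values and limits that currently live only in the code. The constant `pdfDiscoveryDescription` is hypothetical; each clause restates something visible in the handler, input schema or registration code on this page.

```typescript
// Hypothetical fuller description for createTool's description argument.
// Each clause restates a fact from the handler, input schema or registration options.
const pdfDiscoveryDescription = [
  'Discover PDF documents without full processing (OCR is disabled) and return a list with metadata.',
  'query (required): free-text search string.',
  'maxResults: maximum number of documents to return (default 20).',
  'sources: array of "arxiv", "pubmed", "web" or "all" (default ["all"]).',
  'documentType: "academic", "report", "manual" or "any" (default "any").',
  'Each result has id, title, url, source, relevanceScore, downloadUrl and fileSize.',
  'Limited to 15 requests per minute; results are cached for 30 minutes.',
  'On failure, returns success: false with an error message.',
].join(' ');

console.log(pdfDiscoveryDescription);
```

This trades some conciseness for disclosure; the rubric's Conciseness and Behavior dimensions pull in opposite directions here.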

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Discover PDF documents without full processing - fast PDF search and listing.' It specifies the verb ('discover'), resource ('PDF documents'), and key behavioral traits ('without full processing,' 'fast'). However, it doesn't explicitly differentiate from siblings like 'search_arxiv' or 'search_pubmed,' which might also find PDFs, so it misses full sibling differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It mentions 'fast PDF search and listing' but doesn't compare to siblings like 'search_arxiv' or 'deep_research,' nor does it specify prerequisites or exclusions. This lack of contextual usage advice leaves the agent with minimal direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/flyanima/open-search-mcp'
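The same request from TypeScript, as a minimal sketch: it assumes a runtime with a global `fetch` (for example Node 18 or later) and returns the response as opaque JSON, since its fields are not documented here. The helper names `serverInfoUrl` and `fetchServerInfo` are hypothetical, and generalizing the path to `/servers/{owner}/{name}` is an assumption based on this one URL.

```typescript
// Hypothetical helper: assumes the path pattern /servers/{owner}/{name} seen in the curl example.
function serverInfoUrl(owner: string, name: string): string {
  return `https://glama.ai/api/mcp/v1/servers/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Fetches one server record. The response schema is not documented here, so it is returned as unknown.
// Requires a runtime with a global fetch, such as Node 18 or later.
async function fetchServerInfo(owner: string, name: string): Promise<unknown> {
  const response = await fetch(serverInfoUrl(owner, name), { method: 'GET' });
  if (!response.ok) {
    throw new Error(`MCP directory API request failed: ${response.status} ${response.statusText}`);
  }
  return response.json();
}
```

For example, `fetchServerInfo('flyanima', 'open-search-mcp')` requests the same record as the curl command.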

If you have feedback or need assistance with the MCP directory API, please join our Discord server.