pdf_discovery

Instructions

Discover PDF documents without full processing - fast PDF search and listing

Input Schema

Name	Required	Description
`documentType`	No	Type of documents to search for: academic, report, manual, or any (default: any)
`maxResults`	No	Maximum number of results to return (default: 20)
`query`	Yes	Search query for PDF document discovery
`sources`	No	Array of sources to search: arxiv, pubmed, web, or all (default: all)
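
To make the parameters above concrete, here is a minimal sketch of how a client might package them into an MCP `tools/call` request (JSON-RPC 2.0). The `PdfDiscoveryArgs` interface and the `buildPdfDiscoveryRequest` helper are hypothetical names introduced for illustration; only the tool name and parameter names come from this page.

```typescript
// Hypothetical argument type mirroring the input table above.
interface PdfDiscoveryArgs {
  query: string;                                          // required
  maxResults?: number;                                    // handler default: 20
  sources?: string[];                                     // handler default: ['all']
  documentType?: "academic" | "report" | "manual" | "any"; // handler default: 'any'
}

// Hypothetical helper: wraps the arguments in a JSON-RPC 2.0 "tools/call"
// request, the method MCP clients use to invoke a tool by name.
function buildPdfDiscoveryRequest(id: number, args: PdfDiscoveryArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "pdf_discovery", arguments: args },
  };
}

// Illustrative call: only `query` is mandatory, the rest is optional.
const request = buildPdfDiscoveryRequest(1, {
  query: "protein folding survey",
  maxResults: 5,
  sources: ["arxiv", "pubmed"],
  documentType: "academic",
});

console.log(JSON.stringify(request, null, 2));
```

Omitted optional fields are simply left out of `arguments`; the handler then falls back to its own defaults.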

Implementation Reference

src/tools/pdf/research.ts:173-242 (handler)
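The main handler function for the pdf_discovery tool. It checks that query is a non-empty string, then calls PDFProcessor.searchPDFs with OCR disabled and returns the discovered PDFs with metadata only (id, title, URL, source, relevance score, download URL, file size), without fully processing them. On failure it does not throw; it returns success: false with an error message and data: null.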
    async function pdfDiscovery(args: ToolInput): Promise<ToolOutput> {
      const { query, maxResults = 20, sources = ['all'], documentType = 'any' } = args;

      try {
        logger.info(`Starting PDF discovery for: ${query}`);

        if (!query || typeof query !== 'string') {
          throw new Error('Query parameter is required and must be a string');
        }

        const pdfProcessor = new PDFProcessor();
        const searchOptions: PDFSearchOptions = {
          query,
          maxDocuments: maxResults,
          documentType: documentType as any,
          includeOCR: false, // Discovery doesn't need OCR
          sources: Array.isArray(sources) ? sources : [sources]
        };

        const pdfDocuments = await pdfProcessor.searchPDFs(searchOptions);

        const result: ToolOutput = {
          success: true,
          data: {
            query,
            documents: pdfDocuments.map(doc => ({
              id: doc.id,
              title: doc.title,
              url: doc.url,
              source: doc.source,
              relevanceScore: doc.relevanceScore,
              downloadUrl: doc.downloadUrl,
              fileSize: doc.fileSize
            })),
            totalFound: pdfDocuments.length,
            searchOptions: { documentType, sources: searchOptions.sources },
            searchedAt: new Date().toISOString()
          },
          metadata: { sources: ['pdf-discovery'], cached: false }
        };

        logger.info(`PDF discovery completed: ${pdfDocuments.length} documents found for ${query}`);
        return result;
      } catch (error) {
        logger.error(`Failed PDF discovery for ${query}:`, error);
        return {
          success: false,
          error: `Failed to discover PDFs: ${error instanceof Error ? error.message : 'Unknown error'}`,
          data: null,
          metadata: { sources: ['pdf-discovery'], cached: false }
        };
      }
    }
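
Because the handler reports failures through `success: false` rather than by throwing, a caller has to check that flag before touching `data`. The following is a minimal sketch of such a consumer. The `DiscoveredPdf` and `DiscoveryOutput` interfaces and the `topDocuments` helper are hypothetical; the field names follow the handler's return value, but the field types are assumptions, and the sample documents are made up for illustration.

```typescript
// Assumed shape of one entry in data.documents (types are guesses).
interface DiscoveredPdf {
  id: string;
  title: string;
  url: string;
  source: string;
  relevanceScore: number;
  downloadUrl?: string;
  fileSize?: number;
}

// Assumed subset of the handler's ToolOutput that this sketch relies on.
interface DiscoveryOutput {
  success: boolean;
  error?: string;
  data: { query: string; documents: DiscoveredPdf[]; totalFound: number } | null;
}

// Hypothetical helper: fail loudly on an error result, otherwise return
// the `limit` highest-scoring documents without mutating the input.
function topDocuments(output: DiscoveryOutput, limit = 3): DiscoveredPdf[] {
  if (!output.success || output.data === null) {
    throw new Error(output.error ?? "PDF discovery failed");
  }
  return [...output.data.documents]
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .slice(0, limit);
}

// Made-up sample result, for illustration only.
const sample: DiscoveryOutput = {
  success: true,
  data: {
    query: "graph neural networks",
    totalFound: 3,
    documents: [
      { id: "a", title: "Paper A", url: "https://example.org/a.pdf", source: "arxiv", relevanceScore: 0.72 },
      { id: "b", title: "Paper B", url: "https://example.org/b.pdf", source: "web", relevanceScore: 0.91 },
      { id: "c", title: "Paper C", url: "https://example.org/c.pdf", source: "pubmed", relevanceScore: 0.4 },
    ],
  },
};

const best = topDocuments(sample, 2);
console.log(best.map(doc => doc.title).join(", "));
```

Sorting a copy keeps the original result intact, which matters if the same output object is also cached or logged elsewhere.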
src/tools/pdf/research.ts:406-445 (registration)
Creates the pdf_discovery tool using createTool, sets its inputSchema, and registers it with the tool registry via registry.registerTool.
    const pdfDiscoveryTool = createTool(
      'pdf_discovery',
      'Discover PDF documents without full processing - fast PDF search and listing',
      'pdf',
      'pdf-discovery',
      pdfDiscovery,
      {
        cacheTTL: 1800, // 30 minutes cache
        rateLimit: 15, // 15 requests per minute
        requiredParams: ['query'],
        optionalParams: ['maxResults', 'sources', 'documentType']
      }
    );

    pdfDiscoveryTool.inputSchema = {
      type: 'object',
      properties: {
        query: {
          type: 'string',
          description: 'Search query for PDF document discovery'
        },
        maxResults: {
          type: 'number',
          description: 'Maximum number of results to return (default: 20)'
        },
        sources: {
          type: 'array',
          items: { type: 'string' },
          description: 'Sources to search (arxiv, pubmed, web, all)'
        },
        documentType: {
          type: 'string',
          description: 'Type of documents to search for',
          enum: ['academic', 'report', 'manual', 'any']
        }
      },
      required: ['query']
    };

    registry.registerTool(pdfDiscoveryTool);
src/tools/pdf/research.ts:420-444 (schema)
Input schema for the pdf_discovery tool defining the expected parameters: query (required), maxResults, sources, documentType.
    pdfDiscoveryTool.inputSchema = {
      type: 'object',
      properties: {
        query: {
          type: 'string',
          description: 'Search query for PDF document discovery'
        },
        maxResults: {
          type: 'number',
          description: 'Maximum number of results to return (default: 20)'
        },
        sources: {
          type: 'array',
          items: { type: 'string' },
          description: 'Sources to search (arxiv, pubmed, web, all)'
        },
        documentType: {
          type: 'string',
          description: 'Type of documents to search for',
          enum: ['academic', 'report', 'manual', 'any']
        }
      },
      required: ['query']
    };
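
The constraints in this schema can be checked by hand before a request is sent. Below is a minimal sketch of such a check, assuming the caller wants a list of human-readable problems rather than an exception. `validatePdfDiscoveryArgs` is a hypothetical helper, not part of the project, and a real client would more likely hand the schema to a JSON Schema validator.

```typescript
// Allowed documentType values, copied from the schema's enum.
const DOCUMENT_TYPES: readonly string[] = ["academic", "report", "manual", "any"];

// Hypothetical helper: returns a list of problems; an empty list means
// the arguments satisfy the schema shown above.
function validatePdfDiscoveryArgs(args: Record<string, unknown>): string[] {
  const errors: string[] = [];

  // `query` is the only required property. The schema only demands a string,
  // but the handler also rejects an empty one, so this check does too.
  if (typeof args.query !== "string" || args.query.length === 0) {
    errors.push("query is required and must be a non-empty string");
  }
  if (args.maxResults !== undefined && typeof args.maxResults !== "number") {
    errors.push("maxResults must be a number");
  }
  if (args.sources !== undefined) {
    const ok = Array.isArray(args.sources) &&
      args.sources.every(item => typeof item === "string");
    if (!ok) errors.push("sources must be an array of strings");
  }
  if (args.documentType !== undefined) {
    const ok = typeof args.documentType === "string" &&
      DOCUMENT_TYPES.indexOf(args.documentType) !== -1;
    if (!ok) errors.push("documentType must be one of: " + DOCUMENT_TYPES.join(", "));
  }
  return errors;
}

console.log(validatePdfDiscoveryArgs({ query: "climate report", documentType: "report" }));
console.log(validatePdfDiscoveryArgs({ maxResults: "ten", documentType: "thesis" }));
```

Note that the schema types `sources` as an array, while the handler also tolerates a single string by wrapping it in an array; this sketch follows the stricter schema.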
