pdf_discovery
Search and list PDF documents quickly by query, source, and document type, without full processing. Streamlines discovery of academic papers, reports, manuals, and more.
Instructions
Discover PDF documents without full processing - fast PDF search and listing
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| documentType | No | Type of documents to search for (academic, report, manual, any) | any |
| maxResults | No | Maximum number of results to return | 20 |
| query | Yes | Search query for PDF document discovery | |
| sources | No | Sources to search (arxiv, pubmed, web, all) | all |
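To make the optional parameters concrete, here is a minimal sketch of how omitted fields resolve to the defaults the handler uses (`20`, `['all']`, `'any'`). The `PdfDiscoveryArgs` type, the `withDefaults` helper, and the example query are hypothetical names introduced for illustration; only the field names, enum values, and defaults come from this page.

```typescript
// Hypothetical type mirroring the parameter table; not part of the tool's source.
type DocumentType = "academic" | "report" | "manual" | "any";

interface PdfDiscoveryArgs {
  query: string;
  maxResults?: number;
  sources?: string[];
  documentType?: DocumentType;
}

// Apply the same destructuring defaults the handler applies to omitted fields.
function withDefaults(args: PdfDiscoveryArgs): Required<PdfDiscoveryArgs> {
  const { query, maxResults = 20, sources = ["all"], documentType = "any" } = args;
  return { query, maxResults, sources, documentType };
}

// Illustrative call: only query and sources are supplied by the caller.
const exampleArgs = withDefaults({
  query: "transformer attention survey",
  sources: ["arxiv"],
});
```

In this sketch `exampleArgs.maxResults` and `exampleArgs.documentType` come from the defaults, while `sources` keeps the caller's value.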
Input Schema (JSON Schema)
{
  "properties": {
    "documentType": {
      "description": "Type of documents to search for",
      "enum": [
        "academic",
        "report",
        "manual",
        "any"
      ],
      "type": "string"
    },
    "maxResults": {
      "description": "Maximum number of results to return (default: 20)",
      "type": "number"
    },
    "query": {
      "description": "Search query for PDF document discovery",
      "type": "string"
    },
    "sources": {
      "description": "Sources to search (arxiv, pubmed, web, all)",
      "items": {
        "type": "string"
      },
      "type": "array"
    }
  },
  "required": [
    "query"
  ],
  "type": "object"
}
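A client may want to check arguments before calling the tool. The following is a rough, hand-written sketch of checks that follow the schema above; it is not a general JSON Schema validator and not code from the tool itself. The function name `validatePdfDiscoveryArgs` and the error messages are assumptions made for this example.

```typescript
// Allowed values copied from the schema's documentType enum.
const DOCUMENT_TYPES: string[] = ["academic", "report", "manual", "any"];

// Hypothetical helper: returns a list of problems, empty when the input looks valid.
function validatePdfDiscoveryArgs(input: unknown): string[] {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return ["arguments must be an object"];
  }
  const args = input as Record<string, unknown>;
  const errors: string[] = [];

  // The schema only requires a string; the empty-string check mirrors the
  // handler, which rejects falsy queries.
  const query = args["query"];
  if (typeof query !== "string" || query.length === 0) {
    errors.push("query is required and must be a non-empty string");
  }

  const maxResults = args["maxResults"];
  if (maxResults !== undefined && typeof maxResults !== "number") {
    errors.push("maxResults must be a number");
  }

  const sources = args["sources"];
  if (sources !== undefined) {
    const ok = Array.isArray(sources) && sources.every((s) => typeof s === "string");
    if (!ok) {
      errors.push("sources must be an array of strings");
    }
  }

  const documentType = args["documentType"];
  if (documentType !== undefined) {
    const ok = typeof documentType === "string" && DOCUMENT_TYPES.indexOf(documentType) !== -1;
    if (!ok) {
      errors.push("documentType must be one of: " + DOCUMENT_TYPES.join(", "));
    }
  }

  return errors;
}
```

Returning a list of messages instead of throwing on the first failure lets a caller report every problem at once; a production client would more likely hand the schema to an established validator.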
Implementation Reference
- src/tools/pdf/research.ts:173-242 (handler) The main handler function for the pdf_discovery tool. It searches for PDF documents using PDFProcessor.searchPDFs based on the query and options, returning a list of discovered PDFs with metadata like title, URL, relevance score, without full processing.

      async function pdfDiscovery(args: ToolInput): Promise<ToolOutput> {
        const { query, maxResults = 20, sources = ['all'], documentType = 'any' } = args;

        try {
          logger.info(`Starting PDF discovery for: ${query}`);

          if (!query || typeof query !== 'string') {
            throw new Error('Query parameter is required and must be a string');
          }

          const pdfProcessor = new PDFProcessor();
          const searchOptions: PDFSearchOptions = {
            query,
            maxDocuments: maxResults,
            documentType: documentType as any,
            includeOCR: false, // Discovery doesn't need OCR
            sources: Array.isArray(sources) ? sources : [sources]
          };

          const pdfDocuments = await pdfProcessor.searchPDFs(searchOptions);

          const result: ToolOutput = {
            success: true,
            data: {
              query,
              documents: pdfDocuments.map(doc => ({
                id: doc.id,
                title: doc.title,
                url: doc.url,
                source: doc.source,
                relevanceScore: doc.relevanceScore,
                downloadUrl: doc.downloadUrl,
                fileSize: doc.fileSize
              })),
              totalFound: pdfDocuments.length,
              searchOptions: { documentType, sources: searchOptions.sources },
              searchedAt: new Date().toISOString()
            },
            metadata: { sources: ['pdf-discovery'], cached: false }
          };

          logger.info(`PDF discovery completed: ${pdfDocuments.length} documents found for ${query}`);
          return result;
        } catch (error) {
          logger.error(`Failed PDF discovery for ${query}:`, error);
          return {
            success: false,
            error: `Failed to discover PDFs: ${error instanceof Error ? error.message : 'Unknown error'}`,
            data: null,
            metadata: { sources: ['pdf-discovery'], cached: false }
          };
        }
      }
- src/tools/pdf/research.ts:406-445 (registration) Creates the pdf_discovery tool using createTool, sets its inputSchema, and registers it with the tool registry via registry.registerTool.

      const pdfDiscoveryTool = createTool(
        'pdf_discovery',
        'Discover PDF documents without full processing - fast PDF search and listing',
        'pdf',
        'pdf-discovery',
        pdfDiscovery,
        {
          cacheTTL: 1800, // 30 minutes cache
          rateLimit: 15, // 15 requests per minute
          requiredParams: ['query'],
          optionalParams: ['maxResults', 'sources', 'documentType']
        }
      );

      pdfDiscoveryTool.inputSchema = {
        type: 'object',
        properties: {
          query: {
            type: 'string',
            description: 'Search query for PDF document discovery'
          },
          maxResults: {
            type: 'number',
            description: 'Maximum number of results to return (default: 20)'
          },
          sources: {
            type: 'array',
            items: { type: 'string' },
            description: 'Sources to search (arxiv, pubmed, web, all)'
          },
          documentType: {
            type: 'string',
            description: 'Type of documents to search for',
            enum: ['academic', 'report', 'manual', 'any']
          }
        },
        required: ['query']
      };

      registry.registerTool(pdfDiscoveryTool);
- src/tools/pdf/research.ts:420-444 (schema) Input schema for the pdf_discovery tool defining the expected parameters: query (required), maxResults, sources, documentType.

      pdfDiscoveryTool.inputSchema = {
        type: 'object',
        properties: {
          query: {
            type: 'string',
            description: 'Search query for PDF document discovery'
          },
          maxResults: {
            type: 'number',
            description: 'Maximum number of results to return (default: 20)'
          },
          sources: {
            type: 'array',
            items: { type: 'string' },
            description: 'Sources to search (arxiv, pubmed, web, all)'
          },
          documentType: {
            type: 'string',
            description: 'Type of documents to search for',
            enum: ['academic', 'report', 'manual', 'any']
          }
        },
        required: ['query']
      };
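
On the consuming side, the handler returns either `success: true` with a `data.documents` list or `success: false` with `data: null` and an `error` string. The sketch below shows one way a caller might pick the highest-scoring documents from that output. The interface names, the field types, and the assumption that a higher `relevanceScore` means a better match are all guesses made for illustration; the source shows only the field names.

```typescript
// Hypothetical shapes inferred from the handler's return value; field types are assumed.
interface DiscoveredPdf {
  id: string;
  title: string;
  url: string;
  source: string;
  relevanceScore: number; // assumed numeric, higher meaning more relevant
  downloadUrl?: string;
  fileSize?: number;
}

interface PdfDiscoveryOutput {
  success: boolean;
  error?: string;
  data: { query: string; documents: DiscoveredPdf[]; totalFound: number } | null;
}

// Return up to `limit` documents, best score first; empty on failure.
function topDocuments(output: PdfDiscoveryOutput, limit: number): DiscoveredPdf[] {
  if (!output.success || output.data === null) {
    return [];
  }
  return output.data.documents
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .slice(0, limit);
}
```

Because a failed call carries `data: null`, checking `success` and `data` before touching `documents` keeps the caller from dereferencing a null payload.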