search_pdf
Search for text within a PDF file and retrieve each match with its surrounding context, enabling quick location and review of specific content.
Instructions
Search for text in a PDF and return all matches with context
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filePath | Yes | Absolute path to the PDF file | |
| searchTerm | Yes | Text to search for in the PDF | |
| caseSensitive | No | Whether the search should be case sensitive | false |
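
As an illustration of the schema above, the arguments for a `search_pdf` call might be typed and checked as in the sketch below. The file path and search term are placeholder values, and `SearchPdfArgs`, `isSearchPdfArgs`, `exampleArgs`, and `callParams` are hypothetical names introduced here; none of them are part of the server.

```typescript
// Shape of the arguments accepted by search_pdf, per the table above.
interface SearchPdfArgs {
  filePath: string;        // absolute path to the PDF file
  searchTerm: string;      // text to search for
  caseSensitive?: boolean; // optional; the schema default is false
}

// Hypothetical runtime check (not part of the server): verifies that an
// unknown value has the two required strings and, if present, a boolean flag.
function isSearchPdfArgs(value: unknown): value is SearchPdfArgs {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.filePath === 'string' &&
    typeof v.searchTerm === 'string' &&
    (v.caseSensitive === undefined || typeof v.caseSensitive === 'boolean')
  );
}

// Example arguments; the path and term are placeholders.
const exampleArgs: SearchPdfArgs = {
  filePath: '/home/user/docs/report.pdf',
  searchTerm: 'quarterly revenue',
};

// Parameters as they might appear in an MCP tools/call request.
const callParams = { name: 'search_pdf', arguments: exampleArgs };
```

Omitting `caseSensitive` leaves the search case-insensitive, since the schema default is `false`.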
Implementation Reference
- src/index.ts:78-100 (registration) Tool registration: 'search_pdf' tool definition with name, description, and inputSchema for filePath, searchTerm, and caseSensitive parameters.

      {
        name: 'search_pdf',
        description: 'Search for text in a PDF and return all matches with context',
        inputSchema: {
          type: 'object',
          properties: {
            filePath: {
              type: 'string',
              description: 'Absolute path to the PDF file',
            },
            searchTerm: {
              type: 'string',
              description: 'Text to search for in the PDF',
            },
            caseSensitive: {
              type: 'boolean',
              description: 'Whether the search should be case sensitive',
              default: false,
            },
          },
          required: ['filePath', 'searchTerm'],
        },
      },

- src/index.ts:242-259 (handler) Handler: 'search_pdf' case in the CallToolRequestSchema handler. It extracts filePath, searchTerm, and optional caseSensitive from args, calls searchInPDF(), and returns results as JSON.

      case 'search_pdf': {
        const { filePath, searchTerm, caseSensitive } = args as {
          filePath: string;
          searchTerm: string;
          caseSensitive?: boolean;
        };
        const results = await searchInPDF(filePath, searchTerm, caseSensitive);
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(results, null, 2),
            },
          ],
        };
      }

- src/pdf-tools.ts:148-189 (helper) Helper: searchInPDF(), the core implementation. It reads the PDF file, iterates through each page, builds a regex from the escaped search term (so the term is matched literally), and returns SearchResult objects with the page number, matched text, context (up to 50 characters on each side), and the match position within the page text.

      export async function searchInPDF(
        filePath: string,
        searchTerm: string,
        caseSensitive: boolean = false
      ): Promise<SearchResult[]> {
        try {
          const dataBuffer = await fs.readFile(filePath);
          const parser = new PDFParse({ data: dataBuffer });
          const info = await parser.getInfo();
          const results: SearchResult[] = [];
          const searchRegex = new RegExp(
            searchTerm.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
            caseSensitive ? 'g' : 'gi'
          );

          // Search through each page
          for (let pageNo = 1; pageNo <= info.total; pageNo++) {
            const pageResult = await parser.getText({ partial: [pageNo] });
            const pageText = pageResult.text;
            let match;
            while ((match = searchRegex.exec(pageText)) !== null) {
              const contextStart = Math.max(0, match.index - 50);
              const contextEnd = Math.min(pageText.length, match.index + match[0].length + 50);
              const context = pageText.substring(contextStart, contextEnd).replace(/\n/g, ' ');
              results.push({
                page: pageNo,
                text: match[0],
                context: `...${context}...`,
                position: match.index,
              });
            }
          }

          await parser.destroy();
          return results;
        } catch (error) {
          throw new Error(`Failed to search PDF: ${error instanceof Error ? error.message : String(error)}`);
        }
      }

- src/types.ts:36-41 (schema) Schema: SearchResult interface defining the type returned by searchInPDF. It includes the page number, matched text, context (with surrounding characters), and position index.

      export interface SearchResult {
        page: number;
        text: string;
        context: string;
        position: number;
      }
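
The matching and context-window logic described for the helper can be tried on a plain string, without a PDF or a parsing library. The following is a minimal sketch under that assumption: `TextMatch`, `escapeRegExp`, and `searchInText` are hypothetical names, the `contextChars` parameter generalizes the fixed 50-character window, and the empty-term guard is an addition of this sketch rather than something the excerpt above shows.

```typescript
// Hypothetical result shape for a single page of text (no page number).
interface TextMatch {
  text: string;     // the matched substring
  context: string;  // match plus surrounding characters, newlines flattened
  position: number; // offset of the match within the input text
}

// Escape regex metacharacters so the term is matched literally.
function escapeRegExp(term: string): string {
  return term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function searchInText(
  pageText: string,
  searchTerm: string,
  caseSensitive: boolean = false,
  contextChars: number = 50
): TextMatch[] {
  // Guard added in this sketch: an empty pattern matches at every index
  // without advancing lastIndex, so the exec() loop would never end.
  if (searchTerm.length === 0) return [];

  const regex = new RegExp(escapeRegExp(searchTerm), caseSensitive ? 'g' : 'gi');
  const matches: TextMatch[] = [];
  let match: RegExpExecArray | null;

  // With the 'g' flag, each exec() call resumes after the previous match.
  while ((match = regex.exec(pageText)) !== null) {
    const start = Math.max(0, match.index - contextChars);
    const end = Math.min(pageText.length, match.index + match[0].length + contextChars);
    matches.push({
      text: match[0],
      context: `...${pageText.substring(start, end).replace(/\n/g, ' ')}...`,
      position: match.index,
    });
  }
  return matches;
}

// Usage: a case-insensitive search finds both spellings.
const hits = searchInText('Total cost: $5.00\nTotal COST: $7.00', 'cost');
```

Because the term is escaped before the regex is built, a search for `a.b` matches only the literal string `a.b` and not, for example, `axb`.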