# ingest_docs
Processes documents from a directory to enable semantic search capabilities, updating the knowledge base when documents change or searches return no results.
## Instructions
Re-ingest documents from the configured documents directory. Use this if search returns no results or if documents have been updated.
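The instructions above imply a simple client-side pattern: search, re-ingest on an empty result, then search again. A minimal sketch, assuming a hypothetical `callTool` helper and a companion search tool (the name `search_docs` is an assumption, not taken from this server):

```typescript
// Retry pattern from the instructions: if a search comes back empty,
// re-ingest the documents and search once more.
// NOTE: `callTool` and the tool name 'search_docs' are hypothetical.
type ToolCaller = (name: string, args: object) => Promise<any>;

async function searchWithReingest(callTool: ToolCaller, query: string): Promise<any[]> {
  let hits: any[] = await callTool('search_docs', { query });
  if (hits.length === 0) {
    await callTool('ingest_docs', {});               // rebuild the knowledge base
    hits = await callTool('search_docs', { query }); // then retry the search
  }
  return hits;
}
```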
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| *No arguments* | | | |
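Because the input schema is an empty object, a call reduces to the tool name. A sketch of the request and of the result fields returned by the handler (field names taken from `src/mcp/tools.ts`; the overall shape here is illustrative):

```typescript
// A tools/call request for ingest_docs carries no arguments.
const request = {
  method: 'tools/call',
  params: { name: 'ingest_docs', arguments: {} }
};

// Fields returned by the handler on success.
interface IngestResult {
  success: boolean;
  message: string;    // e.g. "Successfully processed 3 documents and created 42 chunks"
  processed: number;  // unique source documents ingested
  chunks: number;     // chunks written to the vector store
}
```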
## Implementation Reference
- **src/mcp/tools.ts:58-67** (handler): The tool-call handler for `ingest_docs`; it calls `ragService.ingestDocuments()` and returns the processing results with document and chunk counts.
```typescript
export async function handleToolCall(toolName: string, args: any, ragService: RAGService): Promise<any> {
  switch (toolName) {
    case 'ingest_docs':
      const result = await ragService.ingestDocuments();
      return {
        success: true,
        message: `Successfully processed ${result.processed} documents and created ${result.chunks} chunks`,
        processed: result.processed,
        chunks: result.chunks
      };
```

- **src/services/ragService.ts:24-60** (handler): The core `ingestDocuments()` method, which processes documents from a directory, uses `documentProcessor` to create chunks, deletes existing documents by source, and adds the new chunks to the vector store.
```typescript
async ingestDocuments(directoryPath?: string): Promise<{ processed: number; chunks: number }> {
  const path = directoryPath || this.config.documentsPath;

  // Always use absolute path - if config path is already absolute, use it; otherwise resolve it
  const absolutePath = path.startsWith('/') ? path : resolve(process.cwd(), path);

  console.error(`Processing documents from: ${path}`);
  console.error(`Absolute path: ${absolutePath}`);

  let chunks;
  try {
    chunks = await this.documentProcessor.processDirectory(absolutePath);
    console.error(`Document processor returned ${chunks.length} chunks`);
    if (chunks.length === 0) {
      console.error('No documents found to process');
      return { processed: 0, chunks: 0 };
    }
  } catch (error) {
    console.error('Error in document processing:', error);
    throw error;
  }

  console.error(`Generated ${chunks.length} chunks from documents`);

  const uniqueSources = new Set(chunks.map(chunk => chunk.metadata.source));
  console.error(`Processing ${uniqueSources.size} unique documents`);

  for (const source of uniqueSources) {
    await this.vectorStore.deleteBySource(source);
  }

  await this.vectorStore.addDocuments(chunks);

  console.error(`Successfully ingested ${chunks.length} chunks from ${uniqueSources.size} documents`);
  return { processed: uniqueSources.size, chunks: chunks.length };
}
```

- **src/mcp/tools.ts:6-13** (schema): The tool schema/registration for `ingest_docs`, defining its name, description, and empty `inputSchema` (no parameters required).
```typescript
{
  name: 'ingest_docs',
  description: 'Re-ingest documents from the configured documents directory. Use this if search returns no results or if documents have been updated',
  inputSchema: {
    type: 'object',
    properties: {}
  }
},
```

- Helper method `processDirectory()`, which globs for `.pdf`, `.md`, and `.txt` files and processes each file into `DocumentChunk`s.
```typescript
async processDirectory(directoryPath: string): Promise<DocumentChunk[]> {
  const allChunks: DocumentChunk[] = [];

  try {
    const files = await glob(`${directoryPath}/**/*.{pdf,md,txt}`, {
      ignore: ['node_modules/**', '.git/**']
    });

    for (const filePath of files) {
      try {
        const chunks = await this.processFile(filePath);
        allChunks.push(...chunks);
      } catch (error) {
        console.warn(`Failed to process file ${filePath}:`, error);
      }
    }
  } catch (error) {
    throw new Error(`Failed to process directory ${directoryPath}: ${error}`);
  }

  return allChunks;
}
```

- Helper method `processFile()`, which reads each supported file type (PDF, Markdown, plain text), extracts content and metadata, then chunks the text using the tiktoken tokenizer.
```typescript
async processFile(filePath: string): Promise<DocumentChunk[]> {
  const extension = extname(filePath).toLowerCase();
  const fileName = basename(filePath);

  if (!statSync(filePath).isFile()) {
    throw new Error(`File does not exist: ${filePath}`);
  }

  let content: string;
  let metadata: any = {};

  try {
    switch (extension) {
      case '.pdf':
        try {
          const pdfModule = await import('pdf-parse');
          const pdf = pdfModule.default;
          const pdfBuffer = readFileSync(filePath);
          const pdfData = await pdf(pdfBuffer);
          content = pdfData.text;
          metadata = { title: pdfData.info?.Title || fileName, pages: pdfData.numpages };
        } catch (error) {
          console.warn(`Failed to parse PDF ${filePath}:`, error);
          return [];
        }
        break;
      case '.md':
        const mdContent = readFileSync(filePath, 'utf-8');
        const parsed = matter(mdContent);
        content = parsed.content;
        metadata = { title: parsed.data.title || fileName, ...parsed.data };
        break;
      case '.txt':
        content = readFileSync(filePath, 'utf-8');
        metadata = { title: fileName };
        break;
      default:
        throw new Error(`Unsupported file type: ${extension}`);
    }
  } catch (error) {
    throw new Error(`Failed to read file ${filePath}: ${error}`);
  }

  const chunks = this.chunkText(content, filePath, metadata);
  return chunks;
}
```