ingest_file
Add documents to your local vector database for semantic search. Supports PDF, DOCX, TXT, and Markdown files to build searchable knowledge bases.
Instructions
Ingest a document file (PDF, DOCX, TXT, MD) into the vector database for semantic search. File path must be an absolute path. Supports re-ingestion to update existing documents.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filePath | Yes | Absolute path to the file to ingest. Example: "/Users/user/documents/manual.pdf" | — |
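A call to this tool supplies the single required parameter as JSON. The sketch below builds a hypothetical MCP `tools/call` payload and checks that `filePath` is absolute before sending; the request shape is the standard MCP one, but the file path and the client-side validation step are illustrative, not part of the server.

```typescript
import * as path from 'path'

// Hypothetical tools/call payload targeting ingest_file.
// The file path is an example; substitute your own absolute path.
const request = {
  method: 'tools/call',
  params: {
    name: 'ingest_file',
    arguments: { filePath: '/Users/user/documents/manual.pdf' },
  },
}

// The tool requires an absolute path, so a client can validate before sending.
const filePath = request.params.arguments.filePath
if (!path.isAbsolute(filePath)) {
  throw new Error(`filePath must be absolute, got: ${filePath}`)
}
```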
Implementation Reference
- src/server/index.ts:283-402 (handler) — core execution logic for the `ingest_file` MCP tool. Parses the file with DocumentParser, splits it with DocumentChunker, embeds the chunks with Embedder, deletes existing chunks from the VectorStore, then inserts the new vectorized chunks with backup/rollback support for re-ingestion.

```typescript
async handleIngestFile(
  args: IngestFileInput
): Promise<{ content: [{ type: 'text'; text: string }] }> {
  let backup: VectorChunk[] | null = null
  try {
    // Parse file
    const text = await this.parser.parseFile(args.filePath)

    // Split text into chunks
    const chunks = await this.chunker.chunkText(text)

    // Generate embeddings
    const embeddings = await this.embedder.embedBatch(chunks.map((chunk) => chunk.text))

    // Create backup (if existing data exists)
    try {
      const existingFiles = await this.vectorStore.listFiles()
      const existingFile = existingFiles.find((file) => file.filePath === args.filePath)
      if (existingFile && existingFile.chunkCount > 0) {
        // Backup existing data (retrieve via search)
        const queryVector = embeddings[0] || []
        if (queryVector.length === 384) {
          const allChunks = await this.vectorStore.search(queryVector, 20) // Retrieve max 20 items
          backup = allChunks
            .filter((chunk) => chunk.filePath === args.filePath)
            .map((chunk) => ({
              id: randomUUID(),
              filePath: chunk.filePath,
              chunkIndex: chunk.chunkIndex,
              text: chunk.text,
              vector: queryVector, // Use dummy vector since actual vector cannot be retrieved
              metadata: chunk.metadata,
              timestamp: new Date().toISOString(),
            }))
        }
        console.error(`Backup created: ${backup?.length || 0} chunks for ${args.filePath}`)
      }
    } catch (error) {
      // Backup creation failure is warning only (for new files)
      console.warn('Failed to create backup (new file?):', error)
    }

    // Delete existing data
    await this.vectorStore.deleteChunks(args.filePath)
    console.error(`Deleted existing chunks for: ${args.filePath}`)

    // Create vector chunks
    const timestamp = new Date().toISOString()
    const vectorChunks: VectorChunk[] = chunks.map((chunk, index) => {
      const embedding = embeddings[index]
      if (!embedding) {
        throw new Error(`Missing embedding for chunk ${index}`)
      }
      return {
        id: randomUUID(),
        filePath: args.filePath,
        chunkIndex: chunk.index,
        text: chunk.text,
        vector: embedding,
        metadata: {
          fileName: args.filePath.split('/').pop() || args.filePath,
          fileSize: text.length,
          fileType: args.filePath.split('.').pop() || '',
        },
        timestamp,
      }
    })

    // Insert vectors (transaction processing)
    try {
      await this.vectorStore.insertChunks(vectorChunks)
      console.error(`Inserted ${vectorChunks.length} chunks for: ${args.filePath}`)
      // Delete backup on success
      backup = null
    } catch (insertError) {
      // Rollback on error
      if (backup && backup.length > 0) {
        console.error('Ingestion failed, rolling back...', insertError)
        try {
          await this.vectorStore.insertChunks(backup)
          console.error(`Rollback completed: ${backup.length} chunks restored`)
        } catch (rollbackError) {
          console.error('Rollback failed:', rollbackError)
          throw new Error(
            `Failed to ingest file and rollback failed: ${(insertError as Error).message}`
          )
        }
      }
      throw insertError
    }

    // Result
    const result: IngestResult = {
      filePath: args.filePath,
      chunkCount: chunks.length,
      timestamp,
    }
    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify(result, null, 2),
        },
      ],
    }
  } catch (error) {
    // Error handling: suppress stack trace in production
    const errorMessage =
      process.env['NODE_ENV'] === 'production'
        ? (error as Error).message
        : (error as Error).stack || (error as Error).message
    console.error('Failed to ingest file:', errorMessage)
    throw new Error(`Failed to ingest file: ${errorMessage}`)
  }
}
```
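The delete-then-insert sequence in the handler is guarded by a backup that is restored if the insert fails. The essence of that pattern can be sketched with an in-memory array standing in for the real VectorStore (the `Chunk` shape and the `failInsert` flag here are simplifications for illustration, not the server's API):

```typescript
interface Chunk {
  filePath: string
  text: string
}

// Simplified in-memory stand-in for the VectorStore.
const store: Chunk[] = [{ filePath: '/docs/a.md', text: 'old' }]

function reingest(filePath: string, newChunks: Chunk[], failInsert = false): void {
  // 1. Back up the existing chunks for this file before touching them.
  const backup = store.filter((c) => c.filePath === filePath)

  // 2. Delete existing data (mirrors vectorStore.deleteChunks).
  for (let i = store.length - 1; i >= 0; i--) {
    if (store[i]?.filePath === filePath) store.splice(i, 1)
  }

  try {
    // 3. Insert the new chunks; this step may throw.
    if (failInsert) throw new Error('insert failed')
    store.push(...newChunks)
  } catch (err) {
    // 4. Rollback: restore the backup so the old data survives the failure.
    store.push(...backup)
    throw err
  }
}

reingest('/docs/a.md', [{ filePath: '/docs/a.md', text: 'new' }])
```

Note that the real handler can only back up an approximation of the old chunks (it re-queries them via search and stores a dummy vector), so its rollback restores the text but not the original embeddings.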
- src/server/index.ts:49-52 (schema) — TypeScript interface defining the input parameters for the ingest_file tool.

```typescript
export interface IngestFileInput {
  /** File path */
  filePath: string
}
```
- src/server/index.ts:162-176 (registration) — MCP tool registration in the ListToolsRequestHandler, defining the name, description, and JSON schema for input validation.

```typescript
  name: 'ingest_file',
  description:
    'Ingest a document file (PDF, DOCX, TXT, MD) into the vector database for semantic search. File path must be an absolute path. Supports re-ingestion to update existing documents.',
  inputSchema: {
    type: 'object',
    properties: {
      filePath: {
        type: 'string',
        description:
          'Absolute path to the file to ingest. Example: "/Users/user/documents/manual.pdf"',
      },
    },
    required: ['filePath'],
  },
},
```
- src/server/index.ts:65-72 (schema) — TypeScript interface defining the output structure returned by the ingest_file tool handler.

```typescript
export interface IngestResult {
  /** File path */
  filePath: string
  /** Chunk count */
  chunkCount: number
  /** Timestamp */
  timestamp: string
}
```
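On success, the handler serializes an IngestResult with `JSON.stringify(result, null, 2)` and returns it as the text content of the MCP response. A quick sketch of that shape (the values here are illustrative, not real output):

```typescript
interface IngestResult {
  filePath: string
  chunkCount: number
  timestamp: string
}

// Illustrative result; real values come from an actual ingestion run.
const result: IngestResult = {
  filePath: '/Users/user/documents/manual.pdf',
  chunkCount: 12,
  timestamp: new Date().toISOString(),
}

// This pretty-printed JSON string is what the tool call returns as text.
const text = JSON.stringify(result, null, 2)
```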
- src/server/index.ts:217-220 (registration) — dispatch logic in the CallToolRequestHandler that routes 'ingest_file' calls to the handler function.

```typescript
case 'ingest_file':
  return await this.handleIngestFile(
    request.params.arguments as unknown as IngestFileInput
  )
```
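The dispatch is a plain switch on the tool name, with the untyped request arguments cast to the handler's input type. A stripped-down sketch of that pattern (the handler body here is a stub, not the real implementation):

```typescript
interface IngestFileInput {
  filePath: string
}

// Stub standing in for the real handleIngestFile.
async function handleIngestFile(args: IngestFileInput): Promise<string> {
  return `ingested ${args.filePath}`
}

async function dispatch(name: string, args: unknown): Promise<string> {
  switch (name) {
    case 'ingest_file':
      // Mirrors the real dispatch: arguments arrive untyped and are cast.
      return await handleIngestFile(args as IngestFileInput)
    default:
      throw new Error(`Unknown tool: ${name}`)
  }
}
```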