M.I.M.I.R - Multi-agent Intelligent Memory & Insight Repository


DocumentParser.md•3.71 KiB

[**mimir v1.0.0**](../README.md)

***

[mimir](../README.md) / indexing/DocumentParser

# indexing/DocumentParser

## Classes

### DocumentParser

Defined in: src/indexing/DocumentParser.ts:8

#### Constructors

##### Constructor

> **new DocumentParser**(): [`DocumentParser`](#documentparser)

###### Returns

[`DocumentParser`](#documentparser)

#### Methods

##### extractText()

> **extractText**(`buffer`, `extension`): `Promise`\<`string`\>

Defined in: src/indexing/DocumentParser.ts:65

Extract plain text from PDF or DOCX files for indexing

Parses binary document formats and extracts readable text content.
Used by FileIndexer to make documents searchable and embeddable.
Automatically detects format from extension and uses appropriate parser.

Supported Formats:
- **PDF**: Uses pdf-parse library for text extraction
- **DOCX**: Uses mammoth library for text extraction

###### Parameters

###### buffer

`Buffer`

File content as Buffer

###### extension

`string`

File extension (.pdf, .docx)

###### Returns

`Promise`\<`string`\>

Extracted plain text content

###### Throws

If format is unsupported or extraction fails

###### Examples

```ts
// Extract text from PDF file
const parser = new DocumentParser();
const pdfBuffer = await fs.readFile('/path/to/document.pdf');
const text = await parser.extractText(pdfBuffer, '.pdf');
console.log('Extracted', text.length, 'characters');
console.log('First 100 chars:', text.substring(0, 100));
```

```ts
// Extract text from DOCX file
const docxBuffer = await fs.readFile('/path/to/document.docx');
const text = await parser.extractText(docxBuffer, '.docx');
console.log('Document text:', text);
```

```ts
// Handle extraction errors
try {
  const buffer = await fs.readFile('/path/to/doc.pdf');
  const text = await parser.extractText(buffer, '.pdf');
  if (text.length === 0) {
    console.warn('Document is empty');
  }
} catch (error) {
  if (error.message.includes('no extractable text')) {
    console.log('PDF is image-based or encrypted');
  } else {
    console.error('Extraction failed:', error.message);
  }
}
```

```ts
// Use in file indexing pipeline
const files = await glob('docs/*.{pdf,docx}');
for (const file of files) {
  const buffer = await fs.readFile(file);
  const ext = path.extname(file);
  const text = await parser.extractText(buffer, ext);
  await indexDocument(file, text);
}
```

##### isSupportedFormat()

> **isSupportedFormat**(`extension`): `boolean`

Defined in: src/indexing/DocumentParser.ts:160

Check if a file extension is supported for document parsing

Tests whether the parser can extract text from files with the given extension.
Use this before attempting extraction to avoid errors.

###### Parameters

###### extension

`string`

File extension (e.g., '.pdf', '.docx')

###### Returns

`boolean`

true if format is supported, false otherwise

###### Examples

```ts
// Check before parsing
const parser = new DocumentParser();
const file = '/path/to/document.pdf';
const ext = path.extname(file);

if (parser.isSupportedFormat(ext)) {
  const buffer = await fs.readFile(file);
  const text = await parser.extractText(buffer, ext);
  console.log('Extracted:', text.length, 'chars');
} else {
  console.log('Unsupported format:', ext);
}
```

```ts
// Filter files by supported formats
const allFiles = await glob('documents/*.*');
const supportedFiles = allFiles.filter(file => {
  const ext = path.extname(file);
  return parser.isSupportedFormat(ext);
});
console.log('Can parse', supportedFiles.length, 'files');
```

```ts
// Build supported extensions list
const extensions = ['.pdf', '.docx', '.txt', '.md', '.doc'];
const supported = extensions.filter(ext => parser.isSupportedFormat(ext));
console.log('Supported:', supported.join(', '));
// Output: Supported: .pdf, .docx
```
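
###### Sketch: extension-based dispatch

The description of `extractText()` says the format is detected from the extension and routed to the matching parser, and `isSupportedFormat()` answers the same question without parsing. A minimal sketch of how such dispatch could look follows. It is not the Mimir source: `ExtensionDispatcher`, `normalizeExtension`, and `register` are hypothetical names, the stub extractors stand in for pdf-parse and mammoth so the block runs without dependencies, and the case-insensitive matching is an assumption that this page does not confirm.

```ts
// Hypothetical sketch, not the actual DocumentParser implementation.
type Extractor = (buffer: Uint8Array) => Promise<string>;

// Normalize "PDF", ".Pdf", or " .pdf " to ".pdf" so lookups are consistent.
// (Assumption: the real parser may or may not normalize its input.)
function normalizeExtension(extension: string): string {
  const trimmed = extension.trim().toLowerCase();
  return trimmed.startsWith('.') ? trimmed : `.${trimmed}`;
}

class ExtensionDispatcher {
  // One extractor per normalized extension.
  private readonly extractors = new Map<string, Extractor>();

  register(extension: string, extractor: Extractor): void {
    this.extractors.set(normalizeExtension(extension), extractor);
  }

  // Mirrors the documented isSupportedFormat(): a lookup, no parsing.
  isSupportedFormat(extension: string): boolean {
    return this.extractors.has(normalizeExtension(extension));
  }

  // Mirrors the documented extractText(): rejects on unsupported formats.
  async extractText(buffer: Uint8Array, extension: string): Promise<string> {
    const extractor = this.extractors.get(normalizeExtension(extension));
    if (!extractor) {
      throw new Error(`Unsupported document format: ${extension}`);
    }
    return extractor(buffer);
  }
}

// Stub extractors; a real setup would call pdf-parse and mammoth here.
const dispatcher = new ExtensionDispatcher();
dispatcher.register('.pdf', async (buffer) => `pdf:${buffer.length} bytes`);
dispatcher.register('.docx', async (buffer) => `docx:${buffer.length} bytes`);
```

Keeping the extension table in one map means the support check and the dispatch cannot drift apart: an extension is supported exactly when an extractor is registered for it. A Node `Buffer` can be passed wherever `Uint8Array` is expected, since `Buffer` extends it.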
