ingest_docs

Re-processes documents from the configured directory into the semantic-search knowledge base. Run it when documents have changed or when searches return no results.

Instructions

Re-ingest documents from the configured documents directory. Use this if search returns no results or if documents have been updated

Input Schema

No arguments
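Since the tool declares no parameters, a client invoking it over MCP's JSON-RPC transport simply sends an empty arguments object. A minimal sketch of the request payload (id and field values here are illustrative):

```typescript
// Illustrative tools/call request for ingest_docs; `arguments` is empty
// because the tool's input schema declares no properties.
const callToolRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "ingest_docs",
    arguments: {},
  },
};

console.log(JSON.stringify(callToolRequest.params));
```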

Implementation Reference

  • The tool call handler for 'ingest_docs' that calls ragService.ingestDocuments() and returns the processing results with document count and chunk count.
    export async function handleToolCall(toolName: string, args: any, ragService: RAGService): Promise<any> {
      switch (toolName) {
        case 'ingest_docs': {
          const result = await ragService.ingestDocuments();
          return {
            success: true,
            message: `Successfully processed ${result.processed} documents and created ${result.chunks} chunks`,
            processed: result.processed,
            chunks: result.chunks
          };
        }
        default:
          throw new Error(`Unknown tool: ${toolName}`);
      }
    }
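One way to see the success payload this handler produces is to drive the same logic with a stubbed service. The stub below is illustrative only (the real RAGService method is async and hits the vector store):

```typescript
// Stubbed stand-in for RAGService showing the shape of the handler's
// success payload; the counts here are made up for the demonstration.
interface IngestResult { processed: number; chunks: number }

const stubService = {
  ingestDocuments(): IngestResult {
    return { processed: 2, chunks: 10 };
  },
};

const result = stubService.ingestDocuments();
const payload = {
  success: true,
  message: `Successfully processed ${result.processed} documents and created ${result.chunks} chunks`,
  processed: result.processed,
  chunks: result.chunks,
};

console.log(payload.message);
```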
  • The core ingestDocuments() method that processes documents from a directory, uses documentProcessor to create chunks, deletes existing documents by source, and adds new chunks to the vector store.
    async ingestDocuments(directoryPath?: string): Promise<{ processed: number; chunks: number }> {
      const path = directoryPath || this.config.documentsPath;
      
      // Resolve to an absolute path; resolve() leaves already-absolute paths unchanged
      // (a startsWith('/') check would miss Windows-style absolute paths)
      const absolutePath = resolve(process.cwd(), path);
      
      console.error(`Processing documents from: ${path}`);
      console.error(`Absolute path: ${absolutePath}`);
      
      let chunks;
      try {
        chunks = await this.documentProcessor.processDirectory(absolutePath);
        console.error(`Document processor returned ${chunks.length} chunks`);
        
        if (chunks.length === 0) {
          console.error('No documents found to process');
          return { processed: 0, chunks: 0 };
        }
      } catch (error) {
        console.error('Error in document processing:', error);
        throw error;
      }
    
      console.error(`Generated ${chunks.length} chunks from documents`);
      
      const uniqueSources = new Set(chunks.map(chunk => chunk.metadata.source));
      console.error(`Processing ${uniqueSources.size} unique documents`);
    
      for (const source of uniqueSources) {
        await this.vectorStore.deleteBySource(source);
      }
    
      await this.vectorStore.addDocuments(chunks);
      
      console.error(`Successfully ingested ${chunks.length} chunks from ${uniqueSources.size} documents`);
      return { processed: uniqueSources.size, chunks: chunks.length };
    }
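The delete-by-source-then-add sequence above is what makes re-ingestion idempotent: stale chunks for a document are removed before its fresh chunks are stored, so repeated runs never duplicate content. A self-contained sketch with an in-memory stand-in for the vector store (names and synchronous methods are simplifications; the real calls are async):

```typescript
// In-memory stand-in for the vector store, demonstrating why
// deleteBySource() runs before addDocuments() during re-ingestion.
interface Chunk { text: string; metadata: { source: string } }

class InMemoryStore {
  docs: Chunk[] = [];
  deleteBySource(source: string): void {
    this.docs = this.docs.filter((d) => d.metadata.source !== source);
  }
  addDocuments(chunks: Chunk[]): void {
    this.docs.push(...chunks);
  }
}

function reingest(store: InMemoryStore, chunks: Chunk[]) {
  const sources = new Set(chunks.map((c) => c.metadata.source));
  for (const source of sources) store.deleteBySource(source);
  store.addDocuments(chunks);
  return { processed: sources.size, chunks: chunks.length };
}

const store = new InMemoryStore();
store.docs = [{ text: "stale", metadata: { source: "a.md" } }];
const summary = reingest(store, [
  { text: "fresh", metadata: { source: "a.md" } },
  { text: "new", metadata: { source: "b.txt" } },
]);
console.log(summary.processed, summary.chunks, store.docs.length); // 2 2 2
```

Without the delete pass, re-ingesting a.md would leave both the stale and fresh chunks in the store.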
  • The tool schema/registration for 'ingest_docs' defining its name, description, and empty inputSchema (no parameters required).
    {
      name: 'ingest_docs',
      description: 'Re-ingest documents from the configured documents directory. Use this if search returns no results or if documents have been updated',
      inputSchema: {
        type: 'object',
        properties: {}
      }
    },
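Because the schema declares an object with no properties and no required fields, any object (including an empty one) validates. A trivial pre-dispatch check a server might perform, sketched without a JSON Schema library (the validation logic here is an illustration, not part of this server):

```typescript
// With no declared properties and no `required` array, the schema accepts
// any plain object; only non-objects would be rejected.
const inputSchema = { type: "object", properties: {} };

function validateArgs(args: unknown): boolean {
  return typeof args === "object" && args !== null && !Array.isArray(args);
}

console.log(Object.keys(inputSchema.properties).length); // 0
console.log(validateArgs({}));   // true
console.log(validateArgs(null)); // false
```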
  • Helper method processDirectory() that globs for .pdf, .md, and .txt files and processes each file to create DocumentChunks.
    async processDirectory(directoryPath: string): Promise<DocumentChunk[]> {
      const allChunks: DocumentChunk[] = [];
      
      try {
        const files = await glob(`${directoryPath}/**/*.{pdf,md,txt}`, {
          ignore: ['node_modules/**', '.git/**']
        });
    
        for (const filePath of files) {
          try {
            const chunks = await this.processFile(filePath);
            allChunks.push(...chunks);
          } catch (error) {
            console.warn(`Failed to process file ${filePath}:`, error);
          }
        }
      } catch (error) {
        throw new Error(`Failed to process directory ${directoryPath}: ${error}`);
      }
    
      return allChunks;
    }
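The glob pattern `**/*.{pdf,md,txt}` with its ignore list can be approximated without any dependency by a recursive directory walk; the sketch below mirrors that selection logic (it is an approximation, not the server's actual glob call):

```typescript
// Dependency-free approximation of glob's `**/*.{pdf,md,txt}` selection,
// skipping node_modules and .git directories like the ignore list does.
import { mkdirSync, mkdtempSync, readdirSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { extname, join } from "node:path";

function findDocFiles(dir: string): string[] {
  const out: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    if (entry.name === "node_modules" || entry.name === ".git") continue;
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      out.push(...findDocFiles(full));
    } else if ([".pdf", ".md", ".txt"].includes(extname(entry.name).toLowerCase())) {
      out.push(full);
    }
  }
  return out;
}

// Demo on a throwaway directory: one matching file, one ignored subtree.
const root = mkdtempSync(join(tmpdir(), "docs-"));
writeFileSync(join(root, "a.md"), "# hi");
mkdirSync(join(root, "node_modules"));
writeFileSync(join(root, "node_modules", "skip.txt"), "x");
const found = findDocFiles(root);
console.log(found.length); // 1
rmSync(root, { recursive: true, force: true });
```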
  • Helper method processFile() that reads different file types (PDF, Markdown, text), extracts content and metadata, then chunks the text using tiktoken tokenizer.
    async processFile(filePath: string): Promise<DocumentChunk[]> {
      const extension = extname(filePath).toLowerCase();
      const fileName = basename(filePath);
      
      // statSync() throws on missing paths, so check existence before stat-ing
      if (!existsSync(filePath) || !statSync(filePath).isFile()) {
        throw new Error(`File does not exist: ${filePath}`);
      }
      
      let content: string;
      let metadata: any = {};
    
      try {
        switch (extension) {
          case '.pdf':
            try {
              const pdfModule = await import('pdf-parse');
              const pdf = pdfModule.default;
              const pdfBuffer = readFileSync(filePath);
              const pdfData = await pdf(pdfBuffer);
              content = pdfData.text;
              metadata = {
                title: pdfData.info?.Title || fileName,
                pages: pdfData.numpages
              };
            } catch (error) {
              console.warn(`Failed to parse PDF ${filePath}:`, error);
              return [];
            }
            break;
          
          case '.md': {
            const mdContent = readFileSync(filePath, 'utf-8');
            const parsed = matter(mdContent);
            content = parsed.content;
            metadata = {
              title: parsed.data.title || fileName,
              ...parsed.data
            };
            break;
          }
          
          case '.txt':
            content = readFileSync(filePath, 'utf-8');
            metadata = { title: fileName };
            break;
          
          default:
            throw new Error(`Unsupported file type: ${extension}`);
        }
      } catch (error) {
        throw new Error(`Failed to read file ${filePath}: ${error}`);
      }
    
      const chunks = this.chunkText(content, filePath, metadata);
      return chunks;
    }
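The chunkText() method that processFile() ends with is not shown on this page. A hedged sketch of a sliding token-window chunker follows; the real implementation counts tokens with tiktoken, whereas this sketch substitutes a whitespace split so it stays self-contained, and the chunk size and overlap values are assumptions:

```typescript
// Sliding-window chunker sketch. A whitespace split stands in for the
// tiktoken tokenizer used by the real code; sizes are illustrative.
interface DocumentChunk {
  text: string;
  metadata: { source: string; chunkIndex: number; title?: string };
}

function chunkText(
  content: string,
  source: string,
  metadata: { title?: string },
  chunkSize = 200, // "tokens" per chunk (assumed value)
  overlap = 20     // "tokens" shared between consecutive chunks (assumed)
): DocumentChunk[] {
  const tokens = content.split(/\s+/).filter(Boolean);
  const chunks: DocumentChunk[] = [];
  for (let start = 0, i = 0; start < tokens.length; start += chunkSize - overlap, i++) {
    chunks.push({
      text: tokens.slice(start, start + chunkSize).join(" "),
      metadata: { ...metadata, source, chunkIndex: i },
    });
    if (start + chunkSize >= tokens.length) break;
  }
  return chunks;
}

// 450 words with size 200 / overlap 20 yields windows at 0, 180, and 360.
const demo = chunkText("word ".repeat(450), "a.txt", { title: "a.txt" });
console.log(demo.length); // 3
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side of the cut.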