Hybrid RAG Project MCP Server

ASYNC_INGESTION.md•3.91 KiB

# Async Ingestion with Progress Tracking ## Overview The MCP server now supports asynchronous document ingestion with real-time progress tracking. This allows you to monitor the ingestion process as it runs in the background. ## Features ### 1. Asynchronous Processing - Ingestion runs as a background task - Non-blocking operation - you can continue other tasks - Uses Python asyncio for concurrent execution ### 2. Progress Tracking Track ingestion progress with detailed status information: - **Progress percentage**: 0-100% - **Current stage**: - `loading_files` (0-80%): Loading and parsing documents - `building_index` (80-100%): Creating vector embeddings and search index - `completed`: Process finished successfully - **File-level tracking**: - Current file being processed - Files processed vs total files - Documents created count ### 3. Status States - **`not_started`**: No ingestion has been initiated - **`in_progress`**: Currently loading and indexing documents - **`completed`**: Successfully finished ingestion - **`failed`**: Error occurred during ingestion ## MCP Tools ### ingest_documents Starts the asynchronous ingestion process. **Usage in Claude:** ``` "Start ingesting my documents" ``` **Response:** ``` Ingestion started. Use get_ingestion_status to monitor progress. ``` ### get_ingestion_status Returns the current ingestion status and progress. **Usage in Claude:** ``` "Check the ingestion status" ``` **Response (in progress):** ``` Ingestion Status: In Progress Progress: 45% Stage: loading_files Files Processed: 9/20 Current File: document.pdf Documents Loaded: 15 ``` **Response (completed):** ``` Ingestion Status: Completed ✅ Progress: 100% Total Files Processed: 20 Total Documents Loaded: 35 You can now use query_documents to search the documents. ``` ## Implementation Details ### Document Loader Updates The `DocumentLoaderUtility` class now supports: - **Progress callbacks**: Reports progress after each file - **File counting**: Pre-counts files for accurate progress - **Sequential loading**: Processes files in order with progress updates ### MCP Server Updates New global state tracking: ```python ingestion_status = { "status": "not_started", "progress": 0, "current_file": "", "files_processed": 0, "total_files": 0, "documents_loaded": 0, "error_message": None, "stage": "" } ``` ### Progress Calculation - **File loading**: 0-80% (proportional to files processed) - **Index building**: 80-100% (creating embeddings and search index) ## Usage Example ### Step 1: Start Ingestion ``` You: "Please ingest my documents" Claude: [Calls ingest_documents] Response: "Ingestion started. Use get_ingestion_status to monitor progress." ``` ### Step 2: Monitor Progress ``` You: "What's the ingestion status?" Claude: [Calls get_ingestion_status] Response: "Progress: 45%, Files: 9/20, Current: report.pdf" ``` ### Step 3: Wait for Completion ``` You: "Check status again" Claude: [Calls get_ingestion_status] Response: "Completed! 35 documents loaded from 20 files." ``` ### Step 4: Query Documents ``` You: "What are the main topics?" Claude: [Calls query_documents] Response: "Based on the documents, the main topics are..." ``` ## Error Handling If ingestion fails, the status will show: ``` Ingestion Status: Failed ❌ Error: [error message] Files Processed: 5/20 ``` You can restart ingestion by calling `ingest_documents` again. ## Benefits 1. **Better UX**: See progress instead of waiting blindly 2. **Large datasets**: Monitor long-running ingestion tasks 3. **Debugging**: Identify which files cause issues 4. **Non-blocking**: Continue working while documents load 5. **Transparency**: Understand what the system is doing ## Technical Notes - Uses `asyncio.create_task()` for background execution - Progress callback updates global state - `asyncio.to_thread()` runs blocking operations in thread pool - Prevents concurrent ingestion (checks if already in progress)

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/gwyer/hybrid-rag-project'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

ASYNC_INGESTION.md•3.91 KiB