Medical GraphRAG Assistant

MIT License

IRIS_VECTOR_RAG_GRAPHRAG_BUG_REPORT.md•9.98 kB

# iris-vector-rag v0.5.4 GraphRAG Pipeline Bug Report **Date**: December 13, 2025 **Reporter**: AWS GraphRAG deployment testing **Version**: iris-vector-rag v0.5.4 **Severity**: High (blocks GraphRAG pipeline usage) --- ## Summary The `GraphRAGPipeline.ingest()` method in iris-vector-rag v0.5.4 fails with an `UnboundLocalError` due to missing `time` module import in the GraphRAG pipeline implementation. --- ## Error Details ### Error Message ```python UnboundLocalError: cannot access local variable 'time' where it is not associated with a value ``` ### Stack Trace ``` File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/iris_vector_rag/core/base.py", line 145, in ingest self.load_documents("", documents=documents, **kwargs) File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/iris_vector_rag/pipelines/graphrag.py", line 93, in load_documents start_time = time.time() ^^^^ UnboundLocalError: cannot access local variable 'time' where it is not associated with a value ``` ### Root Cause The file `iris_vector_rag/pipelines/graphrag.py` uses `time.time()` at line 93 but does not import the `time` module. --- ## Reproduction Steps ### 1. Setup ```python import os os.environ['VECTOR_DIMENSION'] = '1024' os.environ['IRIS_HOST'] = '3.84.250.46' os.environ['IRIS_PORT'] = '1972' os.environ['IRIS_NAMESPACE'] = '%SYS' os.environ['IRIS_USER'] = '_SYSTEM' os.environ['IRIS_PASSWORD'] = 'SYS' from iris_vector_rag.config.manager import ConfigurationManager from iris_vector_rag.storage.vector_store_iris import IRISVectorStore from iris_vector_rag.pipelines import GraphRAGPipeline from iris_vector_rag.core.models import Document ``` ### 2. Initialize Pipeline ```python config = ConfigurationManager(config_path="config/fhir_graphrag_config.aws.yaml") vector_store = IRISVectorStore(config_manager=config) pipeline = GraphRAGPipeline( config_manager=config, vector_store=vector_store ) ``` ### 3. Prepare Documents ```python documents = [ Document( page_content="Clinical note content here", id="doc1", metadata={'resource_type': 'DocumentReference'} ) ] ``` ### 4. Trigger Error ```python pipeline.ingest(documents) # ❌ UnboundLocalError: cannot access local variable 'time' ``` --- ## Expected Behavior The `ingest()` method should: 1. Accept a list of `Document` objects 2. Extract entities from document content 3. Build knowledge graph with entities and relationships 4. Store embeddings in IRIS vector tables 5. Return successfully without errors --- ## Actual Behavior The `ingest()` method immediately fails with `UnboundLocalError` when attempting to execute `start_time = time.time()` because the `time` module is not imported in `graphrag.py`. --- ## Proposed Fix ### File: `iris_vector_rag/pipelines/graphrag.py` **Add missing import at the top of the file:** ```python import time ``` **Before:** ```python # graphrag.py from iris_vector_rag.core.base import RAGPipeline # ... other imports ... # Missing: import time ``` **After:** ```python # graphrag.py import time from iris_vector_rag.core.base import RAGPipeline # ... other imports ... ``` --- ## Workaround Until this is fixed in a future release, users can: ### Option 1: Monkey Patch ```python import time import iris_vector_rag.pipelines.graphrag as graphrag_module graphrag_module.time = time ``` ### Option 2: Use Alternative Pipelines ```python # Use BasicRAGPipeline or CRAGPipeline instead from iris_vector_rag.pipelines import BasicRAGPipeline pipeline = BasicRAGPipeline(config_manager=config, vector_store=vector_store) ``` ### Option 3: Manual Entity Extraction ```python # Extract entities manually without using GraphRAG pipeline # (Requires custom implementation) ``` --- ## Environment Details ### System - **OS**: macOS Darwin 24.5.0 - **Python**: 3.12 - **Architecture**: Apple Silicon (ARM64) ### Package Versions ``` iris-vector-rag==0.5.4 intersystems-irispython==5.0.1 sentence-transformers==3.3.1 ``` ### Database - **IRIS**: InterSystems IRIS Community Edition (latest) - **Deployment**: AWS EC2 g5.xlarge (us-east-1) - **Namespace**: %SYS - **Schema**: SQLUser ### Configuration ```yaml # config/fhir_graphrag_config.aws.yaml database: iris: host: "3.84.250.46" port: 1972 namespace: "%SYS" storage: vector_dimension: 1024 distance_metric: "COSINE" index_type: "HNSW" knowledge_graph: entities_table: "SQLUser.Entities" relationships_table: "SQLUser.EntityRelationships" ``` --- ## Impact Assessment ### Severity: **High** - **Blocks GraphRAG functionality entirely** - Affects all users attempting to use GraphRAG pipeline - No graceful degradation or fallback ### Affected Users - Anyone using `GraphRAGPipeline.ingest()` method - Cloud deployments requiring GraphRAG features - Multi-modal search implementations ### Workaround Difficulty - **Easy** for developers familiar with Python imports - **Difficult** for users relying on packaged functionality - Requires code modification or monkey patching --- ## Testing Recommendations ### Unit Test to Prevent Regression ```python def test_graphrag_pipeline_ingest(): """Test that GraphRAG pipeline can ingest documents without import errors.""" from iris_vector_rag.pipelines import GraphRAGPipeline from iris_vector_rag.core.models import Document # Mock config and vector store pipeline = GraphRAGPipeline(config_manager=mock_config, vector_store=mock_store) docs = [Document(page_content="test", id="1", metadata={})] # Should not raise UnboundLocalError try: pipeline.ingest(docs) except UnboundLocalError as e: pytest.fail(f"GraphRAG pipeline missing import: {e}") ``` ### Integration Test ```python def test_graphrag_pipeline_end_to_end(): """Test full GraphRAG pipeline with real IRIS connection.""" # Setup real config pointing to test IRIS instance config = ConfigurationManager(config_path="test_config.yaml") vector_store = IRISVectorStore(config_manager=config) pipeline = GraphRAGPipeline(config_manager=config, vector_store=vector_store) # Create test documents docs = [ Document(page_content="Patient has chest pain", id="1", metadata={}), Document(page_content="Patient prescribed aspirin", id="2", metadata={}) ] # Ingest should complete without errors pipeline.ingest(docs) # Query should return results results = pipeline.query("chest pain", top_k=1) assert len(results) > 0 ``` --- ## Related Issues ### Similar Import Issues to Check 1. Are there other missing imports in `graphrag.py`? 2. Does `HybridGraphRAGPipeline` have the same issue? 3. Are there missing imports in other pipeline implementations? ### Suggested Code Review - **File**: `iris_vector_rag/pipelines/graphrag.py` - **Check**: All standard library imports present - **Verify**: `time`, `datetime`, `os`, `sys`, `json`, etc. --- ## Additional Context ### Use Case We are deploying a FHIR GraphRAG system on AWS EC2 with: - 51 DocumentReference resources (clinical notes) - NVIDIA NIM embeddings (1024-dimensional) - InterSystems IRIS vector database - Knowledge graph for medical entity extraction ### Why This Matters GraphRAG is a key feature differentiator for iris-vector-rag. The pipeline should work out-of-the-box for users following the documentation. This import error creates a poor first-use experience and blocks cloud deployment workflows. ### Documentation Impact Once fixed, documentation should include: - End-to-end GraphRAG pipeline example - Cloud deployment guide (AWS, Azure, GCP) - Troubleshooting section for common pipeline errors --- ## Verification Steps ### After Fix is Applied 1. **Install fixed version** ```bash pip install iris-vector-rag>=0.5.5 # or whatever version contains the fix ``` 2. **Run test script** ```python from iris_vector_rag.pipelines import GraphRAGPipeline from iris_vector_rag.core.models import Document # Should import without errors import iris_vector_rag.pipelines.graphrag as graphrag assert hasattr(graphrag, 'time'), "time module should be imported" ``` 3. **Test ingest** ```python pipeline = GraphRAGPipeline(config_manager=config, vector_store=vector_store) docs = [Document(page_content="test", id="1", metadata={})] # Should complete without UnboundLocalError pipeline.ingest(docs) print("✅ GraphRAG ingest working!") ``` 4. **Test query** ```python results = pipeline.query("test query", top_k=5) print(f"✅ GraphRAG query returned {len(results)} results") ``` --- ## Priority Recommendation **Priority**: **P0 (Critical)** **Rationale**: - Core feature completely broken - Simple one-line fix - Blocks user adoption of GraphRAG - Affects cloud deployment workflows - Easy to miss in development if not testing GraphRAG path **Timeline**: Should be fixed in next patch release (v0.5.5) --- ## Contact For questions about this bug report, please refer to: - **Repository**: FHIR-AI-Hackathon-Kit AWS deployment - **Configuration**: `config/fhir_graphrag_config.aws.yaml` - **Test Script**: `scripts/aws/build-knowledge-graph-aws.py` - **Full Stack Trace**: See error details section above --- ## Acknowledgments Thank you to the iris-vector-rag team for creating this valuable framework. This bug report is provided in the spirit of improving the library for the entire community. The overall architecture and API design of iris-vector-rag v0.5.4 is excellent - this is just a minor import oversight that slipped through. **Positive Feedback**: - ✅ ConfigurationManager works perfectly - ✅ IRISVectorStore abstractions are clean - ✅ Document model is intuitive - ✅ Pipeline factory pattern is elegant - ✅ CloudConfiguration API properly reads `storage.vector_dimension` This bug is easily fixable and doesn't detract from the overall quality of the package.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/isc-tdyar/medical-graphrag-assistant'

If you have feedback or need assistance with the MCP directory API, please join our Discord server