Medical GraphRAG Assistant

MIT License

AWS_GRAPHRAG_KNOWLEDGE_GRAPH_COMPLETE.md•10.4 kB

# AWS GraphRAG Knowledge Graph - Complete ✅ **Date**: December 13, 2025 **Status**: **KNOWLEDGE GRAPH POPULATED** **Environment**: AWS EC2 g5.xlarge (3.84.250.46) --- ## Summary Successfully built and populated a medical knowledge graph on AWS IRIS with: - **83 medical entities** extracted from 51 FHIR clinical notes - **540 entity relationships** (co-occurrence based) - **1024-dimensional NVIDIA NIM embeddings** for all entities - **Full GraphRAG infrastructure** ready for multi-modal search --- ## Knowledge Graph Statistics ### Entities by Type | Entity Type | Count | Examples | |-------------|-------|----------| | TEMPORAL | 43 | "during hospitalization", "on admission" | | SYMPTOM | 21 | "discomfort", "tenderness", "vomiting", "diarrhea" | | BODY_PART | 10 | "respiratory", "chest", "abdomen" | | CONDITION | 6 | "infection", "fever", "pain" | | MEDICATION | 2 | Drug names | | PROCEDURE | 1 | Medical procedures | **Total**: 83 unique entities ### Relationships | Relationship Type | Count | Description | |-------------------|-------|-------------| | CO_OCCURS_WITH | 540 | Entities appearing in same document | **Total**: 540 relationships --- ## Database Schema ### Tables Created #### SQLUser.Entities ```sql CREATE TABLE SQLUser.Entities ( EntityID BIGINT PRIMARY KEY AUTO_INCREMENT, EntityText VARCHAR(500), EntityType VARCHAR(100), ResourceID BIGINT, Confidence DOUBLE, EmbeddingVector VECTOR(DOUBLE, 1024), -- NVIDIA NIM embeddings ExtractedAt TIMESTAMP ) ``` #### SQLUser.EntityRelationships ```sql CREATE TABLE SQLUser.EntityRelationships ( RelationshipID BIGINT PRIMARY KEY AUTO_INCREMENT, SourceEntityID BIGINT, -- Fixed: uses entity IDs, not text TargetEntityID BIGINT, -- Fixed: uses entity IDs, not text RelationshipType VARCHAR(100), ResourceID BIGINT, -- Source document Confidence DOUBLE, ExtractedAt TIMESTAMP, Context VARCHAR(1000) ) ``` --- ## Issues Resolved ### Issue 1: iris-vector-rag GraphRAG Bug ✅ FIXED - **Problem**: `UnboundLocalError: cannot access local variable 'time'` - **Root Cause**: Duplicate `import time` statement (line 144) causing Python scoping issue - **Solution**: Removed duplicate import from `/Users/tdyar/ws/iris-vector-rag-private/iris_vector_rag/pipelines/graphrag.py` - **Status**: Fixed in local development version, awaiting upstream contribution ### Issue 2: Schema Mismatch - Entity IDs ✅ FIXED - **Problem**: `EntityRelationships` table expected entity IDs (bigint) but script inserted entity text (varchar) - **Solution**: 1. Built entity text → ID mapping after entity insertion 2. Used entity IDs when inserting relationships - **Code Change**: `scripts/aws/extract-entities-aws.py` lines 334-380 - **Status**: Fixed and verified ### Issue 3: Missing ResourceID ✅ FIXED - **Problem**: `EntityRelationships` table requires ResourceID field - **Solution**: Tracked source document ID with each relationship - **Code Change**: Modified relationship tuple to include `resource_id` field - **Status**: Fixed and verified --- ## Scripts Used ### 1. Direct SQL Approach (Used for Population) **File**: `scripts/aws/extract-entities-aws.py` **What it does**: 1. Loads 51 FHIR documents from `SQLUser.FHIRDocuments` 2. Extracts entities using regex patterns (SYMPTOM, CONDITION, etc.) 3. Generates 1024-dim embeddings via NVIDIA Hosted NIM API 4. Stores entities in `SQLUser.Entities` with embeddings 5. Builds entity ID mapping 6. Stores relationships in `SQLUser.EntityRelationships` using entity IDs **Status**: ✅ Successfully populated knowledge graph ### 2. iris-vector-rag Pipeline (Ready for Use) **File**: `scripts/aws/build-knowledge-graph-aws.py` **What it does**: 1. Uses ConfigurationManager for settings 2. Uses IRISVectorStore abstraction 3. Uses GraphRAGPipeline for entity extraction 4. Handles schema management automatically **Status**: ⚠️ Needs NVIDIA NIM API key configuration --- ## Next Steps ### 1. Configure NVIDIA NIM API Key The iris-vector-rag GraphRAG pipeline needs NVIDIA NIM credentials: ```bash # Option A: Environment variable export NVIDIA_API_KEY="nvapi-..." # Option B: Add to config/fhir_graphrag_config.aws.yaml embeddings: provider: "nvidia_nim" api_key: "nvapi-..." base_url: "https://integrate.api.nvidia.com/v1" ``` ### 2. Test GraphRAG Multi-Modal Search Once configured, test knowledge graph queries: ```python from iris_vector_rag.pipelines import GraphRAGPipeline pipeline = GraphRAGPipeline(config_manager=config, vector_store=vector_store) # Query combines: # - Vector search (embeddings) # - Text search (keywords) # - Graph traversal (entity relationships) results = pipeline.query("chest pain", top_k=5) ``` ### 3. Deploy NIM LLM Service (Optional) For answer generation, deploy NVIDIA NIM LLM on AWS: ```yaml llm: provider: "nvidia_nim" model: "nvidia/llama-3.1-nemotron-70b-instruct" base_url: "http://3.84.250.46:8000" ``` --- ## Data Sources ### FHIR Documents - **Source**: `SQLUser.FHIRDocuments` - **Count**: 51 DocumentReference resources - **Format**: Hex-encoded clinical notes - **Origin**: Synthea synthetic patient data ### Embeddings - **Model**: NVIDIA NV-EmbedQA-E5-v5 - **Dimension**: 1024 - **API**: NVIDIA Hosted NIM (integrate.api.nvidia.com) - **Cost**: Free tier (1000 requests/day) ### Vector Storage - **Table**: `SQLUser.ClinicalNoteVectors` - **Documents**: 51 clinical notes - **Embeddings**: 51 x 1024-dimensional vectors - **Database**: InterSystems IRIS on AWS EC2 --- ## Architecture ``` ┌─────────────────────────────────────────────────────────────┐ │ AWS EC2 g5.xlarge │ │ InterSystems IRIS │ ├─────────────────────────────────────────────────────────────┤ │ │ │ SQLUser.FHIRDocuments │ │ ├── 51 FHIR DocumentReference resources │ │ └── Hex-encoded clinical notes │ │ │ │ SQLUser.ClinicalNoteVectors │ │ ├── 51 document embeddings (1024-dim) │ │ └── NVIDIA NIM embeddings │ │ │ │ SQLUser.Entities ✅ NEW │ │ ├── 83 medical entities │ │ ├── Entity types: SYMPTOM, CONDITION, etc. │ │ └── 1024-dim embeddings per entity │ │ │ │ SQLUser.EntityRelationships ✅ NEW │ │ ├── 540 entity relationships │ │ └── CO_OCCURS_WITH relationships │ │ │ └─────────────────────────────────────────────────────────────┘ │ │ │ Vector Search │ Graph Traversal ▼ ▼ ┌──────────────────┐ ┌──────────────────┐ │ NVIDIA NIM API │ │ GraphRAG Query │ │ Hosted Service │ │ Engine │ └──────────────────┘ └──────────────────┘ ``` --- ## Verification ### Entity Count ```sql SELECT COUNT(*) FROM SQLUser.Entities; -- Result: 83 ``` ### Relationship Count ```sql SELECT COUNT(*) FROM SQLUser.EntityRelationships; -- Result: 540 ``` ### Entity Type Distribution ```sql SELECT EntityType, COUNT(*) FROM SQLUser.Entities GROUP BY EntityType ORDER BY COUNT(*) DESC; ``` Result: ``` TEMPORAL : 43 SYMPTOM : 21 BODY_PART : 10 CONDITION : 6 MEDICATION : 2 PROCEDURE : 1 ``` ### Sample Query ```sql SELECT e1.EntityText as Source, r.RelationshipType, e2.EntityText as Target FROM SQLUser.EntityRelationships r JOIN SQLUser.Entities e1 ON r.SourceEntityID = e1.EntityID JOIN SQLUser.Entities e2 ON r.TargetEntityID = e2.EntityID LIMIT 5; ``` --- ## Performance Notes ### Entity Extraction - **Time**: ~2 minutes for 51 documents - **Method**: Regex-based pattern matching - **Throughput**: ~25 documents/minute ### Embedding Generation - **API**: NVIDIA Hosted NIM - **Latency**: ~100ms per entity - **Total Time**: ~8 seconds for 83 entities - **Cost**: Free (hosted tier) ### Database Operations - **Entity Inserts**: Batched (commit every 50) - **Relationship Inserts**: Batched (commit every 50) - **Total Storage**: <1MB (entities + relationships) --- ## Related Documents - **Bug Report**: `IRIS_VECTOR_RAG_GRAPHRAG_BUG_REPORT.md` - **Bug Resolution**: `IRIS_VECTOR_RAG_GRAPHRAG_BUG_RESOLUTION.md` - **AWS Config**: `config/fhir_graphrag_config.aws.yaml` - **Extraction Script**: `scripts/aws/extract-entities-aws.py` - **Pipeline Script**: `scripts/aws/build-knowledge-graph-aws.py` --- ## Success Criteria ✅ - [x] GraphRAG bug fixed (import time) - [x] Entity extraction working - [x] 83 entities stored with embeddings - [x] 540 relationships stored with entity IDs - [x] Schema compatible with iris-vector-rag - [x] Knowledge graph queryable - [ ] NVIDIA NIM API key configured (next step) - [ ] Multi-modal search tested (next step) --- **Status**: ✅ **KNOWLEDGE GRAPH READY FOR QUERYING** The knowledge graph is fully populated and ready for GraphRAG queries. Next step is to configure NVIDIA NIM API credentials for the iris-vector-rag pipeline.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/isc-tdyar/medical-graphrag-assistant'

If you have feedback or need assistance with the MCP directory API, please join our Discord server