Medical GraphRAG Assistant

MIT License

quickstart.md•13.9 kB

# Quick Start Guide: FHIR GraphRAG Knowledge Graph **Feature**: 001-fhir-graphrag **Date**: 2025-11-06 **Target Audience**: Developers and system administrators --- ## Overview This guide provides step-by-step instructions for setting up and using the FHIR GraphRAG knowledge graph system. The system extracts medical entities and relationships from clinical notes in FHIR DocumentReference resources and enables multi-modal search combining vector similarity, text matching, and graph traversal. **Estimated Setup Time**: 15-30 minutes --- ## Prerequisites ### Required Software 1. **InterSystems IRIS Database** - Running instance at localhost:32782 - Namespace: DEMO - Credentials: _SYSTEM / ISCDEMO - Existing FHIR data: 51 DocumentReference resources 2. **Python Environment** - Python 3.12 or higher - Miniconda or virtualenv recommended 3. **rag-templates Library** - Location: `/Users/tdyar/ws/rag-templates` - Verify: `ls /Users/tdyar/ws/rag-templates/iris_rag` ### Optional Software 4. **Ollama LLM Service** (for enhanced entity extraction) - Install: `curl https://ollama.ai/install.sh | sh` - Pull model: `ollama pull gemma3:4b` - Verify: `ollama list` ### Pre-existing Components The following components should already be in place from the existing FHIR AI Hackathon tutorial: - ✅ FHIR native table: `HSFHIR_X0001_R.Rsrc` (2,739 resources) - ✅ Vector table: `VectorSearch.FHIRResourceVectors` (51 DocumentReference vectors) - ✅ Existing script: `direct_fhir_vector_approach.py` (vector search POC) - ✅ Utilities: `Tutorial/Utils/get_iris_connection.py` --- ## Installation ### Step 1: Install Python Dependencies ```bash # Create and activate virtual environment python3 -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate # Install required packages pip install iris-python-driver sentence-transformers PyYAML pytest # Verify rag-templates is accessible export PYTHONPATH="/Users/tdyar/ws/rag-templates:$PYTHONPATH" python3 -c "from iris_rag import create_pipeline; print('✅ rag-templates accessible')" ``` ### Step 2: Create Configuration File Create `config/fhir_graphrag_config.yaml`: ```bash mkdir -p config cp specs/001-fhir-graphrag/contracts/byot-config-schema.yaml config/fhir_graphrag_config.yaml ``` Edit `config/fhir_graphrag_config.yaml` and verify/update: - Database credentials (username/password) - FHIR table name (`HSFHIR_X0001_R.Rsrc`) - Vector table name (`VectorSearch.FHIRResourceVectors`) - LLM settings (if using Ollama) ### Step 3: Verify Database Connection ```bash # Test IRIS connection python3 <<EOF import iris conn = iris.connect('localhost', 32782, 'DEMO', '_SYSTEM', 'ISCDEMO') cursor = conn.cursor() cursor.execute("SELECT COUNT(*) FROM HSFHIR_X0001_R.Rsrc WHERE ResourceType='DocumentReference'") count = cursor.fetchone()[0] print(f"✅ Found {count} DocumentReference resources") cursor.close() conn.close() EOF ``` Expected output: `✅ Found 51 DocumentReference resources` --- ## Knowledge Graph Build ### Step 1: Create Knowledge Graph Tables Run the setup script to create `RAG.Entities` and `RAG.EntityRelationships` tables: ```bash python3 src/setup/fhir_graphrag_setup.py --mode=init ``` This will: - Create `RAG.Entities` table with indexes - Create `RAG.EntityRelationships` table with indexes - Validate FHIR table accessibility - Report table creation status **Expected Output**: ``` [INFO] Initializing GraphRAG knowledge graph tables... [INFO] ✅ Created table RAG.Entities [INFO] ✅ Created table RAG.EntityRelationships [INFO] ✅ Created indexes on RAG.Entities [INFO] ✅ Created indexes on RAG.EntityRelationships [INFO] Initialization complete! ``` ### Step 2: Extract Entities and Build Knowledge Graph Process all 51 DocumentReference resources and extract medical entities: ```bash python3 src/setup/fhir_graphrag_setup.py --mode=build ``` This will: - Load BYOT configuration from `config/fhir_graphrag_config.yaml` - Connect to IRIS database - Query FHIR native tables for DocumentReference resources - Decode hex-encoded clinical notes - Extract medical entities (SYMPTOM, CONDITION, MEDICATION, etc.) - Identify entity relationships (TREATS, CAUSES, etc.) - Store entities and relationships in knowledge graph tables **Expected Output**: ``` [INFO] Loading FHIR GraphRAG configuration... [INFO] ✅ Configuration loaded from config/fhir_graphrag_config.yaml [INFO] Connecting to IRIS database... [INFO] ✅ Connected to localhost:32782, namespace DEMO [INFO] Loading FHIR DocumentReference resources... [INFO] ✅ Loaded 51 DocumentReference resources [INFO] Processing document 1/51... [INFO] ✅ Extracted 5 entities (SYMPTOM: 2, CONDITION: 1, MEDICATION: 2) [INFO] ✅ Identified 3 relationships (TREATS: 2, CAUSES: 1) ... [INFO] Processing document 51/51... [INFO] ===== Knowledge Graph Build Complete ===== [INFO] Total entities extracted: 127 [INFO] Total relationships identified: 64 [INFO] Processing time: 3 minutes 42 seconds [INFO] Average time per document: 4.4 seconds ``` **Performance Targets**: - ✅ Entity extraction: < 2 seconds per document (regex-only) OR < 10 seconds (with LLM) - ✅ Total build time: < 5 minutes for 51 documents - ✅ Entities extracted: 100+ medical entities - ✅ Relationships identified: 50+ entity relationships ### Step 3: Verify Knowledge Graph Query the knowledge graph to verify successful build: ```bash python3 <<EOF import iris conn = iris.connect('localhost', 32782, 'DEMO', '_SYSTEM', 'ISCDEMO') cursor = conn.cursor() # Count entities by type cursor.execute(""" SELECT EntityType, COUNT(*) as Count FROM RAG.Entities GROUP BY EntityType ORDER BY Count DESC """) print("Entities by type:") for row in cursor.fetchall(): print(f" {row[0]}: {row[1]}") # Count relationships by type cursor.execute(""" SELECT RelationshipType, COUNT(*) as Count FROM RAG.EntityRelationships GROUP BY RelationshipType ORDER BY Count DESC """) print("\nRelationships by type:") for row in cursor.fetchall(): print(f" {row[0]}: {row[1]}") cursor.close() conn.close() EOF ``` --- ## Multi-Modal Search Usage ### Basic Query Run a multi-modal search query combining vector, text, and graph search: ```bash python3 src/query/fhir_graphrag_query.py "respiratory symptoms" ``` **Expected Output**: ``` ================================================================================ GraphRAG Query: respiratory symptoms ================================================================================ 📊 Search Results: - Retrieved: 5 documents - Execution time: 0.842s - Retrieval method: rrf (vector + text + graph) 🔬 Extracted Entities: - cough (SYMPTOM) - shortness of breath (SYMPTOM) - chest congestion (SYMPTOM) - albuterol (MEDICATION) - asthma (CONDITION) 🔗 Relationships: - albuterol → TREATS → asthma - asthma → CAUSES → shortness of breath - cough → CO_OCCURS_WITH → chest congestion 💡 Answer: The patient has reported respiratory symptoms including cough, shortness of breath, and chest congestion. Medical history indicates asthma as an underlying condition, treated with albuterol inhaler. 📝 Sources: 1. DocumentReference/12 (Patient/3, 2023-05-15) 2. DocumentReference/23 (Patient/3, 2023-06-01) 3. DocumentReference/34 (Patient/3, 2023-06-15) ================================================================================ ``` ### Patient-Specific Query Filter results to a specific patient: ```bash python3 src/query/fhir_graphrag_query.py "medications for hypertension" --patient=3 ``` ### Query with Custom Parameters ```bash python3 src/query/fhir_graphrag_query.py \ "chronic conditions" \ --patient=3 \ --top-k=10 \ --vector-k=30 \ --text-k=30 \ --graph-k=10 ``` **Parameters**: - `--patient`: Filter results to specific patient ID - `--top-k`: Number of final results (default: 10) - `--vector-k`: Top K from vector search (default: 30) - `--text-k`: Top K from text search (default: 30) - `--graph-k`: Top K from graph traversal (default: 10) --- ## Demo Queries Try these example queries to explore the knowledge graph: ### 1. Symptom-Based Query ```bash python3 src/query/fhir_graphrag_query.py "Has the patient reported any respiratory symptoms?" ``` **What it demonstrates**: Vector search for symptom entities, graph traversal of CO_OCCURS_WITH relationships ### 2. Medication Query ```bash python3 src/query/fhir_graphrag_query.py "What medications has the patient been prescribed?" ``` **What it demonstrates**: Entity type filtering (MEDICATION), TREATS relationships to conditions ### 3. Temporal Query ```bash python3 src/query/fhir_graphrag_query.py "Timeline of patient's medical events" ``` **What it demonstrates**: Temporal entity extraction, PRECEDES relationships ### 4. Condition-Symptom Relationship ```bash python3 src/query/fhir_graphrag_query.py "What conditions are associated with chest pain?" ``` **What it demonstrates**: Graph traversal (2-hop), CAUSES relationships --- ## Performance Benchmarks ### Expected Performance (51 DocumentReference dataset) | Operation | Target | Typical | |-----------|--------|---------| | Entity extraction (per document) | < 2 sec | 4-5 sec (with LLM), 0.5 sec (regex-only) | | Knowledge graph build (51 documents) | < 5 min | 3-4 min | | Multi-modal query | < 1 sec | 0.8-0.9 sec | | Graph traversal (2-hop) | < 500 ms | 200-300 ms | ### Monitoring Performance View real-time performance metrics: ```bash # View extraction metrics python3 src/setup/fhir_graphrag_setup.py --mode=stats # View query performance logs tail -f logs/fhir_graphrag.log | grep "query_latency" ``` --- ## Troubleshooting ### Issue: rag-templates library not found **Error**: `ModuleNotFoundError: No module named 'iris_rag'` **Solution**: ```bash export PYTHONPATH="/Users/tdyar/ws/rag-templates:$PYTHONPATH" # Or add to ~/.bashrc or ~/.zshrc for persistence ``` ### Issue: IRIS connection failure **Error**: `ConnectionRefusedError: [Errno 61] Connection refused` **Solution**: 1. Verify IRIS container is running: `docker ps | grep iris-fhir` 2. Check port mapping: Should show `32782->1972/tcp` 3. Test connection: `telnet localhost 32782` 4. Restart container if needed: `docker restart iris-fhir` ### Issue: Low entity extraction accuracy **Problem**: Many irrelevant or low-confidence entities extracted **Solution**: 1. Increase confidence threshold in `config/fhir_graphrag_config.yaml`: ```yaml pipelines: graphrag: min_entity_confidence: 0.8 # Increase from 0.7 ``` 2. Enable LLM-based extraction (more accurate than regex-only) 3. Review sample entities and adjust regex patterns if needed ### Issue: Ollama LLM not available **Error**: `ConnectionError: Failed to connect to Ollama at http://localhost:11434` **Solution**: 1. Check Ollama status: `ollama list` 2. Start Ollama service: `ollama serve` 3. Or disable LLM and use regex-only: ```yaml llm: fallback_to_regex: true ``` ### Issue: Knowledge graph build too slow **Problem**: Build taking > 5 minutes for 51 documents **Solution**: 1. Disable LLM and use regex-only (faster): ```yaml llm: provider: null # Disable LLM ``` 2. Increase batch size: ```yaml pipelines: graphrag: batch_size: 20 # Increase from 10 ``` 3. Enable parallel extraction: ```yaml pipelines: graphrag: parallel_extraction: true max_workers: 8 # Increase from 4 ``` ### Issue: Multi-modal query returns no results **Problem**: Query returns empty results despite entities in knowledge graph **Solution**: 1. Check patient filter is not too restrictive 2. Verify entities exist for query terms: ```sql SELECT * FROM RAG.Entities WHERE EntityText LIKE '%respiratory%'; ``` 3. Lower RRF threshold or increase top-k values 4. Check vector embeddings are populated --- ## Next Steps ### After Successful Setup 1. **Explore Entity Types**: Review extracted entities by type to understand medical concept coverage 2. **Analyze Relationships**: Examine entity relationships to validate medical accuracy 3. **Tune Parameters**: Adjust confidence thresholds, RRF weights, and top-k values based on results 4. **Extend Entity Types**: Add custom entity types for domain-specific concepts 5. **Benchmark Performance**: Profile extraction and query performance with larger datasets ### Advanced Usage - **Custom Entity Extraction**: Modify `src/extractors/medical_entity_extractor.py` with domain-specific patterns - **Relationship Rules**: Add custom relationship extraction rules for medical domain - **Integration**: Integrate with existing `direct_fhir_vector_approach.py` for hybrid search - **Visualization**: Build knowledge graph visualization using NetworkX or D3.js - **Real-Time Updates**: Implement incremental entity extraction for new FHIR resources --- ## Additional Resources ### Documentation - Full specification: `specs/001-fhir-graphrag/spec.md` - Implementation plan: `specs/001-fhir-graphrag/plan.md` - Data model: `specs/001-fhir-graphrag/data-model.md` - Research findings: `specs/001-fhir-graphrag/research.md` ### Code References - rag-templates GraphRAG pipeline: `/Users/tdyar/ws/rag-templates/iris_rag/pipelines/graphrag.py` - BYOT specification: `/Users/tdyar/ws/rag-templates/specs/014-byot-as-described/spec.md` ### Support - File issues: GitHub repository issues tracker - Contact: Project maintainer --- **Setup Status Checklist**: - [ ] Python dependencies installed - [ ] Configuration file created - [ ] IRIS database connection verified - [ ] Knowledge graph tables created - [ ] Entities extracted (100+) - [ ] Relationships identified (50+) - [ ] Multi-modal query tested - [ ] Performance benchmarks met **Happy Querying!** 🎉

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/isc-tdyar/medical-graphrag-assistant'

If you have feedback or need assistance with the MCP directory API, please join our Discord server