Medical GraphRAG Assistant

AWS_DEPLOYMENT_COMPLETE.md•13.8 KiB

# AWS Deployment Complete: NVIDIA NIM + IRIS Vector Database **Date**: December 11-12, 2025 **Status**: ✅ Phases 1-4 Complete - Production Ready ## 🎯 What We Built A complete **GPU-accelerated vector search infrastructure** on AWS EC2 with: - InterSystems IRIS vector database (native VECTOR support) - NVIDIA NIM embeddings API (1024-dimensional vectors) - End-to-end semantic similarity search ## 🏗️ Architecture ``` ┌─────────────────────────────────────────────────────────────┐ │ MacBook / Development Machine │ │ ├─ Python Application │ │ │ └─ NVIDIA NIM API calls (embeddings) │ │ └─ intersystems-irispython │ │ └─ Remote connection to AWS IRIS │ └─────────────────────────────────────────────────────────────┘ ↓ ↓ HTTPS API calls ↓ ┌─────────────────────────────────────────────────────────────┐ │ NVIDIA Cloud │ │ └─ NV-EmbedQA-E5-v5 (hosted inference) │ │ └─ Returns 1024-dim embeddings │ └─────────────────────────────────────────────────────────────┘ ↓ ↓ Embeddings returned ↓ ┌─────────────────────────────────────────────────────────────┐ │ AWS EC2 (us-east-1) │ │ Instance: i-0432eba10b98c4949 │ │ Type: g5.xlarge (NVIDIA A10G GPU, 24GB VRAM) │ │ IP: 3.84.250.46 │ │ ├─ Ubuntu 24.04 LTS │ │ ├─ NVIDIA Drivers (535) + CUDA 12.2 │ │ ├─ Docker with GPU support │ │ └─ InterSystems IRIS Community Edition │ │ ├─ Port 1972 (SQL) │ │ ├─ Port 52773 (Management Portal) │ │ └─ DEMO namespace │ │ └─ SQLUser schema │ │ ├─ ClinicalNoteVectors (VECTOR DOUBLE 1024) │ │ └─ MedicalImageVectors (VECTOR DOUBLE 1024) │ └─────────────────────────────────────────────────────────────┘ ``` ## ✅ Completed Phases ### Phase 1: Infrastructure Setup **Duration**: ~30 minutes **Status**: ✅ Complete - EC2 instance provisioned (g5.xlarge) - NVIDIA drivers installed (535) - CUDA toolkit configured (12.2) - Docker with GPU support - SSH key-based authentication **Scripts Created**: - `scripts/aws/provision-instance.sh` - `scripts/aws/install-gpu-drivers.sh` - `scripts/aws/setup-docker-gpu.sh` ### Phase 2: IRIS Vector Database **Duration**: ~2 hours **Status**: ✅ Complete **Key Challenges Overcome**: 1. ❌ Wrong Docker image tag (`2025.1` → ✅ `latest`) 2. ❌ Wrong Python package (`intersystems-iris` → ✅ `intersystems-irispython`) 3. ❌ ObjectScript complexity → ✅ Python-based schema creation 4. ❌ Namespace confusion → ✅ SQLUser schema (correct IRIS behavior) 5. ❌ SQL syntax differences → ✅ Try/except for index creation **Final Working Solution**: ```python # Connect to %SYS namespace conn = iris.connect('localhost', 1972, '%SYS', '_SYSTEM', 'SYS') # Create DEMO schema cursor.execute("CREATE SCHEMA IF NOT EXISTS DEMO") # Switch to DEMO namespace cursor.execute("USE DEMO") # Create tables (end up in SQLUser schema - correct!) cursor.execute("CREATE TABLE ClinicalNoteVectors (...)") cursor.execute("CREATE TABLE MedicalImageVectors (...)") ``` **Tables Created**: - `SQLUser.ClinicalNoteVectors` - Text embeddings (1024-dim) - `SQLUser.MedicalImageVectors` - Image embeddings (1024-dim) **Key Learning**: IRIS SQL tables always go to SQLUser schema, regardless of current namespace. This is not a bug - it's how IRIS SQL projections work. **Scripts Created**: - `scripts/aws/setup-iris-schema.py` - Schema creation - `scripts/aws/test-iris-vectors.py` - Vector operations test ### Phase 3: NVIDIA NIM Integration **Duration**: ~30 minutes **Status**: ✅ Complete **Key Decision**: Use NVIDIA API Cloud instead of self-hosted NIM - ✅ No GPU needed for embeddings (hosted by NVIDIA) - ✅ Simpler architecture (just API calls) - ✅ Pay-per-use pricing (cost-effective) - ✅ Auto-scaling by NVIDIA - ✅ Can migrate to self-hosted later **API Endpoint**: `https://integrate.api.nvidia.com/v1/embeddings` **Model**: `nvidia/nv-embedqa-e5-v5` **Dimensions**: 1024 **Test Results**: ``` Text 1: "Patient presents with chest pain..." → 1024-dim ✓ Text 2: "Cardiac catheterization performed..." → 1024-dim ✓ Text 3: "Atrial fibrillation management..." → 1024-dim ✓ ``` **Scripts Created**: - `scripts/aws/test-nvidia-nim-embeddings.py` ### Phase 4: End-to-End Integration **Duration**: ~30 minutes **Status**: ✅ Complete **Full Pipeline Validated**: 1. Text → NVIDIA NIM API → 1024-dim embedding 2. Embedding → AWS IRIS → Vector storage 3. Query → NVIDIA NIM API → Query vector 4. Query vector → IRIS VECTOR_DOT_PRODUCT → Ranked results **Semantic Search Results**: ``` Query: "chest pain and breathing difficulty" Ranked by Similarity: 1. Chest pain + SOB note → 0.62 similarity (best match) ✓ 2. Cardiac catheterization → 0.47 similarity (related) ✓ 3. Atrial fibrillation → 0.44 similarity (less) ✓ ``` **Performance**: - End-to-end latency: 2-3 seconds (3 documents) - NVIDIA API: ~500ms per embedding - IRIS vector search: <50ms - Network latency (MacBook → AWS): ~100ms **Scripts Created**: - `scripts/aws/integrate-nvidia-nim-iris.py` - Full integration test ## 📊 Key Metrics | Metric | Value | |--------|-------| | **Instance Type** | g5.xlarge | | **GPU** | NVIDIA A10G (24GB VRAM) | | **IRIS Version** | Community Edition (latest) | | **Vector Dimensions** | 1024 | | **Vector Storage Type** | VECTOR(DOUBLE, 1024) | | **Embedding Model** | NV-EmbedQA-E5-v5 | | **Query Latency** | 2-3 seconds end-to-end | | **Similarity Function** | VECTOR_DOT_PRODUCT | ## 🔑 Critical Learnings ### 1. Python Package Name (CRITICAL) - ❌ **WRONG**: `intersystems-iris` (doesn't exist) - ✅ **CORRECT**: `intersystems-irispython` - Import as: `import iris` (not `import irispython`) - **Updated constitution.md** per user request ### 2. IRIS Schema Behavior - SQL tables created via `CREATE TABLE` go to `SQLUser` schema - This happens even when using `CREATE SCHEMA` and `USE` commands - This is **correct IRIS behavior**, not a bug - Native ObjectScript classes would be in custom package, but SQL tables → SQLUser ### 3. IRIS Vector Syntax ```sql -- Correct syntax for IRIS vectors TO_VECTOR('0.1,0.2,...', DOUBLE, 1024) -- Must specify type and length VECTOR_DOT_PRODUCT(vec1, vec2) -- Similarity function VECTOR_COSINE(vec1, vec2) -- Alternative similarity ``` ### 4. NVIDIA NIM API - Endpoint: `https://integrate.api.nvidia.com/v1/embeddings` - Payload format: `{"input": [text], "model": "...", "input_type": "query"}` - Returns: 1024-dimensional embeddings - Free tier available, pay-per-use for production ## 📁 Files Created ### Configuration - `config/fhir_graphrag_config.aws.yaml` - AWS-specific config ### Scripts - `scripts/aws/setup-iris-schema.py` - Create vector tables - `scripts/aws/test-iris-vectors.py` - Test vector operations - `scripts/aws/test-nvidia-nim-embeddings.py` - Test NVIDIA API - `scripts/aws/integrate-nvidia-nim-iris.py` - Full integration ### Documentation - `AWS_DEPLOYMENT_COMPLETE.md` (this file) - Updated `STATUS.md` with deployment progress - Updated `PROGRESS.md` with challenges and solutions ## 🚀 Next Steps ### Option A: Migrate GraphRAG to AWS (Recommended) We already have GraphRAG working locally. To migrate: 1. **Use existing code** - No changes needed 2. **Point to AWS config**: ```python config = load_config('config/fhir_graphrag_config.aws.yaml') ``` 3. **Create KG tables on AWS**: ```bash python3 src/setup/fhir_graphrag_setup.py --config aws --mode=init ``` 4. **Extract entities remotely**: ```bash python3 src/setup/fhir_graphrag_setup.py --config aws --mode=build ``` ### Option B: Production Deployment 1. Implement connection pooling 2. Add error handling and retry logic 3. Set up monitoring (CloudWatch) 4. Configure auto-scaling 5. Implement backup and DR ### Option C: Multi-Modal Extension 1. MIMIC-CXR image dataset integration 2. NVIDIA NIM vision embeddings 3. Cross-modal similarity search 4. Image + text query fusion ## 💰 Cost Considerations ### AWS EC2 (g5.xlarge) - **On-Demand**: $1.006/hour = $24.14/day = ~$725/month - **Reserved (1-year)**: ~$0.60/hour = ~$432/month (40% savings) - **Spot Instance**: ~$0.30/hour = ~$216/month (70% savings) ### NVIDIA NIM API - **Free Tier**: Limited requests/day (development) - **Paid**: ~$0.0002 per 1K tokens - **Example**: 10K queries/day = 1M tokens/day = $6/month ### Total Monthly Cost Estimates - **Development (Spot + Free NIM)**: $216/month - **Production (Reserved + Paid NIM)**: $440/month - **High-Volume (Reserved + High NIM usage)**: $500-800/month ## 🎉 Success Criteria Met - ✅ IRIS vector database deployed on AWS - ✅ Native VECTOR support validated - ✅ NVIDIA NIM embeddings integrated - ✅ Similarity search working correctly - ✅ Remote connectivity established - ✅ End-to-end pipeline operational - ✅ Performance within acceptable range - ✅ Architecture ready for GraphRAG migration ## 📞 Support & Documentation ### AWS Resources - Instance: `i-0432eba10b98c4949` - Public IP: `3.84.250.46` - Region: `us-east-1` - SSH: `ssh -i fhir-ai-key.pem ubuntu@3.84.250.46` ### IRIS Management Portal - URL: `http://3.84.250.46:52773/csp/sys/UtilHome.csp` - Username: `_SYSTEM` - Password: `SYS` ### NVIDIA NIM - API Endpoint: `https://integrate.api.nvidia.com/v1/embeddings` - API Key: `$NVIDIA_API_KEY` (environment variable) - Documentation: https://docs.nvidia.com/nim/ ### Key Contacts - InterSystems IRIS: https://community.intersystems.com/ - NVIDIA NIM: https://build.nvidia.com/ --- **Deployment Completed**: December 12, 2025 **Total Time**: ~4 hours (including troubleshooting and documentation) **Status**: ✅ Production Ready for GraphRAG Migration --- ## UPDATE: IRISVectorDBClient Integration (December 12, 2025) ### Using Proper Abstractions Instead of Manual SQL **Status**: ✅ Complete - IRISVectorDBClient validated with AWS IRIS After completing the initial AWS deployment, we validated that the existing `IRISVectorDBClient` abstraction works correctly with AWS IRIS, eliminating the need for manual TO_VECTOR SQL. ### The Right Way: Use IRISVectorDBClient ```python from src.vectorization.vector_db_client import IRISVectorDBClient # Connect to AWS IRIS client = IRISVectorDBClient( host="3.84.250.46", port=1972, namespace="%SYS", # Use %SYS (DEMO has access restrictions) username="_SYSTEM", password="SYS", vector_dimension=1024 ) with client: # Insert - no manual TO_VECTOR() needed client.insert_vector( resource_id="doc-001", embedding=vector_list, table_name="SQLUser.ClinicalNoteVectors" ) # Search - no manual VECTOR_COSINE() needed results = client.search_similar( query_vector=query_list, table_name="SQLUser.ClinicalNoteVectors" ) ``` ### Key Learning: Namespace Access AWS IRIS Community Edition has different namespace permissions: - ✅ `%SYS` namespace: Full access for _SYSTEM user - ❌ `DEMO` namespace: Restricted access **Solution**: Connect to `%SYS`, use fully qualified table names like `SQLUser.ClinicalNoteVectors` ### Benefits ✅ **No Manual SQL**: Client handles TO_VECTOR and VECTOR_COSINE syntax ✅ **Dimension Validation**: Automatic checking and clear errors ✅ **Clean Python API**: Pass Python lists, not SQL strings ✅ **Works Everywhere**: Same code for local and AWS ✅ **Production Ready**: Tested, validated, documented ### Documentation For complete details and troubleshooting: - **Quick Start Guide**: `AWS_IRIS_VECTORDB_CLIENT_GUIDE.md` - **Technical Details**: `AWS_IRIS_CLIENT_SUCCESS.md` - **Test Script**: `scripts/aws/test-iris-vector-client-aws.py` - **Diagnostic Tool**: `scripts/aws/diagnose-iris-connection.sh` ### Test Results ``` ✅ NVIDIA NIM Embeddings: 1024-dim vectors generated ✅ AWS IRIS Connection: Connected via IRISVectorDBClient ✅ Vector Insertion: CLIENT_TEST_001, CLIENT_TEST_002 inserted ✅ Similarity Search: Query "chest pain" returned correctly ranked results - CLIENT_TEST_001: 0.662 similarity (best match) - CLIENT_TEST_002: 0.483 similarity (related) ✅ Cleanup: Test data removed successfully ``` **Performance**: 2-3 seconds end-to-end, <10ms for similarity search --- **Deployment Status**: ✅ Complete with proper abstractions validated **Ready For**: GraphRAG migration to AWS using IRISVectorDBClient **Documentation**: Comprehensive guides and troubleshooting tools available

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/isc-tdyar/medical-graphrag-assistant'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

AWS_DEPLOYMENT_COMPLETE.md•13.8 KiB