# AWS GPU Deployment Guide

## Overview

This guide provides step-by-step instructions for deploying a production-grade RAG system on AWS EC2 with GPU acceleration.

- **Deployment Time:** ~30 minutes
- **Target Instance:** AWS EC2 g5.xlarge (NVIDIA A10G GPU)
- **Region:** us-east-1 (configurable)

## Prerequisites

### Required Software

- [ ] AWS CLI configured with credentials
- [ ] SSH key pair for EC2 access
- [ ] NVIDIA NGC API key ([Get one here](https://org.ngc.nvidia.com/setup/api-key))
- [ ] Bash 5.x or later
- [ ] Python 3.10 or later (for local validation scripts)

### Required Access

- [ ] AWS IAM permissions to create EC2 instances
- [ ] AWS IAM permissions to create security groups
- [ ] AWS IAM permissions to create EBS volumes
- [ ] Outbound internet access for package downloads

### Cost Awareness

- **Estimated cost:** $1.006/hour for g5.xlarge instance
- **Storage cost:** ~$40/month for 500GB EBS gp3 volume
- **Total monthly (24/7):** ~$810/month
- **Development (8 hrs/day):** ~$270/month

## Quick Start

### 1. Clone and Configure

```bash
# Clone repository
git clone <repository-url>
cd FHIR-AI-Hackathon-Kit

# Copy environment template
cp config/.env.template .env

# Edit .env with your credentials
nano .env
```

**Required environment variables:**

```bash
AWS_REGION=us-east-1
AWS_INSTANCE_TYPE=g5.xlarge
SSH_KEY_NAME=your-key-name
SSH_KEY_PATH=/path/to/your-key.pem
NVIDIA_API_KEY=nvapi-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
NGC_API_KEY=nvapi-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

### 2. Deploy Infrastructure

```bash
# Run the automated deployment script
./scripts/aws/deploy.sh --provision
```

This script will automatically:

1. ✅ Provision EC2 g5.xlarge instance with security groups
2. ✅ Install NVIDIA drivers (driver-535, CUDA 12.2)
3. ✅ Configure Docker with GPU runtime
4. ✅ Deploy InterSystems IRIS vector database
5. ✅ Deploy NVIDIA NIM LLM (meta/llama-3.1-8b-instruct)
6. ✅ Create vector tables with 1024-dim embeddings
7. ✅ Verify all services are running

**Deployment time:** ~10-15 minutes

**For existing instance:**

```bash
# Use existing instance
export INSTANCE_ID=i-xxxxxxxxxxxxx
export PUBLIC_IP=34.xxx.xxx.xxx
./scripts/aws/deploy.sh
```

### 3. Validate Deployment

The deployment includes comprehensive validation to ensure all components are working correctly.

#### Running Validation

**On local/deployed instance:**

```bash
./scripts/aws/validate-deployment.sh
```

**On remote instance via SSH:**

```bash
./scripts/aws/validate-deployment.sh --remote <PUBLIC_IP> --ssh-key <PATH_TO_KEY>
```

**Using Python health checks:**

```bash
# Run all health checks
python src/validation/health_checks.py

# Run pytest validation suite
pytest src/validation/test_deployment.py -v
```

#### Expected Validation Output

Successful validation should show all checks passing:

```
╔══════════════════════════════════════════════════════════════╗
║             AWS GPU NIM RAG System Validation                ║
╚══════════════════════════════════════════════════════════════╝

→ Checking GPU availability...
✓ GPU detected: NVIDIA A10G
  Memory: 23028 MB
  Driver: 535.xxx.xx
  CUDA: 12.2

→ Checking Docker GPU runtime...
✓ Docker can access GPU

→ Checking IRIS database connectivity...
✓ IRIS container running

→ Checking IRIS database connection (Python)...
✓ IRIS database connection working

→ Checking Vector tables existence...
✓ Vector tables validated

→ Checking NIM LLM service health...
✓ NIM LLM container running
✓ NIM LLM health endpoint responding

→ Checking NIM LLM inference test...
✓ NIM LLM inference working
  Test response: 4

╔══════════════════════════════════════════════════════════════╗
║                      Validation Summary                      ║
╚══════════════════════════════════════════════════════════════╝

✓ All validation checks passed

System is ready for use!

Next steps:
  1. Vectorize clinical notes: python src/vectorization/vectorize_documents.py
  2. Test vector search: python src/query/test_vector_search.py
  3. Run RAG query: python src/query/rag_query.py --query 'your question'
```

#### Understanding Health Check Results

Each health check validates a specific component:

| Component | What It Checks | Pass Criteria |
|-----------|----------------|---------------|
| **GPU** | nvidia-smi command available, GPU detected | GPU name and driver version returned |
| **GPU Utilization** | Real-time GPU metrics | Utilization %, memory usage, temperature |
| **Docker GPU Runtime** | Docker can access GPU via --gpus flag | Test container can run nvidia-smi |
| **IRIS Connection** | IRIS database accepts connections | SELECT 1 query succeeds |
| **IRIS Tables** | Vector tables exist with correct schema | ClinicalNoteVectors and MedicalImageVectors found |
| **NIM LLM Health** | NIM health endpoint responds | HTTP 200 from /health |
| **NIM LLM Inference** | Model can generate responses | Successful completion for test query |

#### Health Check Details

**Python health checks** return structured results:

```python
@dataclass
class HealthCheckResult:
    component: str   # Component name (e.g., "GPU", "IRIS Connection")
    status: str      # "pass" or "fail"
    message: str     # Human-readable status message
    details: Dict    # Additional diagnostic information
```

Example successful result:

```python
HealthCheckResult(
    component="GPU",
    status="pass",
    message="GPU detected: NVIDIA A10G",
    details={
        "gpu_name": "NVIDIA A10G",
        "driver_version": "535.xxx.xx",
        "memory_mb": "23028",
        "cuda_version": "12.2"
    }
)
```

Example failure result:

```python
HealthCheckResult(
    component="IRIS Connection",
    status="fail",
    message="Connection failed: Connection refused",
    details={
        "error_type": "ConnectionError",
        "host": "localhost",
        "port": 1972,
        "suggestion": "Check IRIS container is running: docker ps | grep iris"
    }
)
```
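For orientation, here is a minimal, self-contained sketch of what such a check can look like. It wraps `nvidia-smi` and returns a `HealthCheckResult`; it is an illustration only, not the actual implementation in `src/validation/health_checks.py`:

```python
# Standalone sketch of a GPU health check in the HealthCheckResult style above.
import subprocess
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HealthCheckResult:
    component: str
    status: str
    message: str
    details: Dict = field(default_factory=dict)


def check_gpu() -> HealthCheckResult:
    """Query nvidia-smi and wrap the outcome in a HealthCheckResult."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        )
        name, driver, memory = [s.strip() for s in out.stdout.split(",")]
        return HealthCheckResult(
            component="GPU", status="pass",
            message=f"GPU detected: {name}",
            details={"gpu_name": name, "driver_version": driver, "memory": memory},
        )
    except (FileNotFoundError, subprocess.SubprocessError) as exc:
        return HealthCheckResult(
            component="GPU", status="fail",
            message=f"GPU not accessible: {exc}",
            details={"suggestion": "Run ./scripts/aws/install-gpu-drivers.sh"},
        )


if __name__ == "__main__":
    result = check_gpu()
    print(f"[{result.status}] {result.component}: {result.message}")
```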
#### Troubleshooting Failed Validation

If validation fails, follow these steps:

**1. GPU Check Fails**

Symptoms:

```
✗ GPU not accessible
  Error: nvidia-smi not found
```

Solutions:

```bash
# Reinstall GPU drivers
./scripts/aws/install-gpu-drivers.sh --remote <PUBLIC_IP> --ssh-key <PATH_TO_KEY>

# Verify GPU is detected
ssh -i <PATH_TO_KEY> ubuntu@<PUBLIC_IP> nvidia-smi

# If still failing, reboot instance
aws ec2 reboot-instances --instance-ids <INSTANCE_ID>
```

**2. Docker GPU Check Fails**

Symptoms:

```
✗ Docker cannot access GPU
  Error: could not select device driver
```

Solutions:

```bash
# Reinstall Docker GPU runtime
./scripts/aws/setup-docker-gpu.sh --remote <PUBLIC_IP> --ssh-key <PATH_TO_KEY>

# Manually verify
ssh -i <PATH_TO_KEY> ubuntu@<PUBLIC_IP> \
  'docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi'
```

**3. IRIS Connection Fails**

Symptoms:

```
✗ IRIS container not running
```

Solutions:

```bash
# Check container status
ssh -i <PATH_TO_KEY> ubuntu@<PUBLIC_IP> 'docker ps -a | grep iris'

# Restart IRIS deployment
./scripts/aws/deploy-iris.sh --remote <PUBLIC_IP> --ssh-key <PATH_TO_KEY> --force-recreate

# Check logs for errors
ssh -i <PATH_TO_KEY> ubuntu@<PUBLIC_IP> 'docker logs iris-vector-db'
```

**4. Vector Tables Missing**

Symptoms:

```
✗ No vector tables found
  Suggestion: Run: python src/setup/create_text_vector_table.py
```

Solutions:

```bash
# Recreate tables
python src/setup/create_text_vector_table.py

# Or re-run IRIS deployment with schema recreation
./scripts/aws/deploy-iris.sh --remote <PUBLIC_IP> --ssh-key <PATH_TO_KEY> --force-recreate
```

**5. NIM LLM Not Responding**

Symptoms:

```
! Health endpoint not available (may be initializing)
! NIM LLM inference not responding (may still be loading model)
```

This is normal during initial deployment. The model download and initialization can take 5-10 minutes. Wait and retry:

```bash
# Check if model is still downloading
ssh -i <PATH_TO_KEY> ubuntu@<PUBLIC_IP> 'docker logs nim-llm --tail 50'

# Should see progress like:
# "Downloading model... 45%"
# "Loading model into GPU memory..."

# Wait for completion, then re-run validation
./scripts/aws/validate-deployment.sh --remote <PUBLIC_IP> --ssh-key <PATH_TO_KEY>
```

If stuck for >15 minutes:

```bash
# Restart NIM container
ssh -i <PATH_TO_KEY> ubuntu@<PUBLIC_IP> 'docker restart nim-llm'

# Re-deploy if restart doesn't help
./scripts/aws/deploy-nim-llm.sh --remote <PUBLIC_IP> --ssh-key <PATH_TO_KEY> --force-recreate
```

#### Skip Specific Validation Checks

To skip specific components during validation:

```bash
# Skip GPU checks (for testing without GPU)
./scripts/aws/validate-deployment.sh --skip-gpu

# Skip NIM checks (if not deployed yet)
./scripts/aws/validate-deployment.sh --skip-nim

# Multiple skips
./scripts/aws/validate-deployment.sh --skip-gpu --skip-nim
```

#### Automated Testing with Pytest

Run the pytest test suite for comprehensive validation:

```bash
# Run all tests
pytest src/validation/test_deployment.py -v

# Run specific test class
pytest src/validation/test_deployment.py::TestGPU -v

# Run integration tests only
pytest src/validation/test_deployment.py::TestSystemIntegration -v

# Run with detailed output
pytest src/validation/test_deployment.py -v --tb=short

# Run slow tests (comprehensive inference testing)
pytest src/validation/test_deployment.py -v -m slow
```

Expected pytest output:

```
============================== test session starts ===============================
collected 12 items

src/validation/test_deployment.py::TestGPU::test_gpu_detected PASSED         [  8%]
src/validation/test_deployment.py::TestGPU::test_gpu_utilization PASSED      [ 16%]
src/validation/test_deployment.py::TestDocker::test_docker_gpu_access PASSED [ 25%]
src/validation/test_deployment.py::TestIRIS::test_iris_connection PASSED     [ 33%]
src/validation/test_deployment.py::TestIRIS::test_iris_tables_exist PASSED   [ 41%]
src/validation/test_deployment.py::TestNIMLLM::test_nim_llm_health PASSED    [ 50%]
src/validation/test_deployment.py::TestNIMLLM::test_nim_llm_inference PASSED [ 58%]
src/validation/test_deployment.py::TestSystemIntegration::test_all_components_healthy PASSED [ 66%]
src/validation/test_deployment.py::TestSystemIntegration::test_deployment_readiness PASSED [ 75%]
src/validation/test_deployment.py::TestPerformance::test_gpu_utilization_reasonable PASSED [ 83%]
...

============================== 12 passed in 15.23s ===============================
```
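For a sense of what these tests exercise, an individual test might look roughly like the following. This is a hypothetical sketch in the style of the class names above; the real assertions in `test_deployment.py` may differ:

```python
# Hypothetical sketch of one test in the style of test_deployment.py.
import shutil
import subprocess

import pytest


class TestGPU:
    def test_gpu_detected(self):
        if shutil.which("nvidia-smi") is None:
            pytest.fail("nvidia-smi not found - are the NVIDIA drivers installed?")
        result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
        assert result.returncode == 0, result.stderr
        assert "GPU 0" in result.stdout  # at least one GPU enumerated
```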
#### Next Steps After Successful Validation

Once all validation checks pass, proceed with:

1. **Load clinical notes data:** See Step 7 below
2. **Vectorize documents:** See Step 7 below
3. **Test vector search:** See Step 8 below
4. **Run RAG queries:** See "Test RAG Query" section

### 4. Test RAG Query

```bash
# Run a sample RAG query
python src/query/test_rag.py \
  --query "What are the patient's chronic conditions?" \
  --patient-id "patient-123"
```

## Detailed Deployment Steps

You can run individual scripts for granular control, or use the automated `./scripts/aws/deploy.sh` script.

### Option A: Automated Deployment

```bash
# Complete deployment in one command
./scripts/aws/deploy.sh --provision

# Or for existing instance
export INSTANCE_ID=i-xxxxxxxxxxxxx
export PUBLIC_IP=34.xxx.xxx.xxx
./scripts/aws/deploy.sh
```

### Option B: Step-by-Step Deployment

### Step 1: Provision EC2 Instance

```bash
./scripts/aws/provision-instance.sh
```

**What this does:**

- Creates security group with required ports:
  - 22 (SSH)
  - 1972 (IRIS SQL)
  - 52773 (IRIS Management Portal)
  - 8001 (NIM LLM API)
- Launches EC2 g5.xlarge instance with Ubuntu 24.04 LTS
- Attaches 500GB gp3 EBS volume
- Configures resource tags for tracking
- Saves instance info to `.instance-info` file

**Expected output:**

```
→ Finding Ubuntu 24.04 LTS AMI in us-east-1...
✓ Found AMI: ami-xxxxxxxxxxxxx
→ Creating security group: fhir-ai-hackathon-sg...
✓ Security group created: sg-xxxxxxxxxxxxx
→ Launching g5.xlarge instance in us-east-1...
✓ Instance launched: i-xxxxxxxxxxxxx
✓ Instance is now running

==========================================
Instance Provisioned Successfully
==========================================
Instance ID:   i-xxxxxxxxxxxxx
Instance Type: g5.xlarge
Public IP:     34.xxx.xxx.xxx
Region:        us-east-1
SSH Key:       your-key-name
```

**For remote operations:**

```bash
# Provision from your local machine for remote host
export SSH_KEY_PATH=~/.ssh/your-key.pem
./scripts/aws/provision-instance.sh
```
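The script persists the instance ID and public IP in `.instance-info`; if you prefer to look them up programmatically, a small boto3 helper works too. This is hypothetical convenience code, not part of the kit, and it assumes your AWS credentials are already configured:

```python
# Hypothetical helper: fetch the public IP of a provisioned instance via boto3.
import boto3


def get_public_ip(instance_id: str, region: str = "us-east-1") -> str:
    """Return the instance's public IPv4 address, or '' if it has none yet."""
    ec2 = boto3.client("ec2", region_name=region)
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    return instance.get("PublicIpAddress", "")


if __name__ == "__main__":
    print(get_public_ip("i-xxxxxxxxxxxxx"))  # substitute your instance ID
```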
### Step 2: Install GPU Drivers

```bash
./scripts/aws/install-gpu-drivers.sh --remote <PUBLIC_IP> --ssh-key <PATH_TO_KEY>
```

**What this does:**

- Installs NVIDIA driver-535 (LTS)
- Installs nvidia-utils-535
- Detects if reboot is required
- Automatically reboots and waits for instance to come back online
- Verifies GPU is accessible

**Expected output:**

```
→ Installing NVIDIA drivers on remote host: 34.xxx.xxx.xxx
→ Updating package list...
→ Installing NVIDIA driver 535 (LTS)...
✓ NVIDIA drivers installed
✓ nvidia-smi is available
! GPU not yet accessible - reboot required

Reboot instance now? (yes/no): yes
→ Rebooting remote host...
→ Waiting 60 seconds for instance to reboot...
→ Waiting for SSH to be available...
✓ Instance is back online
→ Verifying GPU on remote host...
✓ GPU is accessible

==========================================
NVIDIA Driver Installation Complete
==========================================
```

### Step 3: Setup Docker GPU Runtime

```bash
./scripts/aws/setup-docker-gpu.sh --remote <PUBLIC_IP> --ssh-key <PATH_TO_KEY>
```

**What this does:**

- Installs Docker CE (if not present)
- Installs NVIDIA Container Toolkit
- Configures Docker daemon for GPU runtime
- Restarts Docker service
- Verifies GPU accessibility in containers

**Expected output:**

```
→ Setting up Docker GPU runtime on remote host: 34.xxx.xxx.xxx
→ Checking for Docker...
✓ Docker is already installed
  Docker version 27.x.x
→ Installing NVIDIA Container Toolkit...
✓ NVIDIA Container Toolkit installed
→ Configuring Docker for GPU...
→ Restarting Docker...
✓ Docker configured for GPU
→ Verifying GPU accessibility in containers...
✓ GPU is accessible in containers

==========================================
Docker GPU Runtime Setup Complete
==========================================
```

**Validation after reboot:**

```bash
ssh -i your-key.pem ubuntu@<public-ip> nvidia-smi
```

Expected:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.xxx.xx    Driver Version: 535.xxx.xx    CUDA Version: 12.2   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVIDIA A10G          Off | 00000000:00:1E.0 Off |                    0 |
|  0%   28C    P0    55W / 300W |      0MiB / 23028MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
```

### Step 4: Deploy IRIS Database

```bash
./scripts/aws/deploy-iris.sh --remote <PUBLIC_IP> --ssh-key <PATH_TO_KEY>
```

**What this does:**

- Pulls InterSystems IRIS Community Edition 2025.1
- Creates Docker volume for persistent storage
- Starts IRIS container with ports 1972 (SQL) and 52773 (Management)
- Creates DEMO namespace
- Creates vector tables:
  - `ClinicalNoteVectors` with VECTOR(DOUBLE, 1024)
  - `MedicalImageVectors` with VECTOR(DOUBLE, 1024)
  - Indexes for efficient patient/document lookups

**Expected output:**

```
→ Deploying InterSystems IRIS...
→ Pulling IRIS image...
✓ Image pulled
→ Creating persistent volume...
✓ Volume created
→ Starting IRIS container...
✓ IRIS container started
→ Waiting for IRIS to initialize (30 seconds)...
✓ IRIS is running
→ Creating namespace and schema...
→ Creating namespace...
✓ Namespace created
→ Creating tables...
✓ Schema created
✓ Tables verified

==========================================
IRIS Vector Database Deployed
==========================================
Connection details:
  Host:      34.xxx.xxx.xxx
  SQL Port:  1972
  Web Port:  52773
  Namespace: DEMO
  Username:  _SYSTEM
  Password:  SYS

Tables created:
  - ClinicalNoteVectors (1024-dim VECTOR)
  - MedicalImageVectors (1024-dim VECTOR)

Management Portal: http://34.xxx.xxx.xxx:52773/csp/sys/UtilHome.csp
```

**Testing the connection:**

```bash
# Test with iris Python module
python -c "import iris; \
  conn = iris.connect('34.xxx.xxx.xxx', 1972, 'DEMO', '_SYSTEM', 'SYS'); \
  print('✅ Connected to IRIS')"
```
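To go one step beyond the connection test, you can also list the vector tables through `INFORMATION_SCHEMA`. This is a quick hand-rolled check, not a kit script; the schema filter is omitted because the tables' SQL schema name depends on how deploy-iris.sh creates them:

```python
# Quick sanity check: confirm the vector tables exist after deployment.
import iris

conn = iris.connect("34.xxx.xxx.xxx", 1972, "DEMO", "_SYSTEM", "SYS")
cursor = conn.cursor()
cursor.execute(
    "SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_NAME LIKE '%Vectors'"
)
for schema, table in cursor.fetchall():
    # Expect ClinicalNoteVectors and MedicalImageVectors
    print(f"Found table: {schema}.{table}")
conn.close()
```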
### Step 5: Deploy NIM LLM

```bash
./scripts/aws/deploy-nim-llm.sh --remote <PUBLIC_IP> --ssh-key <PATH_TO_KEY>
```

**What this does:**

- Pulls NVIDIA NIM LLM container (meta/llama-3.1-8b-instruct)
- Starts LLM container with GPU allocation
- Downloads model weights (~8GB, first run only)
- Exposes OpenAI-compatible API on port 8001
- Validates service health

**Expected output:**

```
→ Deploying NVIDIA NIM LLM...
  Model: meta/llama-3.1-8b-instruct
✓ NVIDIA API key found
→ Verifying GPU...
✓ GPU accessible
→ Pulling NIM LLM image...
✓ Image pulled
→ Starting NIM LLM container...
✓ NIM LLM container started
→ Waiting for NIM to initialize (checking every 30s)...
→ Still initializing... (30/600s)
→ Still initializing... (60/600s)
✓ NIM is initializing
✓ NIM LLM deployed

==========================================
NVIDIA NIM LLM Deployed
==========================================
Model:    meta/llama-3.1-8b-instruct
Endpoint: http://34.xxx.xxx.xxx:8001/v1/chat/completions

Test with curl:
curl -X POST http://34.xxx.xxx.xxx:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "What is RAG?"}],
    "max_tokens": 100
  }'
```

**Testing the LLM:**

```bash
# Test chat completion
curl -X POST http://<PUBLIC_IP>:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful medical AI assistant."},
      {"role": "user", "content": "Explain hypertension in simple terms."}
    ],
    "max_tokens": 150,
    "temperature": 0.7
  }'
```
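Because the endpoint is OpenAI-compatible, the standard `openai` Python client works as well. A minimal sketch (substitute your instance's public IP; the `api_key` value is arbitrary for a self-hosted NIM):

```python
# Same request as the curl test above, via the OpenAI Python client.
from openai import OpenAI

client = OpenAI(base_url="http://<PUBLIC_IP>:8001/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful medical AI assistant."},
        {"role": "user", "content": "Explain hypertension in simple terms."},
    ],
    max_tokens=150,
    temperature=0.7,
)
print(response.choices[0].message.content)
```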
### Step 6: Create Vector Tables

```bash
python src/setup/create_text_vector_table.py
```

**What this does:**

- Connects to IRIS database
- Creates `ClinicalNoteVectors` table with VECTOR(DOUBLE, 1024) column
- Creates indices for efficient search
- Validates table schema

**Expected output:**

```
✅ Table created: DEMO.ClinicalNoteVectors
✅ Columns: ResourceID, PatientID, DocumentType, TextContent, Embedding
✅ Vector dimension: 1024
✅ Similarity metric: COSINE
```

### Step 7: Vectorize Clinical Notes

```bash
python src/vectorization/vectorize_documents.py \
  --input synthea_clinical_notes.json \
  --batch-size 50
```

**What this does:**

- Reads clinical notes from JSON file
- Calls NVIDIA embeddings API in batches
- Stores vectors in IRIS database
- Tracks progress with SQLite checkpoint
- Provides ETA and throughput metrics

**Expected output:**

```
📊 Total documents: 50,127
🔄 Processing in batches of 50...
✅ Batch 1/1003 complete (50 docs, 2.3s, 21.7 docs/sec)
✅ Batch 2/1003 complete (50 docs, 2.1s, 23.8 docs/sec)
...
✅ All documents vectorized!
📈 Total time: 42m 15s
📈 Average throughput: 19.8 docs/sec
📈 Total vectors: 50,127
```
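For reference, the embedding call itself can be made through NVIDIA's hosted, OpenAI-compatible endpoint. The sketch below follows NVIDIA's documented usage for the nv-embedqa models ("passage" for documents, "query" for search queries); treat the `extra_body` fields as assumptions to verify against current NVIDIA API docs rather than the kit's exact implementation:

```python
# Hedged sketch: request embeddings for a batch of texts from NVIDIA's API.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

texts = ["Patient with type 2 diabetes, currently on metformin 1000mg BID."]
response = client.embeddings.create(
    model="nvidia/nv-embedqa-e5-v5",
    input=texts,
    extra_body={"input_type": "passage", "truncate": "END"},
)
embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")  # expect 1024
```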
### Step 8: Test Vector Search

```bash
python src/query/test_vector_search.py \
  --query "diabetes treatment history" \
  --top-k 10
```

**Expected output:**

```
🔍 Query: "diabetes treatment history"
📊 Found 10 results in 0.23s

Top result:
  Similarity: 0.87
  Patient: patient-456
  Document: Progress Note 2024-01-15
  Content: Patient with type 2 diabetes, currently on metformin 1000mg BID...
```
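For a sense of what the search script does under the hood, here is a hedged sketch of the similarity query. It assumes IRIS's `TO_VECTOR`/`VECTOR_COSINE` SQL functions and the column names listed in Step 6; the query embedding would normally come from the NVIDIA embeddings API, and the real logic lives in `src/query/test_vector_search.py`:

```python
# Hedged sketch of a cosine-similarity search against ClinicalNoteVectors.
import iris

query_embedding = [0.01] * 1024  # placeholder; use a real 1024-dim embedding

conn = iris.connect("localhost", 1972, "DEMO", "_SYSTEM", "SYS")
cursor = conn.cursor()
cursor.execute(
    """
    SELECT TOP 10 ResourceID, PatientID, DocumentType,
           VECTOR_COSINE(Embedding, TO_VECTOR(?, DOUBLE, 1024)) AS Similarity
    FROM ClinicalNoteVectors
    ORDER BY Similarity DESC
    """,
    [",".join(str(x) for x in query_embedding)],
)
for resource_id, patient_id, doc_type, similarity in cursor.fetchall():
    print(f"{similarity:.3f}  {patient_id}  {doc_type}  {resource_id}")
conn.close()
```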
### Step 9: Vectorize Clinical Notes (Production Pipeline)

Once your infrastructure is validated, you can vectorize clinical notes at scale using the production pipeline:

```bash
python src/vectorization/text_vectorizer.py \
  --input synthea_clinical_notes.json \
  --batch-size 50
```

**What this does:**

- Validates all documents for required fields
- Preprocesses text (whitespace normalization, truncation)
- Generates embeddings in batches (50 docs/batch default)
- Stores vectors in IRIS ClinicalNoteVectors table
- Tracks progress with SQLite checkpoint for resumability
- Logs validation errors to `vectorization_errors.log`

**Expected output:**

```
2025-01-09 14:32:15 - Initializing NVIDIA NIM embeddings client...
2025-01-09 14:32:16 - Initializing IRIS vector database client...
2025-01-09 14:32:17 - ✓ Connected to IRIS: 34.xxx.xxx.xxx:1972/DEMO

Starting vectorization pipeline...
  Input file: synthea_clinical_notes.json
  Batch size: 50
  Resume mode: False

✓ Loaded 50,127 documents
Validating and preprocessing documents...
✓ 50,100 valid documents ready for vectorization

Processing batch 1/1002 (50 documents)
Progress: 1/1002 batches | 50 successful | 0 failed | 21.7 docs/sec | ETA: 38.5 min
Processing batch 2/1002 (50 documents)
Progress: 2/1002 batches | 100 successful | 0 failed | 23.1 docs/sec | ETA: 36.2 min
...

================================================================================
Vectorization Summary
================================================================================
Total documents:    50,127
Validation errors:  27
Processed:          50,100
Successful:         50,100
Failed:             0
Elapsed time:       2145.3s (35.8 min)
Throughput:         23.4 docs/sec
================================================================================

✅ Vectorization complete!
```

**Command-line options:**

```bash
# Resume from checkpoint (skip already processed documents)
python src/vectorization/text_vectorizer.py \
  --input synthea_clinical_notes.json \
  --resume

# Test vector search after vectorization
python src/vectorization/text_vectorizer.py \
  --input synthea_clinical_notes.json \
  --test-search "diabetes medication"

# Adjust batch size for API rate limits
python src/vectorization/text_vectorizer.py \
  --input synthea_clinical_notes.json \
  --batch-size 25

# Custom checkpoint and error log paths
python src/vectorization/text_vectorizer.py \
  --input synthea_clinical_notes.json \
  --checkpoint-db my_state.db \
  --error-log my_errors.log
```

**Performance expectations:**

| Dataset Size | Batch Size | Expected Throughput | Total Time (est.) |
|--------------|------------|---------------------|-------------------|
| 1,000 docs | 50 | ≥100 docs/min | ~10 minutes |
| 10,000 docs | 50 | ≥100 docs/min | ~100 minutes |
| 50,000 docs | 50 | ≥100 docs/min | ~500 minutes |
| 100,000 docs | 50 | ≥100 docs/min | ~1000 minutes |

**Note:** Throughput depends on:

- Network latency to NVIDIA API
- NVIDIA API rate limits (60 req/min for free tier)
- IRIS database write performance
- Instance network bandwidth

**Progress tracking:**

The pipeline provides real-time progress updates:

- **Batch X/Y**: Current batch number and total batches
- **Successful**: Number of successfully vectorized documents
- **Failed**: Number of failed documents (check error log)
- **docs/sec**: Current throughput rate
- **ETA**: Estimated time remaining

**Resumability:**

The pipeline uses SQLite checkpoint tracking, so you can safely interrupt (Ctrl+C) and resume:

```bash
# Start vectorization
python src/vectorization/text_vectorizer.py --input data.json --batch-size 50

# ... Interrupt with Ctrl+C after processing 5,000 documents ...

# Resume from checkpoint (skips already processed 5,000)
python src/vectorization/text_vectorizer.py --input data.json --resume
```

**Validation errors:**

Documents that fail validation are logged to `vectorization_errors.log` and skipped:

```
================================================================================
Validation Errors - 2025-01-09T14:32:45.123456
================================================================================
Resource ID: doc-broken-123
Error: Missing required field: text_content
--------------------------------------------------------------------------------
Resource ID: doc-empty-456
Error: Empty text_content
--------------------------------------------------------------------------------
```

Common validation failures (a minimal version of these rules is sketched below):

- Missing required fields (resource_id, patient_id, document_type, text_content)
- Empty or whitespace-only text_content
- Invalid JSON structure
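As promised, here is an illustrative version of those validation rules. It is hypothetical; the pipeline's actual checks live in `text_vectorizer.py`:

```python
# Illustrative document validation matching the failure modes listed above.
REQUIRED_FIELDS = ("resource_id", "patient_id", "document_type", "text_content")


def validate_document(doc: dict) -> tuple[bool, str]:
    """Return (is_valid, error_message) for one clinical-note record."""
    for field in REQUIRED_FIELDS:
        if field not in doc:
            return False, f"Missing required field: {field}"
    if not str(doc["text_content"]).strip():
        return False, "Empty text_content"
    return True, ""


# (False, 'Missing required field: patient_id')
print(validate_document({"resource_id": "doc-1"}))
```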
**Testing search after vectorization:**

```bash
# Test search with default query
python src/vectorization/text_vectorizer.py \
  --input synthea_clinical_notes.json \
  --test-search

# Test with custom query
python src/vectorization/text_vectorizer.py \
  --input synthea_clinical_notes.json \
  --test-search "hypertension treatment"
```

**Output:**

```
Testing vector search: 'hypertension treatment'

Top 3 results:

1. Similarity: 0.892
   Patient ID: patient-789
   Doc Type: Progress Note
   Content: Patient with essential hypertension on amlodipine 5mg daily...

2. Similarity: 0.856
   Patient ID: patient-123
   Doc Type: History and physical note
   Content: Hypertension managed with lifestyle modifications and ACE inhibitor...

3. Similarity: 0.824
   Patient ID: patient-456
   Doc Type: Encounter note
   Content: Blood pressure elevated, discussed medication compliance...
```

### Step 10: Query Clinical Notes with RAG Pipeline

Once clinical notes are vectorized, you can run natural language queries using the RAG (Retrieval-Augmented Generation) pipeline.

```bash
python src/validation/test_rag_query.py \
  --query "What are the patient's chronic conditions?"
```

**What this does:**

- Generates embedding for your natural language query
- Searches IRIS vector database for semantically similar clinical notes
- Retrieves top-k most relevant documents
- Assembles context from retrieved documents
- Sends context + query to NVIDIA NIM LLM (meta/llama-3.1-8b-instruct)
- Generates natural language response citing source documents
- Extracts and formats citations with similarity scores

**Expected output:**

```
================================================================================
RAG Query Test
================================================================================
Query: "What are the patient's chronic conditions?"
Top-K: 10
Similarity Threshold: 0.5
================================================================================

Response:
--------------------------------------------------------------------------------
Based on the clinical notes, the patient has the following chronic conditions:

1. Type 2 diabetes mellitus (mentioned in Document 1 and Document 3)
2. Essential hypertension (mentioned in Document 2)
3. Hyperlipidemia (mentioned in Document 1)

The patient is currently managing these conditions with:
- Metformin 1000mg BID for diabetes
- Amlodipine 5mg daily for hypertension
- Atorvastatin 20mg daily for hyperlipidemia

================================================================================
Retrieved Documents (3 used in context, 5 total retrieved)
================================================================================

[1] Similarity: 0.87 | Patient: patient-789 | Type: Progress Note
    Resource ID: f1a10b20-dbaa-2a6b-d46f-11223d8ac3f0
    Content: "Patient with type 2 diabetes, currently on metformin..."
    ✓ Cited in response

[2] Similarity: 0.82 | Patient: patient-789 | Type: History and physical
    Resource ID: doc-456-2023-12-10
    Content: "Patient presents with hypertension, well-controlled..."
    ✓ Cited in response

[3] Similarity: 0.79 | Patient: patient-789 | Type: Progress Note
    Resource ID: doc-123-2024-01-05
    Content: "Patient with hyperlipidemia and type 2 diabetes..."
    ✓ Cited in response

Additional 2 documents retrieved but not used in context:
[4] Similarity: 0.68 | Patient: patient-789 | Type: Encounter note
[5] Similarity: 0.61 | Patient: patient-789 | Type: Progress Note

================================================================================
Metadata
================================================================================
Processing Time: 3.45 seconds
Documents Retrieved: 5
Documents Used in Context: 3
Citations Found: 3
Performance: ✅ Meets SC-007 target (<5s)
Timestamp: 2025-01-09T15:30:45.123456
================================================================================
```

**Query with patient filter:**

```bash
python src/validation/test_rag_query.py \
  --query "What medications is the patient taking?" \
  --patient-id "patient-789"
```

**Query with document type filter:**

```bash
python src/validation/test_rag_query.py \
  --query "Recent lab results and vital signs" \
  --document-type "Progress Note"
```

**Advanced query parameters:**

```bash
python src/validation/test_rag_query.py \
  --query "Patient's medication history and allergies" \
  --top-k 15 \
  --similarity-threshold 0.6 \
  --max-context-tokens 5000 \
  --llm-max-tokens 1000 \
  --llm-temperature 0.5 \
  --output result.json
```

**Command-line options:**

| Option | Description | Default |
|--------|-------------|---------|
| `--query` | Natural language query (required) | - |
| `--patient-id` | Filter results by patient ID | None |
| `--document-type` | Filter by document type | None |
| `--top-k` | Number of documents to retrieve | 10 |
| `--similarity-threshold` | Minimum similarity score (0-1) | 0.5 |
| `--max-context-tokens` | Max tokens for context | 4000 |
| `--llm-max-tokens` | Max tokens in LLM response | 500 |
| `--llm-temperature` | LLM sampling temperature (0-1) | 0.7 |
| `--output` | Save result to JSON file | None |
| `--show-full-documents` | Show full document text | False |
| `-v, --verbose` | Enable verbose logging | False |

**Performance expectations:**

| Component | Target Latency | Notes |
|-----------|----------------|-------|
| Query embedding | <1s | NVIDIA NIM embeddings API |
| Vector search | <1s | IRIS COSINE similarity search |
| Context retrieval | <0.5s | Database query |
| LLM generation | <3s | NIM LLM (meta/llama-3.1-8b-instruct) |
| **Total (SC-007)** | **<5s** | End-to-end query processing |

**Example queries:**

```bash
# General medical query
python src/validation/test_rag_query.py \
  --query "What are the patient's vital signs trends over time?"

# Specific condition query
python src/validation/test_rag_query.py \
  --query "Has the patient been diagnosed with diabetes?"

# Medication query
python src/validation/test_rag_query.py \
  --query "What dosage of metformin is the patient taking?"

# Treatment history query
python src/validation/test_rag_query.py \
  --query "What treatments have been prescribed for hypertension?"

# Recent activity query
python src/validation/test_rag_query.py \
  --query "What were the findings from the patient's last visit?"
```

**Python API usage:**

You can also use the RAG pipeline directly in Python code:

```python
from query.rag_pipeline import RAGPipeline

# Initialize pipeline
pipeline = RAGPipeline()

# Process query
result = pipeline.process_query(
    query_text="What are the patient's chronic conditions?",
    top_k=10,
    patient_id="patient-789",  # Optional filter
    similarity_threshold=0.5
)

# Access results
print(f"Response: {result['response']}")
print(f"Retrieved: {result['metadata']['documents_retrieved']} documents")
print(f"Processing time: {result['metadata']['processing_time_seconds']}s")

# Iterate through citations
for citation in result['citations']:
    if citation['cited_in_response']:
        print(f"  - {citation['resource_id']} (similarity: {citation['similarity']:.3f})")
```

**Integration testing:**

Run the end-to-end RAG test suite:

```bash
pytest tests/integration/test_end_to_end_rag.py -v
```

Expected output:

```
============================== test session starts ===============================
tests/integration/test_end_to_end_rag.py::TestRAGPipelineBasics::test_pipeline_initialization PASSED
tests/integration/test_end_to_end_rag.py::TestRAGPipelineBasics::test_query_embedding_generation PASSED
tests/integration/test_end_to_end_rag.py::TestRAGQueryProcessing::test_process_simple_query PASSED
tests/integration/test_end_to_end_rag.py::TestRAGQueryProcessing::test_citation_extraction PASSED
tests/integration/test_end_to_end_rag.py::TestPerformance::test_query_latency_meets_sc007 PASSED
...
============================== 15 passed in 45.23s ===============================
```
### Step 11: Deploy NIM Vision Service (Optional - for Image Vectorization)

For multi-modal RAG capabilities with medical images (chest X-rays, CT scans, etc.), deploy the NVIDIA NIM Vision service.

```bash
./scripts/aws/deploy-nim-vision.sh --remote <PUBLIC_IP> --ssh-key <PATH_TO_KEY>
```

**What this does:**

- Pulls NVIDIA NIM Vision container (nv-clip-vit model)
- Starts Vision service with GPU allocation on port 8002
- Downloads CLIP Vision Transformer model (~2GB, first run only)
- Exposes image embedding API compatible with src/vectorization/image_vectorizer.py
- Validates service health

**Expected output:**

```
╔══════════════════════════════════════════════════════════════╗
║                NVIDIA NIM Vision Deployment                  ║
╚══════════════════════════════════════════════════════════════╝

→ Checking NVIDIA API key...
✓ NVIDIA API key found
→ Checking GPU availability...
✓ GPU accessible
→ Checking for existing container...
→ Pulling NVIDIA NIM Vision image...
  Image: nvcr.io/nim/nvidia/nv-clip-vit:latest
  This may take several minutes...
✓ Image pulled
→ Starting NVIDIA NIM Vision container...
✓ Container started
  Container name: nim-vision
  Port mapping: 8002:8000
→ Waiting for NIM Vision to initialize...
  This may take 3-5 minutes (model download on first run)
  Still initializing... (30s/300s)
  Still initializing... (60s/300s)
✓ NIM Vision service is healthy

==========================================
NVIDIA NIM Vision Deployed
==========================================
Model:    CLIP Vision Transformer
Endpoint: http://34.xxx.xxx.xxx:8002
Health:   http://34.xxx.xxx.xxx:8002/health

Container Details:
  Name: nim-vision
  Port: 8002 (external) → 8000 (internal)
  GPU: Enabled
  Shared Memory: 8g

Test with curl:
curl -X POST http://34.xxx.xxx.xxx:8002/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "base64_encoded_image_here",
    "model": "nv-clip-vit"
  }'

✅ NIM Vision deployment complete!
```

**Testing the Vision service:**

```bash
# Test health endpoint
curl http://<PUBLIC_IP>:8002/health

# Should return: {"status": "ready"}

# The vision service accepts base64-encoded images
# See image_vectorizer.py for usage examples
```
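A hedged Python equivalent of the curl test above sends one base64-encoded image to the embeddings endpoint. The response is assumed to follow the OpenAI embeddings shape (`{"data": [{"embedding": [...]}]}`); verify against your deployed service and `image_vectorizer.py`:

```python
# Embed a single image via the NIM Vision endpoint (illustrative only).
import base64

import requests

with open("chest_xray.png", "rb") as f:  # any sample PNG/JPG
    image_b64 = base64.b64encode(f.read()).decode("ascii")

response = requests.post(
    "http://<PUBLIC_IP>:8002/v1/embeddings",  # substitute your instance IP
    json={"input": image_b64, "model": "nv-clip-vit"},
    timeout=60,
)
response.raise_for_status()
embedding = response.json()["data"][0]["embedding"]
print(f"Embedding dimensions: {len(embedding)}")  # expect 1024 per this guide
```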
### Step 12: Vectorize Medical Images (Optional - Multi-Modal RAG)

Once NIM Vision is deployed, vectorize medical images for visual similarity search and multi-modal RAG queries.

**Prerequisites:**

- NIM Vision service running on port 8002
- IRIS MedicalImageVectors table created (automatically handled by image_vectorizer.py)
- Medical images in supported formats (DICOM, PNG, JPG)

#### Using MIMIC-CXR Chest X-Rays

The kit includes integration with the MIMIC-CXR dataset (19,091 DICOM chest X-rays):

```bash
python src/vectorization/image_vectorizer.py \
  --input /path/to/mimic-cxr/files \
  --format dicom \
  --batch-size 10
```

**Expected output:**

```
2025-01-09 19:30:00 - INFO - Initializing components...
2025-01-09 19:30:01 - INFO - NIM Vision client initialized: http://localhost:8002
2025-01-09 19:30:02 - INFO - ✓ Connected to IRIS: localhost:1972/DEMO
2025-01-09 19:30:02 - INFO - Checkpoint database initialized: image_vectorization_state.db

===============================================================================
Medical Image Vectorization Pipeline
===============================================================================
Input directory: /path/to/mimic-cxr/files
Image formats: dicom
Batch size: 10
Resume mode: False
===============================================================================

✓ Discovered 19,091 image files
Validating 19,091 images...
✓ 19,091 valid images, 0 validation errors

Processing 19,091 images in 1,910 batches...
Batch 1/1910: 10 successful, 0 failed | 8.2s | 1.22 imgs/sec | ETA: 4.3 hours
Batch 2/1910: 10 successful, 0 failed | 7.9s | 1.27 imgs/sec | ETA: 4.1 hours
Batch 3/1910: 10 successful, 0 failed | 8.1s | 1.23 imgs/sec | ETA: 4.2 hours
...

================================================================================
Vectorization Summary
================================================================================
Total images discovered: 19,091
Validation errors:       0
Valid images:            19,091
Successfully processed:  19,091
Failed:                  0
Elapsed time:            15,420.5s (4.3 hours)
Throughput:              1.24 images/sec
================================================================================

✅ Performance target met: 1.24 imgs/sec ≥ 0.5 imgs/sec
```

**Performance expectations:**

| Dataset Size | Batch Size | Expected Throughput | Total Time (est.) |
|--------------|------------|---------------------|-------------------|
| 100 images | 10 | ≥0.5 imgs/sec | ~3 minutes |
| 1,000 images | 10 | ≥0.5 imgs/sec | ~30 minutes |
| 10,000 images | 10 | ≥0.5 imgs/sec | ~5 hours |
| 19,091 images | 10 | ≥0.5 imgs/sec | ~10 hours |

**Note:** Performance target SC-005 specifies ≥0.5 images/second (<2 sec/image). Actual throughput depends on:

- GPU model (A10G recommended)
- Network latency to NIM Vision API
- IRIS database write performance
- Image preprocessing complexity (DICOM conversion, normalization, resizing)

#### Command-line Options

```bash
# Resume from checkpoint (skip already processed images)
python src/vectorization/image_vectorizer.py \
  --input /path/to/images \
  --format dicom \
  --resume

# Process PNG/JPG images
python src/vectorization/image_vectorizer.py \
  --input /path/to/images \
  --format png,jpg \
  --batch-size 10

# Test visual similarity search after vectorization
python src/vectorization/image_vectorizer.py \
  --input /path/to/images \
  --format dicom \
  --test-search /path/to/query-image.dcm

# Custom NIM Vision endpoint
python src/vectorization/image_vectorizer.py \
  --input /path/to/images \
  --format dicom \
  --vision-url http://34.xxx.xxx.xxx:8002

# Custom checkpoint and error log paths
python src/vectorization/image_vectorizer.py \
  --input /path/to/images \
  --format dicom \
  --checkpoint-db my_image_state.db \
  --error-log my_image_errors.log
```

#### Visual Similarity Search

After vectorizing images, test visual similarity search:

```bash
python src/vectorization/image_vectorizer.py \
  --input /path/to/images \
  --format dicom \
  --test-search /path/to/query-image.dcm \
  --top-k 10
```

**Expected output:**

```
================================================================================
Visual Similarity Search Test
================================================================================
Query image: /path/to/query-image.dcm
Top-K: 10

Generating embedding for query image...
✓ Generated 1024-dimensional embedding

Searching for top-10 similar images...
✓ Found 10 results

Similar Images:
--------------------------------------------------------------------------------
[1] Similarity: 0.9234
    Image ID: 4b369dbe-417168fa-7e2b5f04-00582488-c50504e7
    Patient: p10045779
    Study Type: Chest X-Ray
    Path: /path/to/images/p10/p10045779/s53819164/4b369dbe-417168fa-7e2b5f04-00582488-c50504e7.dcm

[2] Similarity: 0.9187
    Image ID: 48b7ea9c-c1610133-64303c6f-4f6dfe6c-805036e8
    Patient: p10433353
    Study Type: Chest X-Ray
    Path: /path/to/images/p10/p10433353/s50527707/48b7ea9c-c1610133-64303c6f-4f6dfe6c-805036e8.dcm

[3] Similarity: 0.9102
    Image ID: 8640649e-a6a3ae17-6f9c2091-560aef6e-9c1f19c7
    Patient: p10179495
    Study Type: Chest X-Ray
    Path: /path/to/images/p10/p10179495/s57176651/8640649e-a6a3ae17-6f9c2091-560aef6e-9c1f19c7.dcm
...
================================================================================
```

#### DICOM Metadata Support

The image vectorizer automatically extracts DICOM metadata:

- **PatientID**: De-identified patient identifier
- **StudyDescription**: Description of imaging study
- **Modality**: Imaging modality (DX, CR, CT, MR, etc.)
- **Rows/Columns**: Image dimensions
- **PixelData**: Raw image array (normalized and preprocessed)
Example DICOM extraction:

```python
from vectorization.image_vectorizer import ImageValidator
from pathlib import Path

validator = ImageValidator(dicom_enabled=True)
is_valid, metadata, error = validator.validate_and_extract(Path("/path/to/image.dcm"))

if is_valid:
    print(f"Patient: {metadata.patient_id}")
    print(f"Study: {metadata.study_type}")
    print(f"Dimensions: {metadata.width}x{metadata.height}")
```

#### Resumability and Error Handling

The image vectorization pipeline supports checkpoint-based resumability:

**Checkpoint tracking:**

- SQLite database stores image processing state (pending, processing, completed, failed)
- Safe interruption with Ctrl+C - resume from where you left off
- Automatic retry of failed images in subsequent runs

**Resume from checkpoint:**

```bash
# Start vectorization
python src/vectorization/image_vectorizer.py --input /path/to/images --format dicom

# ... Interrupt with Ctrl+C after processing 1,000 images ...

# Resume (skips already processed 1,000 images)
python src/vectorization/image_vectorizer.py --input /path/to/images --format dicom --resume
```
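The checkpoint mechanism is easy to picture as a small SQLite table. A minimal sketch of the idea (hypothetical schema; the pipeline's actual layout inside `image_vectorization_state.db` may differ):

```python
# Minimal sketch of checkpoint-based resumability with SQLite.
import sqlite3

conn = sqlite3.connect("image_vectorization_state.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS checkpoint (
           image_id TEXT PRIMARY KEY,
           status   TEXT NOT NULL DEFAULT 'pending'  -- pending/processing/completed/failed
       )"""
)


def already_done(image_id: str) -> bool:
    """Return True if a previous run already completed this image."""
    row = conn.execute(
        "SELECT status FROM checkpoint WHERE image_id = ?", (image_id,)
    ).fetchone()
    return row is not None and row[0] == "completed"


def mark_completed(image_id: str) -> None:
    """Record a successfully vectorized image so --resume can skip it."""
    conn.execute(
        "INSERT OR REPLACE INTO checkpoint (image_id, status) VALUES (?, 'completed')",
        (image_id,),
    )
    conn.commit()
```

With this shape, a `--resume` run simply filters the discovered files through `already_done()` before batching.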
**Error logging:**

Validation and processing errors are logged to `image_vectorization_errors.log`:

```
================================================================================
Error - 2025-01-09T19:35:12.345678
================================================================================
Image ID: corrupt-image-001
Error: Validation failed: DICOM file is corrupted or incomplete
--------------------------------------------------------------------------------
Image ID: invalid-dimensions-002
Error: Image dimensions invalid: 0x0
--------------------------------------------------------------------------------
```

Common errors:

- Corrupted DICOM files
- Invalid image dimensions (0x0)
- Unsupported DICOM transfer syntax
- Permission errors reading files

#### Integration Testing

Run integration tests to verify the pipeline:

```bash
pytest tests/integration/test_image_vectorization.py -v
```

**Expected output:**

```
============================== test session starts ===============================
tests/integration/test_image_vectorization.py::TestDICOMValidation::test_dicom_format_detection PASSED
tests/integration/test_image_vectorization.py::TestDICOMValidation::test_dicom_metadata_extraction PASSED
tests/integration/test_image_vectorization.py::TestImagePreprocessing::test_dicom_to_pil_conversion PASSED
tests/integration/test_image_vectorization.py::TestImagePreprocessing::test_image_resizing PASSED
tests/integration/test_image_vectorization.py::TestNIMVisionAPI::test_embedding_generation_mock PASSED
tests/integration/test_image_vectorization.py::TestCheckpointManagement::test_checkpoint_initialization PASSED
tests/integration/test_image_vectorization.py::TestEndToEndPipeline::test_pipeline_initialization PASSED
tests/integration/test_image_vectorization.py::TestPerformanceValidation::test_preprocessing_performance PASSED
...
============================== 15 passed in 8.23s ===============================
```

## Configuration Options

### Environment Variables

Edit `.env` to customize deployment:

```bash
# AWS Configuration
AWS_REGION=us-east-1                # AWS region
AWS_INSTANCE_TYPE=g5.xlarge         # Instance type
SSH_KEY_NAME=my-key                 # SSH key pair name
SSH_KEY_PATH=~/.ssh/my-key.pem      # Path to private key

# NVIDIA API Keys
NVIDIA_API_KEY=nvapi-xxx            # NVIDIA NGC API key
NGC_API_KEY=nvapi-xxx               # Same as NVIDIA_API_KEY

# IRIS Database
IRIS_USERNAME=_SYSTEM               # Database username
IRIS_PASSWORD=ISCDEMO               # Database password
IRIS_HOST=localhost                 # Host (localhost for same instance)
IRIS_PORT=1972                      # SQL port
IRIS_NAMESPACE=DEMO                 # Namespace for tables

# Optional: Performance Tuning
BATCH_SIZE=50                               # Embedding batch size
EMBEDDING_MODEL=nvidia/nv-embedqa-e5-v5     # Embedding model
LLM_MODEL=meta/llama-3.1-8b-instruct        # LLM model
```

### AWS Configuration

Edit `config/aws-config.yaml` to customize infrastructure:

```yaml
instance:
  type: g5.xlarge          # Change to g5.2xlarge for more GPU memory
  region: us-east-1        # Change region
  availability_zone: us-east-1a

ebs_volume:
  size: 500                # Increase for more data
  type: gp3
  iops: 3000
```

### NIM Configuration

Edit `config/nim-config.yaml` to customize AI services:

```yaml
nim_llm:
  model: meta/llama-3.1-8b-instruct   # Change to larger model
  port: 8001
  shared_memory: 16g                  # Increase for larger models

nim_embeddings:
  batch_size: 50                      # Increase for faster vectorization
  rate_limit:
    requests_per_minute: 60           # Adjust based on API tier
```

## Next Steps

After successful deployment:

1. **Load your own data:** See [docs/data-ingestion.md](data-ingestion.md)
2. **Customize RAG pipeline:** See [docs/rag-customization.md](rag-customization.md)
3. **Monitor performance:** See [docs/monitoring.md](monitoring.md)
4. **Scale the system:** See [docs/scaling.md](scaling.md)

## Support

- **Issues:** Report at [GitHub Issues](https://github.com/your-org/FHIR-AI-Hackathon-Kit/issues)
- **Documentation:** See [docs/](../docs/)
- **Troubleshooting:** See [docs/troubleshooting.md](troubleshooting.md)
