# Quick Start: AWS GPU NIM RAG Deployment

**Feature**: 003-aws-nim-deployment
**Estimated Time**: 30 minutes
**Prerequisites**: AWS account, NVIDIA NGC API key, local machine with AWS CLI configured

---

## Prerequisites Checklist

Before starting deployment:

- [ ] AWS account with EC2 permissions
- [ ] AWS CLI v2 installed and configured (`aws configure`)
- [ ] NVIDIA NGC account (https://ngc.nvidia.com)
- [ ] NVIDIA API key (from NGC dashboard)
- [ ] SSH key pair for EC2 access (`.pem` file)
- [ ] Clinical notes dataset (synthea_clinical_notes.json)
- [ ] Git repository cloned locally

---

## Step 1: Configure Environment (2 minutes)

Create a configuration file with your credentials:

```bash
# Copy the template and edit it with your values
cp config/.env.template config/.env

# Edit config/.env with your credentials
nano config/.env
```

Required environment variables:

```bash
AWS_REGION=us-east-1
AWS_INSTANCE_TYPE=g5.xlarge
SSH_KEY_NAME=your-key-name
SSH_KEY_PATH=/path/to/your-key.pem
NVIDIA_API_KEY=nvapi-xxxxx...
NGC_API_KEY=nvapi-xxxxx...   # Usually the same as NVIDIA_API_KEY
IRIS_USERNAME=_SYSTEM
IRIS_PASSWORD=ISCDEMO
```

---

## Step 2: Run Main Deployment Script (25 minutes)

Run the all-in-one deployment automation:

```bash
# Make the script executable
chmod +x scripts/aws/deploy.sh

# Run deployment (monitors progress automatically)
./scripts/aws/deploy.sh
```

**What happens automatically**:

1. ✅ Provision g5.xlarge EC2 instance (2 min)
2. ✅ Install NVIDIA drivers + CUDA 12.2 (5 min)
3. ✅ Configure Docker GPU runtime (2 min)
4. ✅ Deploy IRIS vector database (3 min)
5. ✅ Deploy NVIDIA NIM LLM service (10 min - large download)
6. ✅ Run deployment validation tests (3 min)

**Expected Output**:

```
[2025-11-09 12:34:56] Starting deployment...
[2025-11-09 12:36:30] ✅ EC2 instance launched: i-012abe9cf48fdc702
[2025-11-09 12:41:15] ✅ GPU drivers installed: nvidia-driver-535
[2025-11-09 12:43:00] ✅ Docker GPU runtime configured
[2025-11-09 12:46:12] ✅ IRIS database running on port 1972
[2025-11-09 12:56:45] ✅ NIM LLM service ready on port 8001
[2025-11-09 12:59:30] ✅ All validation tests passed

Deployment Summary:
  Instance IP: 34.238.176.10
  GPU: NVIDIA A10G (24GB)
  Services Running: IRIS (1972), NIM-LLM (8001)
  Status: READY

Next steps: Run vectorization pipeline (see Step 4)
```

---

## Step 3: Verify Deployment (1 minute)

Check that all services are healthy:

```bash
# SSH into the instance and run health checks
ssh -i $SSH_KEY_PATH ubuntu@<INSTANCE_IP>

# On the remote instance:
./fhir-ai-hackathon/scripts/aws/validate-deployment.sh
```

**Expected Output**:

```
Running deployment validation...

✅ GPU Check: NVIDIA A10G detected (Driver 535.274.02)
✅ Docker GPU: nvidia-smi accessible in containers
✅ IRIS Database: Connection successful (port 1972)
✅ IRIS Tables: ClinicalNoteVectors table exists
✅ NIM LLM Service: Health check passed (port 8001)
✅ NIM LLM Inference: Test query successful (2.3s response time)

All validation checks passed! System ready for vectorization.
```
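If you prefer to script this check from your local machine (for example in CI), the following is a minimal sketch. It assumes your security group allows your IP to reach ports 1972 and 8001, that the NIM container exposes the usual `/v1/health/ready` readiness endpoint, and that `INSTANCE_IP` is replaced with the IP from the deployment summary; `validate-deployment.sh` above remains the authoritative check.

```python
"""Minimal remote health check sketch (assumes the ports from the
deployment summary: IRIS on 1972, NIM LLM on 8001)."""
import socket
import sys

import requests  # pip install requests

INSTANCE_IP = "34.238.176.10"  # Replace with your instance IP


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def nim_ready(host: str, port: int = 8001) -> bool:
    """Probe the NIM readiness endpoint (assumed path: /v1/health/ready)."""
    try:
        resp = requests.get(f"http://{host}:{port}/v1/health/ready", timeout=10)
        return resp.status_code == 200
    except requests.RequestException:
        return False


if __name__ == "__main__":
    checks = {
        "IRIS port 1972 reachable": port_open(INSTANCE_IP, 1972),
        "NIM LLM port 8001 reachable": port_open(INSTANCE_IP, 8001),
        "NIM LLM /v1/health/ready": nim_ready(INSTANCE_IP),
    }
    for name, ok in checks.items():
        print(f"{'✅' if ok else '❌'} {name}")
    sys.exit(0 if all(checks.values()) else 1)
```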
---

## Step 4: Vectorize Clinical Notes (30-50 minutes)

Process your clinical documents dataset:

```bash
# On the remote instance (via SSH)
cd ~/fhir-ai-hackathon

# Copy your data file (if not already present)
# From the local machine:
scp -i $SSH_KEY_PATH synthea_clinical_notes.json ubuntu@<INSTANCE_IP>:~/fhir-ai-hackathon/

# On the remote instance: run vectorization
python3 src/vectorization/text_vectorizer.py \
  --input synthea_clinical_notes.json \
  --batch-size 50 \
  --resume   # Enables resumable processing
```

**Progress Output**:

```
Vectorization Pipeline Starting...
Total documents: 50,569
Batch size: 50
Resume mode: ENABLED

Processing batch 1/1012...    [=====>    ] 50/50 docs (2.1s)
Processing batch 2/1012...    [=====>    ] 50/50 docs (2.0s)
...
Processing batch 1012/1012... [=====>    ] 19/50 docs (0.8s)

Vectorization Complete!
  Total documents: 50,569
  Successfully processed: 50,565 (99.99%)
  Failed: 4 (see vectorization_errors.log)
  Total time: 42 minutes
  Throughput: 120 docs/min
```

---

## Step 5: Test RAG Query (30 seconds)

Verify end-to-end RAG functionality:

```bash
# On the remote instance
python3 src/validation/test_rag_query.py \
  --query "What are the common symptoms of diabetes?"
```

**Expected Output**:

```
Running RAG Query Test...

Query: "What are the common symptoms of diabetes?"

Retrieved Documents (top 3):
1. [Patient: patient-456, Score: 0.89]
   "Patient presents with polyuria, polydipsia, and unexplained weight loss..."
2. [Patient: patient-123, Score: 0.85]
   "History of frequent urination, excessive thirst, and fatigue..."
3. [Patient: patient-789, Score: 0.82]
   "Chief complaint: increased hunger and blurred vision..."

Generated Response (NIM LLM):
"Based on the retrieved clinical notes, common symptoms of diabetes include:
- Polyuria (frequent urination)
- Polydipsia (excessive thirst)
- Unexplained weight loss
- Fatigue
- Increased hunger
- Blurred vision

Sources: Patient records patient-456, patient-123, patient-789"

Query completed in 4.2 seconds.
✅ RAG query test PASSED
```

---

## Common Issues & Solutions

### Issue: "Permission denied" when running deploy.sh

```bash
# Solution: Make the script executable
chmod +x scripts/aws/deploy.sh
```

### Issue: "NVIDIA driver not loading" after reboot

```bash
# Solution: A system reboot is required after driver installation
sudo reboot

# Wait 2 minutes, then re-SSH and verify
nvidia-smi
```

### Issue: "NIM LLM container failed to start"

```bash
# Check that the NGC API key is set
echo $NVIDIA_API_KEY

# View container logs
docker logs nim-llm

# Common fix: Restart the container with the correct API key
docker rm -f nim-llm
# Re-run the deploy-nim-llm.sh script
```

### Issue: "Out of GPU memory during vectorization"

```bash
# Solution: Reduce the batch size
python3 src/vectorization/text_vectorizer.py \
  --batch-size 25   # Reduced from the default of 50
```

---

## Next Steps

After successful deployment and vectorization:

1. **Add Medical Images**: Run the image vectorization pipeline
   ```bash
   python3 src/vectorization/image_vectorizer.py --input /path/to/mimic-cxr
   ```
2. **Build RAG Application**: Integrate with your application using the IRIS and NIM APIs (see the sketch after this list)
3. **Monitor System**: Set up CloudWatch alarms for GPU utilization and service health
4. **Scale Up**: For production, consider g5.2xlarge or a multi-instance deployment
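For item 2, the sketch below shows one way to wire the two services together: embed the query, retrieve the nearest notes from IRIS vector search, then pass them to the NIM LLM's OpenAI-compatible chat endpoint. It is a sketch under assumptions, not the project's `test_rag_query.py`: the embedding model must match whatever `text_vectorizer.py` used (`all-MiniLM-L6-v2` is a placeholder), the `ClinicalNoteVectors` column names, the `USER` namespace, the vector type in `TO_VECTOR`, and the NIM model name are illustrative, and it requires the `intersystems-irispython`, `sentence-transformers`, and `requests` packages.

```python
"""Minimal RAG query sketch against the deployed IRIS + NIM stack."""
import requests
import iris  # pip install intersystems-irispython
from sentence_transformers import SentenceTransformer

INSTANCE_IP = "34.238.176.10"              # From the deployment summary
NIM_MODEL = "meta/llama-3.1-8b-instruct"   # Whichever NIM model you deployed
EMBEDDER = SentenceTransformer("all-MiniLM-L6-v2")  # Must match the vectorizer


def retrieve(query: str, top_k: int = 3) -> list[tuple[str, str]]:
    """Vector search in IRIS; returns (patient_id, text) pairs."""
    vec = ",".join(str(x) for x in EMBEDDER.encode(query).tolist())
    # Namespace, credentials, column names, and vector type are assumptions;
    # adjust them to the schema created by text_vectorizer.py.
    conn = iris.connect(INSTANCE_IP, 1972, "USER", "_SYSTEM", "ISCDEMO")
    try:
        cur = conn.cursor()
        cur.execute(
            f"""SELECT TOP {top_k} PatientID, TextContent
                FROM ClinicalNoteVectors
                ORDER BY VECTOR_COSINE(Embedding, TO_VECTOR(?, DOUBLE)) DESC""",
            [vec],
        )
        return cur.fetchall()
    finally:
        conn.close()


def generate(query: str, docs: list[tuple[str, str]]) -> str:
    """Call the NIM LLM's OpenAI-compatible /v1/chat/completions endpoint."""
    context = "\n\n".join(f"[{pid}] {text}" for pid, text in docs)
    resp = requests.post(
        f"http://{INSTANCE_IP}:8001/v1/chat/completions",
        json={
            "model": NIM_MODEL,
            "messages": [
                {"role": "system",
                 "content": "Answer using only the provided clinical notes."},
                {"role": "user",
                 "content": f"Notes:\n{context}\n\nQuestion: {query}"},
            ],
            "max_tokens": 300,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    question = "What are the common symptoms of diabetes?"
    hits = retrieve(question)
    print(generate(question, hits))
```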
---

## Cost Estimate

**Hourly Costs** (us-east-1):

- g5.xlarge instance: $1.006/hour
- EBS gp3 storage (500 GB): $0.08/GB-month (~$40/month, or ~$0.055/hour)
- Data transfer: Negligible for testing

**Monthly Cost** (24/7 operation):

- Instance: ~$730/month
- Storage: ~$40/month
- **Total**: ~$770/month

**Cost Savings**:

- Stop the instance when not in use: $0/hour for compute, you only pay for EBS (see the boto3 sketch at the end of this guide)
- Use Spot Instances: ~60-70% discount on the instance cost

---

## Support & Documentation

- Full deployment guide: `docs/deployment-guide.md`
- Troubleshooting: `docs/troubleshooting.md`
- Architecture: `docs/architecture.md`
- NVIDIA NIM docs: https://docs.nvidia.com/nim/
- IRIS Vector Search: https://docs.intersystems.com/irislatest/csp/docbook/Doc.View.cls?KEY=GSQL_vecsearch
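---

To realize the "stop the instance when not in use" saving from the Cost Estimate, you can toggle the instance from your local machine. This is a minimal boto3 sketch, assuming local AWS credentials are configured and substituting your own instance ID for the example one from the deployment log; note that `scripts/aws/` may already provide an equivalent helper.

```python
"""Stop or start the GPU instance to avoid hourly compute charges when idle
(EBS storage charges still apply while stopped)."""
import sys

import boto3  # pip install boto3

INSTANCE_ID = "i-012abe9cf48fdc702"  # Replace with your instance ID
ec2 = boto3.client("ec2", region_name="us-east-1")

if __name__ == "__main__":
    action = sys.argv[1] if len(sys.argv) > 1 else "stop"
    if action == "stop":
        ec2.stop_instances(InstanceIds=[INSTANCE_ID])
    elif action == "start":
        ec2.start_instances(InstanceIds=[INSTANCE_ID])
        # A stopped/started instance typically receives a new public IP,
        # so update INSTANCE_IP in your scripts afterwards.
    print(f"Requested '{action}' for {INSTANCE_ID}")
```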
