# OpenAI → NIM Migration Strategy

## Strategy Overview

**Phase 1: Development with OpenAI** (Now - 1 week)
- Fast iteration and testing
- Cheap (~$0.00013/1K tokens for text-embedding-3-large)
- Validate that the architecture works
- Build and test all features

**Phase 2: Production Demo with NIM** (Week 2)
- Deploy NIM on AWS EC2 with GPU
- Swap embeddings backend (minimal code changes)
- Demonstrate private, on-premise capability
- Stress testing with self-hosted NIM

**Phase 3: Cost Optimization** (Ongoing)
- Auto start/stop EC2 when not in use
- Save ~$560/month vs 24/7 operation
- Keep OpenAI for dev, NIM for demos

---

## Architecture: Pluggable Embeddings

### Abstract Interface (Swap Providers Easily)

```python
# src/embeddings/base_embeddings.py
from abc import ABC, abstractmethod
from typing import List

class BaseEmbeddings(ABC):
    """Abstract base class for embeddings providers."""

    @abstractmethod
    def embed_query(self, text: str) -> List[float]:
        """Embed a single query."""
        pass

    @abstractmethod
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed multiple documents."""
        pass

    @property
    @abstractmethod
    def dimension(self) -> int:
        """Embedding dimension."""
        pass

    @property
    @abstractmethod
    def provider(self) -> str:
        """Provider name (openai, nim, etc)."""
        pass
```

---

## Implementation 1: OpenAI (Development)

### OpenAI Embeddings Adapter

```python
# src/embeddings/openai_embeddings.py
from openai import OpenAI
import os
from typing import List
from .base_embeddings import BaseEmbeddings

class OpenAIEmbeddings(BaseEmbeddings):
    """OpenAI embeddings adapter."""

    def __init__(self, model: str = "text-embedding-3-large"):
        """
        Initialize OpenAI embeddings.
        Args:
            model: OpenAI embedding model
                - text-embedding-3-small (1536-dim, $0.00002/1K tokens)
                - text-embedding-3-large (3072-dim, $0.00013/1K tokens)
        """
        api_key = os.environ.get('OPENAI_API_KEY')
        if not api_key:
            raise ValueError("OPENAI_API_KEY environment variable not set")

        self.client = OpenAI(api_key=api_key)
        self.model_name = model

        # Dimension based on model
        self._dimension = 3072 if "large" in model else 1536

    def embed_query(self, text: str) -> List[float]:
        """Embed a single query."""
        response = self.client.embeddings.create(
            input=text,
            model=self.model_name
        )
        return response.data[0].embedding

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed multiple documents (batch)."""
        response = self.client.embeddings.create(
            input=texts,
            model=self.model_name
        )
        return [item.embedding for item in response.data]

    @property
    def dimension(self) -> int:
        return self._dimension

    @property
    def provider(self) -> str:
        return "openai"
```

### Setup Instructions

```bash
# Install OpenAI SDK
pip install openai

# Set API key
export OPENAI_API_KEY="sk-..."
# Test
python -c "
from src.embeddings.openai_embeddings import OpenAIEmbeddings
embedder = OpenAIEmbeddings(model='text-embedding-3-large')
vector = embedder.embed_query('chest pain and shortness of breath')
print(f'✅ OpenAI embeddings: {len(vector)} dimensions')
"
```

### Cost

- **text-embedding-3-large**: $0.00013 per 1K tokens
- **51 documents × 100 tokens avg**: 5,100 tokens = $0.0007 (~$0.001)
- **10K documents × 100 tokens avg**: 1M tokens = $0.13
- **Queries (1K/day × 30 days)**: 30K queries; assuming ~10 tokens per query, 300K tokens = ~$0.04/month

**Total for development: ~$1-5/month** (a generous ceiling; the items above sum to well under $1)

---

## Implementation 2: NIM (Production Demo)

### NIM Embeddings Adapter

```python
# src/embeddings/nim_embeddings.py
import requests
import os
from typing import List
from .base_embeddings import BaseEmbeddings

class NIMEmbeddings(BaseEmbeddings):
    """Self-hosted NIM embeddings adapter."""

    def __init__(self, endpoint: str = "http://localhost:8000/v1/embeddings",
                 model: str = "nvidia/nv-embedqa-e5-v5"):
        """
        Initialize NIM embeddings.

        Args:
            endpoint: NIM inference endpoint URL
            model: NIM model name (for metadata)
        """
        self.endpoint = endpoint
        self.model_name = model

        # Test connection
        self._test_connection()

    def _test_connection(self):
        """Test NIM endpoint is reachable."""
        # NIM serves its readiness probe at /v1/health/ready
        health_url = self.endpoint.replace('/v1/embeddings', '/v1/health/ready')
        try:
            response = requests.get(health_url, timeout=5)
            if response.status_code != 200:
                raise ConnectionError(f"NIM health check failed: {response.status_code}")
        except requests.exceptions.RequestException as e:
            raise ConnectionError(f"Cannot reach NIM endpoint {self.endpoint}: {e}")

    def embed_query(self, text: str) -> List[float]:
        """Embed a single query."""
        response = requests.post(
            self.endpoint,
            json={
                "input": text,
                "model": self.model_name,
                "input_type": "query"
            },
            headers={"Content-Type": "application/json"},
            timeout=30
        )
        response.raise_for_status()
        return response.json()["data"][0]["embedding"]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed multiple documents (batch)."""
        response = requests.post(
            self.endpoint,
            json={
                "input": texts,
                "model": self.model_name,
                "input_type": "passage"
            },
            headers={"Content-Type": "application/json"},
            timeout=60
        )
        response.raise_for_status()
        data = response.json()["data"]
        # Sort by index so the output order matches the input order
        return [item["embedding"] for item in sorted(data, key=lambda x: x["index"])]

    @property
    def dimension(self) -> int:
        # NV-EmbedQA-E5-v5 is 1024-dim
        return 1024

    @property
    def provider(self) -> str:
        return "nim"
```

---

## Factory Pattern (Easy Switching)

```python
# src/embeddings/embeddings_factory.py
import os
from typing import Optional
from .base_embeddings import BaseEmbeddings
from .openai_embeddings import OpenAIEmbeddings
from .nim_embeddings import NIMEmbeddings

class EmbeddingsFactory:
    """Factory for creating embeddings providers."""

    @staticmethod
    def create(provider: Optional[str] = None) -> BaseEmbeddings:
        """
        Create embeddings provider.

        Args:
            provider: 'openai', 'nim', or None (auto-detect)

        Returns:
            BaseEmbeddings instance

        Environment Variables:
            EMBEDDINGS_PROVIDER: 'openai' or 'nim'
            OPENAI_API_KEY: Required for OpenAI
            NIM_ENDPOINT: Required for NIM (default: http://localhost:8000/v1/embeddings)
        """
        # Auto-detect from environment
        if provider is None:
            provider = os.environ.get('EMBEDDINGS_PROVIDER', 'openai')

        if provider == 'openai':
            return OpenAIEmbeddings(model='text-embedding-3-large')
        elif provider == 'nim':
            endpoint = os.environ.get('NIM_ENDPOINT', 'http://localhost:8000/v1/embeddings')
            return NIMEmbeddings(endpoint=endpoint)
        else:
            raise ValueError(f"Unknown provider: {provider}")
```

### Usage in Application

```python
# src/setup/vectorize_documents.py
from src.embeddings.embeddings_factory import EmbeddingsFactory
import iris
import json

def vectorize_all_documents():
    """Vectorize all DocumentReference resources."""
    # Create embeddings provider (auto-detect from env)
    embedder = EmbeddingsFactory.create()
    print(f"Using embeddings provider: {embedder.provider}")
    print(f"Embedding dimension: {embedder.dimension}")

    # Connect to IRIS
    conn = iris.connect('localhost', 32782, 'DEMO', '_SYSTEM', 'ISCDEMO')
    cursor = conn.cursor()

    # Get all DocumentReference resources
    cursor.execute("""
        SELECT ID, ResourceString
        FROM HSFHIR_X0001_R.Rsrc
        WHERE ResourceType = 'DocumentReference'
        AND (Deleted = 0 OR Deleted IS NULL)
    """)
    documents = cursor.fetchall()
    print(f"Found {len(documents)} DocumentReference resources")

    vectorized = 0
    for resource_id, resource_string in documents:
        # Parse FHIR JSON and decode clinical note
        fhir_data = json.loads(resource_string)
        try:
            hex_data = fhir_data['content'][0]['attachment']['data']
            clinical_note = bytes.fromhex(hex_data).decode('utf-8', errors='replace')
        except (KeyError, IndexError, TypeError, ValueError):
            # Missing attachment or data that is not valid hex
            print(f"  Skipping resource {resource_id}: No clinical note")
            continue

        # Generate embedding via the document path, so that NIM tags the
        # text as a "passage" rather than a "query"
        print(f"  Vectorizing resource {resource_id}...")
        vector = embedder.embed_documents([clinical_note])[0]

        # Insert into database
        cursor.execute("""
            INSERT INTO VectorSearch.FHIRTextVectors
            (ResourceID, ResourceType, TextContent, Vector, EmbeddingModel, Provider)
            VALUES (?, ?, ?, TO_VECTOR(?), ?, ?)
        """, (
            resource_id,
            'DocumentReference',
            clinical_note,
            str(vector),
            embedder.model_name if hasattr(embedder, 'model_name') else 'unknown',
            embedder.provider
        ))
        vectorized += 1

    conn.commit()
    cursor.close()
    conn.close()
    # Report only the documents actually embedded, not the skipped ones
    print(f"✅ Vectorized {vectorized} documents with {embedder.provider}")

if __name__ == '__main__':
    vectorize_all_documents()
```

---

## Switching Between Providers

### Development Mode (OpenAI)

```bash
export EMBEDDINGS_PROVIDER="openai"
export OPENAI_API_KEY="sk-..."
python src/setup/vectorize_documents.py
```

### Production Demo (NIM)

```bash
export EMBEDDINGS_PROVIDER="nim"
export NIM_ENDPOINT="http://ec2-xx-xx-xx-xx.compute.amazonaws.com:8000/v1/embeddings"
python src/setup/vectorize_documents.py
```

### Auto-Detection

If `EMBEDDINGS_PROVIDER` is not set, the factory defaults to OpenAI for development.
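One consequence of switching providers is that vector sizes change: text-embedding-3-large produces 3072-dimensional vectors, while NV-EmbedQA-E5-v5 produces 1024-dimensional ones, so existing rows have to be re-vectorized and the target vector column has to match. The following is a minimal sketch of a guard that could run before vectorization. `check_dimension` and `FakeEmbedder` are hypothetical names introduced here for illustration, and the table dimension is passed in by hand rather than read from IRIS.

```python
from typing import List


class FakeEmbedder:
    """Hypothetical offline stand-in exposing the same members as BaseEmbeddings."""

    def __init__(self, provider: str, dimension: int):
        self.provider = provider
        self.dimension = dimension

    def embed_query(self, text: str) -> List[float]:
        # Return a zero vector of the advertised size; no network call is made
        return [0.0] * self.dimension


def check_dimension(embedder, table_dimension: int) -> None:
    """Raise ValueError if the provider's vectors do not fit the vector column."""
    # Probe once so a wrong hard-coded `dimension` property is also caught
    probe = embedder.embed_query("dimension probe")
    if len(probe) != embedder.dimension:
        raise ValueError(
            f"{embedder.provider} reports {embedder.dimension} dimensions "
            f"but returned {len(probe)}"
        )
    if embedder.dimension != table_dimension:
        raise ValueError(
            f"{embedder.provider} produces {embedder.dimension}-dim vectors, "
            f"but the table expects {table_dimension}; re-create the column "
            f"and re-vectorize before switching"
        )
```

Calling `check_dimension(EmbeddingsFactory.create(), 3072)` at the top of `vectorize_all_documents()` would fail fast when `EMBEDDINGS_PROVIDER` is set to `nim` against a table that was created for OpenAI vectors.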
---

## AWS EC2 Setup for NIM

### Launch EC2 Instance

```bash
# Launch g5.xlarge with GPU
# AMI IDs are region-specific: replace the value below with the
# Deep Learning AMI ID for your region
aws ec2 run-instances \
    --image-id ami-0c55b159cbfafe1f0 \
    --instance-type g5.xlarge \
    --key-name your-key \
    --security-group-ids sg-xxx \
    --subnet-id subnet-xxx \
    --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100}}]' \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=nim-embeddings}]'
```

### Install NIM Container

```bash
# SSH into EC2
ssh -i your-key.pem ubuntu@ec2-xx-xx-xx-xx.compute.amazonaws.com

# Install NVIDIA Container Toolkit (if not pre-installed)
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Pull NIM container (requires NGC API key)
export NGC_API_KEY="your-ngc-key"
docker login nvcr.io --username '$oauthtoken' --password "$NGC_API_KEY"

# Pull NIM text embedding model
docker pull nvcr.io/nim/nvidia/nv-embedqa-e5-v5:latest

# Run NIM container
# --restart unless-stopped brings the container back up after an EC2 stop/start
docker run -d \
    --gpus all \
    --restart unless-stopped \
    --name nim-embeddings \
    -p 8000:8000 \
    -e NGC_API_KEY="$NGC_API_KEY" \
    nvcr.io/nim/nvidia/nv-embedqa-e5-v5:latest

# Test (input_type is "query" for searches, "passage" for documents)
curl http://localhost:8000/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
      "input": "What are the symptoms of hypertension?",
      "model": "nvidia/nv-embedqa-e5-v5",
      "input_type": "query"
    }'
```

---

## Cost Control: Auto Start/Stop EC2

### Start Script

```bash
#!/bin/bash
# scripts/aws/start-nim-ec2.sh

INSTANCE_ID="i-xxxxxxxxxxxx"

echo "Starting NIM EC2 instance..."
aws ec2 start-instances --instance-ids "$INSTANCE_ID"

echo "Waiting for instance to start..."
aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"

PUBLIC_IP=$(aws ec2 describe-instances \
    --instance-ids "$INSTANCE_ID" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' \
    --output text)

echo "✅ NIM instance started: http://$PUBLIC_IP:8000"
echo "Set environment variable:"
echo "export NIM_ENDPOINT=\"http://$PUBLIC_IP:8000/v1/embeddings\""
```

### Stop Script

```bash
#!/bin/bash
# scripts/aws/stop-nim-ec2.sh

INSTANCE_ID="i-xxxxxxxxxxxx"

echo "Stopping NIM EC2 instance..."
aws ec2 stop-instances --instance-ids "$INSTANCE_ID"

echo "Waiting for instance to stop..."
aws ec2 wait instance-stopped --instance-ids "$INSTANCE_ID"

# The dollar sign is escaped so bash does not expand $2 inside the string
echo "✅ NIM instance stopped (saving ~\$24/day)"
```

### Usage

```bash
# Morning: Start for demo/testing
./scripts/aws/start-nim-ec2.sh
export NIM_ENDPOINT="http://ec2-xx-xx-xx-xx.compute.amazonaws.com:8000/v1/embeddings"
export EMBEDDINGS_PROVIDER="nim"

# Test
python src/query/fhir_graphrag_query.py "chest pain" --top-k 5

# Evening: Stop to save money
./scripts/aws/stop-nim-ec2.sh
```

---

## Cost Comparison

### Development (OpenAI)

- 51 documents vectorized: $0.001
- 30K queries/month (1K/day, assuming ~10 tokens each): ~$0.04
- **Total: ~$1-5/month** (generous ceiling)

### Production Demo (NIM on EC2)

- g5.xlarge: $1.006/hour
- 8 hours/day × 20 days = 160 hours/month
- **Total: ~$160/month** (vs ~$720 if running 24/7)

### Cost Savings

- Auto start/stop: **Save ~$560/month** (78% reduction)
- Use OpenAI for dev: **Save another ~$155/month**
- **Combined savings: ~$715/month**

---

## Migration Checklist

### Phase 1: OpenAI Development (Week 1)

- [ ] Install OpenAI SDK: `pip install openai`
- [ ] Set OPENAI_API_KEY
- [ ] Implement OpenAIEmbeddings class
- [ ] Implement EmbeddingsFactory
- [ ] Create VectorSearch.FHIRTextVectors table (3072-dim for OpenAI)
- [ ] Vectorize 51 DocumentReferences
- [ ] Test query functionality
- [ ] Develop all features with OpenAI

### Phase 2: NIM Production Setup (Week 2)

- [ ] Launch AWS EC2 g5.xlarge
- [ ] Install NIM container
- [ ] Test NIM endpoint
- [ ] Implement NIMEmbeddings class
- [ ] Test provider switching (OpenAI → NIM)
- [ ] Re-vectorize with NIM (1024-dim)
- [ ] Performance benchmark
- [ ] Create start/stop scripts

### Phase 3: Demo Preparation

- [ ] Script for starting EC2 before demo
- [ ] Verify NIM endpoint accessible
- [ ] Switch EMBEDDINGS_PROVIDER to "nim"
- [ ] Test full demo flow
- [ ] Stop EC2 after demo

---

## Summary

**Development**: OpenAI ($1-5/month, fast iteration)

**Production Demo**: Self-hosted NIM on EC2 (~$160/month with auto-stop)

**Code**: Same interface, swap with environment variable

**Result**: Best of both worlds - cheap development + production-ready demo! 🚀
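
The figures in the cost comparison can be reproduced with a few lines of arithmetic. The sketch below uses only the rates quoted in this document ($1.006/hour for g5.xlarge, $0.00013 per 1K tokens for text-embedding-3-large) and assumes a 30-day month; actual AWS and OpenAI pricing may differ, and the function names are illustrative.

```python
EC2_HOURLY_USD = 1.006          # g5.xlarge rate quoted above
EMBED_USD_PER_1K_TOKENS = 0.00013  # text-embedding-3-large rate quoted above


def ec2_monthly_cost(hours: float, rate: float = EC2_HOURLY_USD) -> float:
    """Cost of running the instance for the given number of hours."""
    return hours * rate


def embedding_cost(tokens: int, rate: float = EMBED_USD_PER_1K_TOKENS) -> float:
    """Cost of embedding the given number of tokens."""
    return tokens / 1000 * rate


always_on = ec2_monthly_cost(24 * 30)   # running 24/7
scheduled = ec2_monthly_cost(8 * 20)    # 8 hours/day, 20 days/month
savings = always_on - scheduled
reduction = savings / always_on

print(f"24/7:      ${always_on:.2f}")
print(f"Scheduled: ${scheduled:.2f}")
print(f"Savings:   ${savings:.2f} ({reduction:.0%})")
print(f"51 docs:   ${embedding_cost(51 * 100):.4f}")
print(f"10K docs:  ${embedding_cost(10_000 * 100):.2f}")
```

The rounded results line up with the document's ~$720, ~$160, and ~$560 figures; the small differences come from rounding the hourly rate.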
