grok-vision-beta vs llama-3.3-70b-instruct
Pricing, Performance & Features Comparison
xAI’s Grok Vision Beta is an experimental large language model with integrated vision capabilities, supporting both text and image inputs. It can generate text-based responses in multiple languages and offers a context window of up to 8,192 tokens. The model is currently offered as a beta release and does not support fine-tuning on custom datasets.
Input price: $5 per 1M tokens
Output price: $15 per 1M tokens
Latency (p50): -
Output limit: 8K tokens
Function calling: -
JSON mode: -
Input modalities: Text, Image
Output modalities: Text
Llama 3.3 is a text-only, 70B-parameter instruction-tuned model that improves on Llama 3.1 70B, and on Llama 3.2 90B when used for text-only applications. For some applications, Llama 3.3 70B approaches the performance of Llama 3.1 405B.
Input price: $0.45 per 1M tokens
Output price: $0.45 per 1M tokens
Latency (p50): -
Output limit: 4K tokens
Function calling: -
JSON mode: -
Input modalities: Text
Output modalities: Text
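To make the pricing difference concrete, the per-token prices above can be turned into a per-request cost estimate. This is a minimal sketch, assuming the listed prices are quoted per 1M tokens (the usual convention for these APIs); the helper function and the example token counts are illustrative, not part of either provider's API.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Estimate USD cost of one request, assuming prices are per 1M tokens."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# Hypothetical workload: 4,000 input tokens, 1,000 output tokens.
grok_cost = request_cost(4_000, 1_000, 5.00, 15.00)    # grok-vision-beta
llama_cost = request_cost(4_000, 1_000, 0.45, 0.45)    # llama-3.3-70b-instruct

print(f"grok-vision-beta:       ${grok_cost:.5f}")   # $0.03500
print(f"llama-3.3-70b-instruct: ${llama_cost:.5f}")  # $0.00225
```

For this workload the ratio is roughly 15x in Llama 3.3's favor, though the models differ in modality support (Grok Vision Beta accepts images; Llama 3.3 is text-only).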