llama-3.2-11b-vision-instruct vs gemini-1.5-pro-002

Pricing, Performance & Features Comparison

llama-3.2-11b-vision-instruct

Authormeta

Context Length128K

Reasoning

Providers1

ReleasedSep 2024

Knowledge CutoffDec 2023

License–

The 'meta-llama/llama-3.2-11b-vision-instruct' model is optimized for visual recognition, image reasoning, captioning, and question answering about images. It extends the Llama 3.1 base with a vision adapter and cross-attention layers, and uses fine-tuning for alignment with human preferences. The model supports multiple languages for text-only tasks and English for image-text applications.

Input$0.00

Output$0.00

Latency (p50)–

Output Limit4K

Function Calling

JSON Mode

InputText, Image

OutputText

together

in$0.00out$0.00––

gemini-1.5-pro-002

Authorgoogle

Context Length2M

Reasoning

Providers1

ReleasedSep 2024

Knowledge Cutoff–

License–

Google's advanced multimodal AI model featuring improved reasoning, enhanced coding capabilities, and extended context window. Excels at complex analytical tasks and creative content.

Input$1.3

Output$5

Latency (p50)–

Output Limit8K

Function Calling

JSON Mode

Input–

Output–

google-vertex

in$1.3out$5––