gemini-1.5-flash-002 vs llama-3.2-11b-vision-instruct

Pricing, Performance & Features Comparison

gemini-1.5-flash-002

Author: google

Context Length: 1M

Reasoning

Providers: 1

Released: Sep 2024

Knowledge Cutoff: May 2024

License: -

Superseded By: gemini-2.0-flash-exp

Gemini 1.5 Flash-002 is a multimodal LLM optimized for speed and efficiency. It accepts text, image, audio, and video input and produces text output. With a 1M-token context window, it is suited to summarization, data extraction, and chat applications, and it is designed to run at scale with low latency and high throughput.

Input Price: $0.075

Output Price: $0.3

Latency (p50): -

Output Limit: 8K

Function Calling

JSON Mode

Input Modalities: Text, Image, Audio, Video

Output Modalities: Text

Provider: google-vertex

in $0.075, out $0.3, -, -
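The listed prices can be turned into a per-request cost estimate. The sketch below assumes the figures are USD per 1M tokens; the page does not state the unit, so treat that as an assumption. The function name `estimate_cost_usd` and the token counts are hypothetical, chosen only for illustration.

```python
# Assumption: prices on this page are USD per 1,000,000 tokens (unit not stated).
TOKENS_PER_PRICE_UNIT = 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price: float, output_price: float) -> float:
    """Return the estimated cost of one request in USD."""
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_PRICE_UNIT


# Listed prices for gemini-1.5-flash-002 via google-vertex.
GEMINI_INPUT_PRICE = 0.075
GEMINI_OUTPUT_PRICE = 0.3

# Hypothetical request: 200,000 prompt tokens and 8,000 generated tokens.
cost = estimate_cost_usd(200_000, 8_000, GEMINI_INPUT_PRICE, GEMINI_OUTPUT_PRICE)
print(f"${cost:.4f}")  # prints $0.0174
```

Under the same assumption, the $0.00 prices listed for llama-3.2-11b-vision-instruct yield a cost of zero for any token count.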

llama-3.2-11b-vision-instruct

Author: meta

Context Length: 128K

Reasoning

Providers: 1

Released: Sep 2024

Knowledge Cutoff: Dec 2023

License: -

The 'meta-llama/llama-3.2-11b-vision-instruct' model is optimized for visual recognition, image reasoning, captioning, and question answering about images. It builds on the Llama 3.1 text model by adding a vision adapter with cross-attention layers, and it is fine-tuned for alignment with human preferences. Text-only tasks are supported in multiple languages; image-text tasks are supported in English only.

Input Price: $0.00

Output Price: $0.00

Latency (p50): -

Output Limit: 4K

Function Calling

JSON Mode

Input Modalities: Text, Image

Output Modalities: Text

Provider: together

in $0.00, out $0.00, -, -
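The context and output limits of the two models can be compared programmatically when deciding which one can serve a given request. The sketch below is illustrative only: it reads "1M", "128K", "8K", and "4K" as round decimal token counts, and it assumes that generated tokens count against the context window. Neither point is stated on this page. The names `ModelLimits`, `fits`, and `candidates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    name: str
    context_tokens: int  # listed context length (approximated as a round number)
    output_tokens: int   # listed output limit (approximated as a round number)


MODELS = [
    ModelLimits("gemini-1.5-flash-002", 1_000_000, 8_000),
    ModelLimits("llama-3.2-11b-vision-instruct", 128_000, 4_000),
]


def fits(model: ModelLimits, prompt_tokens: int, max_output_tokens: int) -> bool:
    """True if the request stays within both limits.

    Assumption: prompt and output tokens share the context window.
    """
    within_output = max_output_tokens <= model.output_tokens
    within_context = prompt_tokens + max_output_tokens <= model.context_tokens
    return within_output and within_context


def candidates(prompt_tokens: int, max_output_tokens: int) -> list[str]:
    """Names of the models that can serve the request, in listing order."""
    return [m.name for m in MODELS if fits(m, prompt_tokens, max_output_tokens)]


# A 300,000-token prompt exceeds the 128K context of the Llama model.
print(candidates(300_000, 2_000))
```

A short prompt with a 2,000-token output budget fits both models, while any request that needs more than 4,000 output tokens leaves only gemini-1.5-flash-002.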