pixtral-12b vs qwen-2.5-72b-instruct

Pricing, Performance & Features Comparison

pixtral-12b

Authormistral

Context Length128K

Reasoning

Providers1

ReleasedSep 2024

Knowledge Cutoff-

License-

Pixtral-12B is a natively multimodal large language model with 12 billion parameters plus a 400 million parameter vision encoder, trained with interleaved image and text data. It achieves strong performance on multimodal tasks including instruction following, while maintaining state-of-the-art performance on text-only benchmarks without compromising key text capabilities. The model supports variable image sizes and can process multiple images within its 128K token context window.

Input$0.15

Output$0.15

Latency (p50)834ms

Output Limit128K

Function Calling

JSON Mode

InputText, Image

OutputText

mistral

in$0.15out$0.15--

Latency (24h)

Success Rate (24h)

qwen-2.5-72b-instruct

Authoralibaba

Context Length131K

Reasoning

Providers1

ReleasedSep 2024

Knowledge Cutoff-

License-

Qwen2.5-72B-Instruct is a 72-billion-parameter, decoder-only language model designed for advanced instruction following and long-text generation. It excels at structured data understanding and output, especially JSON, and offers improved coding and mathematical reasoning. The model also supports over 29 languages and can handle extended contexts of up to 128K tokens.

Input$0.23

Output$0.4

Latency (p50)-

Output Limit8K

Function Calling

JSON Mode

InputText

OutputText

deepinfra

in$0.23out$0.4--