pixtral-12b vs qwen-2.5-7b-instruct

Pricing, Performance & Features Comparison

pixtral-12b
Author: Mistral
Context Length: 128K
Reasoning: -
Providers: 1
Released: Sep 2024
Knowledge Cutoff: -
License: -

Pixtral-12B is a natively multimodal large language model with 12 billion parameters plus a 400-million-parameter vision encoder, trained on interleaved image and text data. It achieves strong performance on multimodal tasks, including multimodal instruction following, while maintaining state-of-the-art performance on text-only benchmarks. The model supports variable image sizes and can process multiple images within its 128K-token context window.
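Pixtral's interleaved image-and-text input maps naturally onto an OpenAI-style chat payload. A minimal sketch of building such a request, assuming an OpenAI-compatible message schema; the model identifier and URLs are illustrative, not Glama's actual API:

```python
# Sketch: constructing a multimodal chat request for an OpenAI-compatible
# endpoint. The model name and message schema are assumptions for illustration.
import json


def build_multimodal_request(prompt: str, image_urls: list[str]) -> dict:
    """Interleave one text part with any number of image parts
    in a single user message."""
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {
        "model": "mistralai/pixtral-12b",  # assumed identifier
        "messages": [{"role": "user", "content": content}],
    }


request = build_multimodal_request(
    "Compare these two charts.",
    ["https://example.com/a.png", "https://example.com/b.png"],
)
print(json.dumps(request, indent=2))
```

Because the 128K context window accommodates multiple images, the same message can carry several `image_url` parts alongside the text prompt.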

Input: $0.15
Output: $0.15
Latency (p50): 710ms
Output Limit: 128K
Function Calling
JSON Mode
Input: Text, Image
Output: Text
qwen-2.5-7b-instruct
Author: Alibaba
Context Length: 131K
Reasoning: -
Providers: 1
Released: Sep 2024
Knowledge Cutoff: -
License: Apache License 2.0

Qwen/Qwen2.5-7B-Instruct is an instruction-tuned, decoder-only language model offering enhanced coding and math capabilities and multilingual support for over 29 languages. It can handle up to 128K tokens of context and generate up to 8K tokens, making it well suited to tasks requiring extended text generation or JSON outputs. Its resilience to diverse prompts and strong instruction following also make it a good fit for chatbot role-play and structured-output scenarios.
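The structured-output scenario mentioned above is typically driven through an OpenAI-style JSON mode. A minimal sketch of such a request, assuming an OpenAI-compatible `response_format` field; the model identifier and schema are illustrative assumptions:

```python
# Sketch: requesting structured JSON output from Qwen2.5-7B-Instruct through
# an OpenAI-compatible JSON mode. Field names are assumptions for illustration.
def build_json_request(system: str, user: str) -> dict:
    """Build a chat request that constrains the model to emit a JSON object."""
    return {
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "response_format": {"type": "json_object"},
        "max_tokens": 8192,  # the model's stated 8K generation limit
    }


req = build_json_request(
    "Reply only with a JSON object with keys 'city' and 'country'.",
    "Where is the Eiffel Tower?",
)
```

In JSON mode it is common practice to also describe the expected keys in the system prompt, as above, since the mode constrains the output to valid JSON but not to any particular schema.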

Input: $0.27
Output: $0.27
Latency (p50): -
Output Limit: 8K
Function Calling: -
JSON Mode
Input: Text
Output: Text
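The listed prices can be turned into a rough per-request cost comparison. Assuming the figures are quoted per million tokens (the usual convention on such comparison pages; the page's price-unit field is blank, so this is an assumption), a sketch for a sample 10K-in / 1K-out request:

```python
# Sketch: per-request cost from per-million-token prices. The per-1M-token
# unit is an assumption; the page does not state the price unit explicitly.
def request_cost(price_in: float, price_out: float,
                 tokens_in: int, tokens_out: int) -> float:
    """Cost in USD, with prices quoted per 1M tokens."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


# Sample request: 10,000 input tokens, 1,000 output tokens.
pixtral_cost = request_cost(0.15, 0.15, 10_000, 1_000)  # 0.00165 USD
qwen_cost = request_cost(0.27, 0.27, 10_000, 1_000)     # 0.00297 USD
```

Since both models price input and output identically, the ratio of their costs is constant (0.27 / 0.15 = 1.8) regardless of the input/output split.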