Glama

pixtral-12b vs ministral-3b-2410

Pricing, Performance & Features Comparison

Author: mistral
Context Length: 128K
Reasoning: -
Providers: 1
Released: Sep 2024
Knowledge Cutoff: -
License: -

Pixtral-12B is a natively multimodal large language model with 12 billion parameters plus a 400-million-parameter vision encoder, trained on interleaved image and text data. It achieves strong performance on multimodal tasks, including instruction following, while maintaining state-of-the-art results on text-only benchmarks. The model supports variable image sizes and can process multiple images within its 128K-token context window.

Input: $0.15
Output: $0.15
Latency (p50): 710ms
Output Limit: 128K
Function Calling: Yes
JSON Mode: Yes
Input Modalities: Text, Image
Output Modalities: Text
Author: mistral
Context Length: 128K
Reasoning: -
Providers: 1
Released: Sep 2024
Knowledge Cutoff: Oct 2024
License: -

Ministral 3B (ministral-3b-2410) is described by Mistral as the world's best edge model, designed for robust performance in resource-constrained environments. It focuses on delivering high-quality outputs while keeping computational requirements low, making it well suited to edge deployments.

Input: $0.04
Output: $0.04
Latency (p50): 676ms
Output Limit: 4K
Function Calling: -
JSON Mode: -
Input Modalities: -
Output Modalities: -
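To make the price gap concrete, the listed rates can be turned into a per-request estimate. A minimal sketch, assuming the prices are quoted per one million tokens (the price unit is not shown on the page, but per-1M-token pricing is the convention for these APIs); the `request_cost` helper is hypothetical, not part of any provider SDK:

```python
# Listed input/output rates for the two models, taken from this comparison.
# Assumption: rates are USD per 1M tokens (price unit not shown on the page).
PRICES = {
    "pixtral-12b": {"input": 0.15, "output": 0.15},
    "ministral-3b-2410": {"input": 0.04, "output": 0.04},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request under the per-1M-token assumption."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 10K-token prompt with a 1K-token completion.
for model in PRICES:
    print(model, round(request_cost(model, 10_000, 1_000), 6))
# pixtral-12b costs $0.00165; ministral-3b-2410 costs $0.00044
```

Under these assumptions, ministral-3b-2410 is roughly 3.75x cheaper per request, at the cost of text-only input and a 4K output limit versus Pixtral's multimodal input and 128K limit.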