
glm-5 vs gemini-3-flash

Pricing, Performance & Features Comparison

GLM-5
Author: zai
Context Length: 200K
Reasoning: -
Providers: 1
Released: Feb 2026
Knowledge Cutoff: -
License: -
GLM-5 is a mixture-of-experts language model from Z.ai with 744 billion total parameters and 40 billion active parameters, designed for complex systems engineering and long-horizon agentic tasks. It utilizes DeepSeek Sparse Attention (DSA) to reduce deployment costs while maintaining long-context capacity, and achieves best-in-class performance among open-source models in reasoning, coding, and agentic tasks.

Input: $1
Output: $3.2
Latency (p50): 5.1s
Output Limit: 131K
Function Calling
JSON Mode
Input: Text
Output: Text
Provider pricing: in $1, out $3.2, cache $0.2
Latency (24h)
Success Rate (24h)
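
To make the GLM-5 numbers concrete, here is a minimal cost sketch in Python. It assumes the listed prices ($1 input, $3.2 output, $0.2 cached input) are USD per one million tokens, which is the usual convention on pricing pages but is not stated here; the token counts in the example are hypothetical.

def glm5_request_cost(fresh_in, cached_in, out):
    # Estimated USD cost of one GLM-5 request, assuming the listed rates
    # are per 1M tokens: $1 fresh input, $0.2 cached input, $3.2 output.
    per = 1_000_000
    return fresh_in / per * 1.00 + cached_in / per * 0.20 + out / per * 3.20

# Example: a 20K-token prompt where 15K tokens hit the prompt cache, 2K tokens out.
print(f"${glm5_request_cost(5_000, 15_000, 2_000):.4f}")  # ~$0.0144

At the listed rates, routing 15K of the 20K prompt tokens through the cache cuts this request from about $0.0264 to $0.0144.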

Gemini 3 Flash
Author: google
Context Length: 1M
Reasoning
Providers: 1
Released: Dec 2025
Knowledge Cutoff: Jan 2025
License: -

Gemini 3 Flash combines Gemini 3 Pro's reasoning capabilities with the Flash line's latency, efficiency, and cost. It handles everyday tasks with improved reasoning and is also designed to tackle the most complex agentic workflows.

Input: $0.5
Output: $3
Latency (p50): 4.3s
Output Limit: 66K
Function Calling
JSON Mode
Input: -
Output: -
Provider pricing: in $0.5, out $3
Latency (24h)
Success Rate (24h)
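
For a side-by-side feel of the listed rates, here is a rough workload comparison, again assuming the prices are USD per one million tokens and ignoring cache pricing; the monthly token volumes are made up for illustration.

PRICES = {
    "glm-5": {"in": 1.00, "out": 3.20},
    "gemini-3-flash": {"in": 0.50, "out": 3.00},
}

def monthly_cost(model, in_tokens, out_tokens):
    # USD per month at the listed per-1M-token rates (assumed convention).
    p = PRICES[model]
    return (in_tokens * p["in"] + out_tokens * p["out"]) / 1_000_000

# Example: 500M input tokens and 50M output tokens per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 500e6, 50e6):,.2f}")
# glm-5: $660.00
# gemini-3-flash: $400.00

At these volumes the gap comes almost entirely from the 2x difference in input price ($1 vs $0.5); the output rates ($3.2 vs $3) are nearly the same.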