
glm-5 vs gemini-3-flash

Pricing, Performance & Features Comparison

GLM-5
Author: zai
Context Length: 200K
Reasoning: -
Providers: 1
Released: Feb 2026
Knowledge Cutoff: -
License: -
GLM-5 is a mixture-of-experts language model from Z.ai with 744 billion total parameters and 40 billion active parameters, designed for complex systems engineering and long-horizon agentic tasks. It utilizes DeepSeek Sparse Attention (DSA) to reduce deployment costs while maintaining long-context capacity, and achieves best-in-class performance among open-source models in reasoning, coding, and agentic tasks.

Input: $1
Output: $3.2
Latency (p50): 5.1s
Output Limit: 131K
Function Calling
JSON Mode
Input: Text
Output: Text
Provider pricing: in $1, out $3.2, cache $0.2
Latency (24h)
Success Rate (24h)
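
To make the GLM-5 numbers concrete, here is a minimal cost sketch in Python. It assumes the listed prices ($1 input, $3.2 output, $0.2 cached input) are USD per one million tokens, which is the usual convention on pricing pages but is not stated here; the token counts in the example are hypothetical.

def glm5_request_cost(fresh_in, cached_in, out):
    # Estimated USD cost of one GLM-5 request, assuming the listed rates
    # are per 1M tokens: $1 fresh input, $0.2 cached input, $3.2 output.
    per = 1_000_000
    return fresh_in / per * 1.00 + cached_in / per * 0.20 + out / per * 3.20

# Example: a 20K-token prompt where 15K tokens hit the prompt cache, 2K tokens out.
print(f"${glm5_request_cost(5_000, 15_000, 2_000):.4f}")  # ~$0.0144

At the listed rates, routing 15K of the 20K prompt tokens through the cache cuts this request from about $0.0264 to $0.0144.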

Gemini 3 Flash
Author: google
Context Length: 1M
Reasoning
Providers: 1
Released: Dec 2025
Knowledge Cutoff: Jan 2025
License: -

Gemini 3 Flash combines Gemini 3 Pro's reasoning capabilities with the Flash line's latency, efficiency, and cost. It handles everyday tasks with improved reasoning and is also designed to tackle the most complex agentic workflows.

Input: $0.5
Output: $3
Latency (p50): 4.3s
Output Limit: 66K
Function Calling
JSON Mode
Input: -
Output: -
Provider pricing: in $0.5, out $3
Latency (24h)
Success Rate (24h)
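
For a side-by-side feel of the listed rates, here is a rough workload comparison, again assuming the prices are USD per one million tokens and ignoring cache pricing; the monthly token volumes are made up for illustration.

PRICES = {
    "glm-5": {"in": 1.00, "out": 3.20},
    "gemini-3-flash": {"in": 0.50, "out": 3.00},
}

def monthly_cost(model, in_tokens, out_tokens):
    # USD per month at the listed per-1M-token rates (assumed convention).
    p = PRICES[model]
    return (in_tokens * p["in"] + out_tokens * p["out"]) / 1_000_000

# Example: 500M input tokens and 50M output tokens per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 500e6, 50e6):,.2f}")
# glm-5: $660.00
# gemini-3-flash: $400.00

At these volumes the gap comes almost entirely from the 2x difference in input price ($1 vs $0.5); the output rates ($3.2 vs $3) are nearly the same.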