glm-5 vs glm-4.7-flash

Pricing, Performance & Features Comparison

glm-5

Author: zai

Context Length: 200K

Reasoning

Providers: 1

Released: Feb 2026

Knowledge Cutoff: -

License: -

GLM-5 is a mixture-of-experts language model from Z.ai with 744 billion total parameters and 40 billion active parameters, designed for complex systems engineering and long-horizon agentic tasks. It utilizes DeepSeek Sparse Attention (DSA) to reduce deployment costs while maintaining long-context capacity, and achieves best-in-class performance among open-source models in reasoning, coding, and agentic tasks.

Input: $1

Output: $3.2

Latency (p50): 7.3s

Output Limit: 131K

Function Calling

JSON Mode

Input: Text

Output: Text

zai

in $1 | out $3.2 | cache $0.2 | -

glm-4.7-flash

Author: zai

Context Length: 200K

Reasoning

Providers: 1

Released: Jan 2026

Knowledge Cutoff: -

License: -

GLM-4.7-Flash is a 30B Mixture-of-Experts (MoE) reasoning model with approximately 3.6B active parameters, designed for local deployment with best-in-class performance for coding, agentic workflows, and chat. It supports a 200K context window and achieves open-source state-of-the-art scores on benchmarks like SWE-bench Verified and τ²-Bench, excelling particularly in frontend and backend development capabilities.
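To make the mixture-of-experts figures in the two descriptions easier to compare, the sketch below computes the share of parameters active per token for each model. The parameter counts are taken from the descriptions on this page (the 3.6B figure is stated there as approximate); the function name `active_fraction` is a hypothetical helper introduced only for illustration.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the share of parameters active per token (0.0 to 1.0)."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return active_params_b / total_params_b


# Parameter counts in billions, as quoted in the model descriptions.
MODELS = {
    "glm-5": (744.0, 40.0),
    "glm-4.7-flash": (30.0, 3.6),  # 3.6B is approximate per the description
}

if __name__ == "__main__":
    for name, (total_b, active_b) in MODELS.items():
        share = active_fraction(total_b, active_b)
        print(f"{name}: {share:.1%} of parameters active per token")
```

Under these figures, GLM-5 activates a smaller share of its weights per token than GLM-4.7-Flash, even though its active parameter count is far larger in absolute terms.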

Input: $0.07

Output: $0.4

Latency (p50): 7s

Output Limit: 131K

Function Calling

JSON Mode

Input: Text

Output: Text

zai

in $0.07 | out $0.4 | - | write $0.01
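
The listed input and output prices can be turned into a rough per-request cost comparison. The sketch below assumes the prices are in USD per 1 million tokens, which this page does not state explicitly, and it ignores cache pricing. The names `Pricing`, `PRICES`, and `estimate_cost` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass

# Assumption: listed prices are USD per 1,000,000 tokens (not stated on the page).
TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass(frozen=True)
class Pricing:
    input_usd: float   # price per unit of input tokens
    output_usd: float  # price per unit of output tokens


# Input/output prices as listed for each model; cache pricing is ignored.
PRICES = {
    "glm-5": Pricing(input_usd=1.0, output_usd=3.2),
    "glm-4.7-flash": Pricing(input_usd=0.07, output_usd=0.4),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request under the stated assumptions."""
    price = PRICES[model]
    total = input_tokens * price.input_usd + output_tokens * price.output_usd
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Example workload: 50K input tokens and 2K output tokens per request.
    for model in PRICES:
        cost = estimate_cost(model, input_tokens=50_000, output_tokens=2_000)
        print(f"{model}: ${cost:.4f} per request")
```

If the per-million-token assumption holds, the same request costs roughly thirteen times more on glm-5 than on glm-4.7-flash for this input-heavy workload; the ratio shifts with the mix of input and output tokens.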