deepseek-v4-flash vs glm-5.1
Pricing, Performance & Features Comparison
deepseek-v4-flash
Mixture-of-Experts model with 284B total parameters and 13B activated per token. Features a hybrid attention architecture for efficient 1M-token context processing.

Context length: 1M
Providers: 1
Released: Apr 2026
Knowledge cutoff: -
License: MIT
Input modality: Text
Output modality: Text
Output limit: 384K
Latency (p50): 3.3s
Features: Reasoning, Function Calling, JSON Mode
Pricing: input $0.14 · output $0.28 · cache read $0.028 · cache write $0.14
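Cache pricing matters when the same long prompt prefix is reused across requests. A minimal sketch of the effect for deepseek-v4-flash, assuming the listed prices are USD per 1M tokens (the page does not state the unit) and that "cache" is the rate for reading cached input tokens while "write" is the rate to populate the cache:

```python
# Hypothetical cost model; assumes USD per 1M tokens, and that "cache"
# means cached-input reads and "write" means cache writes.
INPUT, CACHE_READ, CACHE_WRITE = 0.14, 0.028, 0.14

def input_cost(fresh_tokens: int, cached_tokens: int) -> float:
    """Input-side cost of one request, splitting fresh vs. cached tokens."""
    return (fresh_tokens * INPUT + cached_tokens * CACHE_READ) / 1_000_000

# A 50K-token prompt reused 10 times: pay the input rate plus the cache
# write once, then the cached rate on the remaining 9 calls.
first_call = 50_000 * (INPUT + CACHE_WRITE) / 1_000_000
reused_calls = 9 * input_cost(0, 50_000)
no_cache = 10 * input_cost(50_000, 0)
print(first_call + reused_calls, no_cache)
```

Under these assumptions the cached runs cost roughly a third of paying the full input rate every time.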
glm-5.1
Post-training upgrade to GLM-5. Mixture-of-Experts model with 744B total parameters and 40B activated per token. Trained on Huawei Ascend 910B chips with enhanced RL for agentic capabilities.

Input modality: Text
Output modality: Text
Output limit: 131K
Latency (p50): 6.7s
Features: Function Calling, JSON Mode
Pricing: input $1.40 · output $4.40 · cache read $0.26 · cache write $1.40
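For a rough sense of the price gap between the two models, a minimal sketch of per-request cost, assuming the listed prices are USD per 1M tokens (the page does not state the unit) and ignoring cache discounts:

```python
# Hypothetical per-request cost comparison; assumes USD per 1M tokens.
PRICES = {
    "deepseek-v4-flash": {"input": 0.14, "output": 0.28},
    "glm-5.1": {"input": 1.40, "output": 4.40},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request for the given model."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a request with 100K input tokens and 10K output tokens.
for model in PRICES:
    print(model, round(request_cost(model, 100_000, 10_000), 4))
```

At these list prices glm-5.1 works out to roughly an order of magnitude more expensive per request, so the comparison largely comes down to whether its larger activated-parameter budget pays off for the workload.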