Glama

glm-4.7-flash vs kimi-k2.5

Pricing, Performance & Features Comparison

Author: zai
Context Length: 200K
Reasoning
-
Providers: 1
Released: Jan 2026
Knowledge Cutoff
License

GLM-4.7-Flash is a 30B-parameter Mixture-of-Experts (MoE) reasoning model with approximately 3.6B active parameters per token. It is designed for local deployment and positioned as best-in-class for coding, agentic workflows, and chat. It supports a 200K-token context window and is reported to achieve state-of-the-art scores among open-source models on benchmarks such as SWE-bench Verified and τ²-Bench, with particular strength in frontend and backend development.
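"Active parameters" means that only part of the model's weights runs for each token. With the listed figures, about 3.6 / 30 = 0.12, or roughly 12%, of GLM-4.7-Flash's parameters participate per token. The toy sketch below illustrates the general mechanism with top-k expert routing in plain Python. The expert count, k, and dimensions are hypothetical and are not GLM-4.7-Flash's actual configuration; real models also keep attention and other shared layers active for every token.

```python
import math
import random

# Toy MoE layer: all experts are stored, but each token is processed only by
# the top-k experts picked by a router. NUM_EXPERTS, TOP_K and DIM are
# hypothetical values chosen for illustration.
NUM_EXPERTS = 8
TOP_K = 2
DIM = 4

random.seed(0)

# Each "expert" is a simple DIM x DIM linear map stored as a list of rows.
experts = [
    [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
    for _ in range(NUM_EXPERTS)
]
# Router: one scoring vector per expert.
router = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def moe_forward(token):
    """Route one token vector through its top-k experts only."""
    logits = matvec(router, token)
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    # Softmax over the selected experts' logits gives the mixing weights.
    m = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - m) for i in chosen]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the chosen experts' outputs; the other experts never run.
    out = [0.0] * DIM
    for w, i in zip(weights, chosen):
        for j, y in enumerate(matvec(experts[i], token)):
            out[j] += w * y
    return out, chosen


out, active = moe_forward([0.5, -1.0, 0.25, 2.0])
print(len(active), "of", NUM_EXPERTS, "experts active")  # prints: 2 of 8 experts active
```

Only `TOP_K` of the `NUM_EXPERTS` weight matrices are touched per token, which is why an MoE model's per-token compute tracks its active parameter count rather than its total size.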

Input: $0.07 / 1M tokens
Output: $0.40 / 1M tokens
Latency (p50): 14.1s
Output Limit: 131K tokens
Function Calling
JSON Mode
-
Input Modalities: Text
Output Modalities: Text
in $0.07, out $0.40, write $0.01
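Because these prices are quoted per million tokens, the cost of one request is a weighted sum of its input and output token counts. A minimal sketch, assuming the listed $0.07 input and $0.40 output rates and ignoring the separate write price; `request_cost` is a hypothetical helper, not part of any provider API:

```python
# Listed GLM-4.7-Flash prices, in USD per 1M tokens.
GLM_INPUT_PER_M = 0.07
GLM_OUTPUT_PER_M = 0.40


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Return the USD cost of one request under per-million-token pricing."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Example: a 20K-token prompt that produces a 2K-token answer.
cost = request_cost(20_000, 2_000, GLM_INPUT_PER_M, GLM_OUTPUT_PER_M)
print(f"${cost:.4f}")  # prints $0.0022
```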
Author: moonshot
Context Length: 262K
Reasoning
-
Providers: 2
Released: Jan 2026
Knowledge Cutoff: Apr 2024
License
Superseded By: kimi-k2.6

Kimi K2.5 was Moonshot's most capable and versatile model at its release. It has a native multimodal architecture that accepts visual (image and video) as well as text input, and it offers both thinking and non-thinking modes. Moonshot reports state-of-the-art performance on coding, reasoning, and agentic tasks, and the model's 256K context window (262,144 tokens, shown as 262K above) is intended for long, complex logical and mathematical problems.

Input: $0.45 / 1M tokens
Output: $2.30 / 1M tokens
Latency (p50): 5.9s
Output Limit: 96K tokens
Function Calling
JSON Mode
-
Input Modalities: Text, Image, Video
Output Modalities: Text
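Each listing gives both a context length and a separate output limit. For decoder-style models the prompt and the generated tokens typically share the context window, so a request has to satisfy both limits. A minimal sketch, assuming "K" means 1,000 tokens (an approximation; providers may count 1K as 1,024) and using a hypothetical `fits` helper:

```python
# Context and output limits as listed on this page, with K taken as 1,000
# tokens (an approximation of the providers' exact limits).
LIMITS = {
    "glm-4.7-flash": {"context": 200_000, "max_output": 131_000},
    "kimi-k2.5": {"context": 262_000, "max_output": 96_000},
}


def fits(model: str, prompt_tokens: int, requested_output: int) -> bool:
    """True if the requested output is within the output limit and
    prompt + output together fit in the context window (assumed shared)."""
    lim = LIMITS[model]
    return (requested_output <= lim["max_output"]
            and prompt_tokens + requested_output <= lim["context"])


print(fits("glm-4.7-flash", 150_000, 40_000))  # True: 190K <= 200K, 40K <= 131K
print(fits("kimi-k2.5", 150_000, 100_000))     # False: 100K output > 96K limit
```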
deepinfra (Cheapest): in $0.45, out $2.30, cache $0.07, write $0.07
in $0.60, out $3.00, cache $0.10
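The two provider rows differ in input, output, and cache prices, so the cheaper option depends on how much of each prompt is served from cache. A minimal sketch, assuming all prices are USD per 1M tokens and that the "cache" column is the rate for cache-hit input tokens; the write price is not modeled, and the second row's provider name is not shown on the page, so it is labeled generically:

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """USD per 1M tokens. `cache` is assumed to be the cache-hit input rate."""
    input: float
    output: float
    cache: float


# Rows from the Kimi K2.5 listing; the second provider's name is not shown.
DEEPINFRA = Pricing(input=0.45, output=2.30, cache=0.07)
SECOND_PROVIDER = Pricing(input=0.60, output=3.00, cache=0.10)


def cost(p: Pricing, fresh_in: int, cached_in: int, out: int) -> float:
    """USD cost when `cached_in` prompt tokens are billed at the cache rate."""
    return (fresh_in * p.input + cached_in * p.cache + out * p.output) / 1_000_000


# Example: a 50K-token prompt with 40K tokens served from cache, 4K output.
for name, p in [("deepinfra", DEEPINFRA), ("second provider", SECOND_PROVIDER)]:
    print(name, f"${cost(p, 10_000, 40_000, 4_000):.4f}")
# prints:
# deepinfra $0.0165
# second provider $0.0220
```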