Skip to main content
Glama

gemini-3.1-flash-lite-preview vs glm-5.1

Pricing, Performance & Features Comparison

Authorgoogle
Context Length1M
Reasoning
Providers1
ReleasedFeb 2025
Knowledge CutoffJun 2024
License

Gemini 2.0 Flash-Lite is Google's most cost-efficient Gemini 2.0 model, optimized for cost efficiency and low latency. It is a multimodal model supporting inputs including text, code, images, audio, and video, with text-only output. It features a maximum input capacity of 1,048,576 tokens and is cost-optimized for large-scale text output use cases.

Input$0.25
Output$1.5
Latency (p50)2s
Output Limit8K
Function Calling
JSON Mode
InputText, Image, Audio, Video
OutputText
in$0.25out$1.5cache$0.025
Latency (24h)
Success Rate (24h)
Authorzai
Context Length200K
Reasoning
Providers1
ReleasedApr 2026
Knowledge Cutoff
LicenseMIT License

Post-training upgrade to GLM-5. Mixture-of-Experts model with 744B total parameters and 40B activated per token. Trained on Huawei Ascend 910B chips with enhanced RL for agentic capabilities.

Input$1.4
Output$4.4
Latency (p50)4.9s
Output Limit131K
Function Calling
JSON Mode
InputText
OutputText
in$1.4out$4.4cache$0.26write$1.4
Latency (24h)
Success Rate (24h)