Pricing, Performance & Features Comparison
GLM-4.7-Flash is a 30B Mixture-of-Experts (MoE) reasoning model with approximately 3.6B active parameters, designed for local deployment and strong performance on coding, agentic workflows, and chat. It supports a 200K context window and achieves state-of-the-art scores among open-source models on benchmarks such as SWE-bench Verified and τ²-Bench, with particular strength in frontend and backend development.
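Because GLM-4.7-Flash targets local deployment, a common pattern is to serve it behind an OpenAI-compatible endpoint (for example with vLLM or SGLang) and call it with the standard `openai` client. The sketch below is illustrative only: the local base URL, port, and served model name are assumptions about your setup, not values documented by the model.

```python
from openai import OpenAI

# Point the standard OpenAI client at a locally hosted, OpenAI-compatible
# server (e.g. vLLM or SGLang). The base URL and model name below are
# assumptions about the local deployment; adjust them to match yours.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

response = client.chat.completions.create(
    model="GLM-4.7-Flash",  # served model name depends on how the server was launched
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```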
Gemini 3 Flash combines Gemini 3 Pro's reasoning capabilities with the Flash line's latency, efficiency, and cost profile. It not only handles everyday tasks with improved reasoning but is also designed to tackle the most complex agentic workflows.
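By contrast, Gemini 3 Flash is accessed as a hosted model through the Gemini API rather than run locally. A minimal sketch using the google-genai Python SDK follows; the model identifier used here (`gemini-3-flash`) is an assumption and should be checked against the current model list.

```python
from google import genai

# Assumes the GEMINI_API_KEY environment variable is set. The model ID
# "gemini-3-flash" is an assumption and may differ from the published name.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-flash",
    contents="Summarize the trade-offs between MoE and dense language models.",
)
print(response.text)
```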