DevOps AI Toolkit

294-gemini-3-model-variants.md•5.26 KiB

# PRD #294: Add Gemini 3 Model Variants Support ## Overview **Status**: Complete **Priority**: Medium **Created**: 2025-12-21 **GitHub Issue**: [#294](https://github.com/vfarcic/dot-ai/issues/294) ## Problem Statement Users currently only have access to Gemini 3 Pro (`google` provider) but lack access to other Gemini 3 variants that offer different trade-offs: - **Gemini 3 Flash**: Faster response times, lower cost, same 1M context - **Thinking configurations**: Control reasoning depth vs speed/cost ## Solution Add Gemini 3 variants iteratively: implement one variant, run all tests, document based on results, then proceed to the next. ## Gemini 3 Model Specifications | Variant | Model ID | Input Context | Output Limit | Thinking Levels | |---------|----------|---------------|--------------|-----------------| | Pro (existing) | `gemini-3-pro-preview` | 1,048,576 | 65,536 | low, high | | Flash | `gemini-3-flash-preview` | 1,048,576 | 65,536 | minimal, low, medium, high | **Pricing** (Gemini 3 Flash): $0.50/1M input, $3/1M output ## Proposed Provider Names | Provider Name | Model | Thinking Level | Use Case | |---------------|-------|----------------|----------| | `google` | gemini-3-pro-preview | high (default) | Existing - complex reasoning | | `google_flash` | gemini-3-flash-preview | high (default) | Fast + smart - balanced | | `google_flash_fast` | gemini-3-flash-preview | minimal | Maximum speed, simple tasks | ## Implementation Approach **Iterative process for each variant:** 1. Add to `model-config.ts` 2. Add case to `vercel-provider.ts` 3. Run full integration test suite 4. Document results (pass/fail, any issues) 5. Update `docs/setup/mcp-setup.md` if tests pass 6. Proceed to next variant ## Success Criteria - [x] Each new provider works with existing functionality - [x] Integration tests pass for each variant (98.7% pass rate for `google_flash`) - [x] Documentation accurately describes each option - [~] Thinking level configuration works correctly (deferred - not needed for current variants) ## Milestones ### Milestone 1: Gemini 3 Flash (google_flash) ✅ - [x] Add `google_flash: 'gemini-3-flash-preview'` to `CURRENT_MODELS` - [x] Add `google_flash` case to `vercel-provider.ts` - [x] Run integration tests (74/75 passed - 98.7%) - [x] Document results and update docs if successful ### Milestone 2: Gemini 3 Flash Fast (google_flash_fast) - DEFERRED - [x] Research Vercel AI SDK support for `thinking_level` parameter - [~] Add `google_flash_fast` with minimal thinking configuration - [~] Run integration tests - [~] Document results and update docs if successful **Deferral Reason**: Gemini 3 uses dynamic thinking by default (`high` level). The model automatically adjusts reasoning depth based on prompt complexity - simple tasks already use minimal thinking. The `google_flash_fast` variant (explicit `minimal` level) provides marginal benefit over this dynamic behavior. Will implement if user demand emerges. ### Milestone 3: Documentation & Cleanup ✅ - [x] Final documentation review - [x] Update CLAUDE.md if needed - [x] Verify all variants work end-to-end ## Technical Notes ### Thinking Level Configuration Gemini 3 uses `thinking_config` in generation config: ```python thinking_config=ThinkingConfig(thinking_level="minimal") ``` Need to verify Vercel AI SDK (`@ai-sdk/google`) support for this parameter. ### API Key All Gemini 3 variants use the same `GOOGLE_GENERATIVE_AI_API_KEY`. ## Out of Scope - Gemini 2.5 models - Gemini 3 Pro Image (`gemini-3-pro-image-preview`) - Custom Gemini endpoint support - Per-request thinking level configuration ## Dependencies - Existing `@ai-sdk/google` package - `GOOGLE_GENERATIVE_AI_API_KEY` environment variable ## Risks & Mitigations | Risk | Likelihood | Impact | Mitigation | |------|------------|--------|------------| | Models still in preview | Medium | Low | Document preview status; monitor for GA | | Vercel SDK lacks thinking_level support | Medium | Medium | Skip fast variant or implement workaround | | API behavior differences between variants | Low | Medium | Test each variant thoroughly | ## Progress Log | Date | Update | |------|--------| | 2025-12-21 | PRD created with iterative implementation approach | | 2025-12-23 | **Milestone 1 complete**: Added `google_flash` provider. 98.7% test pass rate (74/75). Fixed JSON parsing for multi-model support, standardized `GOOGLE_GENERATIVE_AI_API_KEY` env var, updated docs. | | 2025-12-23 | **Milestone 2 deferred**: Research confirmed Vercel AI SDK supports `thinkingLevel` parameter. However, Gemini 3's dynamic thinking (default `high`) already minimizes reasoning for simple tasks automatically. `google_flash_fast` deferred until user demand emerges. | | 2025-12-23 | **PRD complete**: `google_flash` provider shipped. Milestone 2 deferred. | ## Decisions | Decision | Date | Rationale | |----------|------|-----------| | Iterative implementation | 2025-12-21 | Test each variant before proceeding; reduces risk | | Start with `google_flash` | 2025-12-21 | Most straightforward - same config as existing `google` | | Share API key across variants | 2025-12-21 | Same Google AI API; reduces configuration burden | | Defer `google_flash_fast` | 2025-12-23 | Dynamic thinking at `high` level already optimizes for simple tasks; explicit `minimal` level provides marginal benefit |

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/vfarcic/dot-ai'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

294-gemini-3-model-variants.md•5.26 KiB