mcp-sage
An MCP (Model Context Protocol) server that provides tools for sending prompts to either OpenAI's O3 model or Google's Gemini 2.5 Pro based on token count. The tools embed all referenced filepaths (recursively for folders) in the prompt. This is useful for getting second opinions or detailed code reviews from a model that can handle tons of context accurately.
Rationale
I make heavy use of Claude Code. It's a great product that works well for my workflow. Newer models with very large context windows are really useful, though, for complex codebases where more context is needed. This lets me keep using Claude Code as a development tool while leveraging the large-context capabilities of O3 and Gemini 2.5 Pro to augment Claude Code's limited context.
Model Selection
The server automatically selects the appropriate model based on token count and available API keys:
- For smaller contexts (≤ 200K tokens): Uses OpenAI's O3 model (if OPENAI_API_KEY is set)
- For larger contexts (> 200K and ≤ 1M tokens): Uses Google's Gemini 2.5 Pro (if GEMINI_API_KEY is set)
- If the content exceeds 1M tokens: Returns an informative error
Fallback behavior:
- API Key Fallback:
  - If OPENAI_API_KEY is missing, Gemini will be used for all contexts within its 1M token limit
  - If GEMINI_API_KEY is missing, only smaller contexts (≤ 200K tokens) can be processed with O3
  - If both API keys are missing, an informative error is returned
- Network Connectivity Fallback:
  - If OpenAI API is unreachable (network error), the system automatically falls back to Gemini
  - This provides resilience against temporary network issues with one provider
  - Requires GEMINI_API_KEY to be set for fallback to work
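The selection and API key fallback rules above can be sketched as a small pure function. This is only an illustration under assumptions: the function name, the result type, and the model labels are hypothetical and not taken from the project's source, and the network connectivity fallback is not modeled here.

```typescript
// Thresholds come from the rules above; all names here are hypothetical.
const O3_TOKEN_LIMIT = 200000;
const GEMINI_TOKEN_LIMIT = 1000000;

type ModelChoice =
  | { ok: true; model: "o3" | "gemini-2.5-pro" }
  | { ok: false; error: string };

interface AvailableKeys {
  openai: boolean; // true when OPENAI_API_KEY is set
  gemini: boolean; // true when GEMINI_API_KEY is set
}

function selectModel(tokenCount: number, keys: AvailableKeys): ModelChoice {
  if (!keys.openai && !keys.gemini) {
    return { ok: false, error: "Set OPENAI_API_KEY or GEMINI_API_KEY" };
  }
  if (tokenCount > GEMINI_TOKEN_LIMIT) {
    return { ok: false, error: `${tokenCount} tokens exceeds the 1M token limit` };
  }
  // Smaller contexts go to O3 when its key is available.
  if (tokenCount <= O3_TOKEN_LIMIT && keys.openai) {
    return { ok: true, model: "o3" };
  }
  // Larger contexts, or a missing OpenAI key, go to Gemini.
  if (keys.gemini) {
    return { ok: true, model: "gemini-2.5-pro" };
  }
  return {
    ok: false,
    error: `${tokenCount} tokens needs GEMINI_API_KEY (O3 handles up to 200K)`,
  };
}
```

With both keys set, `selectModel(150000, ...)` picks O3 and `selectModel(500000, ...)` picks Gemini; with only the OpenAI key, the second call returns an error instead.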
Inspiration
This project draws inspiration from three other open source projects:
- simonw/files-to-prompt for the file compression
- asadm/vibemode for the idea and prompt to send the entire repo to Gemini for wholesale edit suggestions
- PhialsBasement/Chain-of-Recursive-Thoughts for the inspiration behind the sage-plan tool
Overview
This project implements an MCP server that exposes three tools:
sage-opinion
- Takes a prompt and a list of file/dir paths as input
- Packs the files into a structured XML format
- Measures the token count and selects the appropriate model:
  - O3 for ≤ 200K tokens
  - Gemini 2.5 Pro for > 200K and ≤ 1M tokens
- Sends the combined prompt + context to the selected model
- Returns the model's response
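To make the packing step more concrete, here is a rough sketch of what "a structured XML format" could look like. The tag layout is an assumption modeled on the XML output of simonw/files-to-prompt; the actual format in src/pack.ts may differ, and the names `SourceFile`, `packFiles` and `buildPrompt` are hypothetical.

```typescript
// Hypothetical shape for a file that has already been read from disk.
interface SourceFile {
  path: string;
  content: string;
}

// Wrap each file in a <document> element, numbered from 1.
// File contents are inserted verbatim (no XML escaping) in this sketch.
function packFiles(files: SourceFile[]): string {
  const docs = files.map((file, i) =>
    [
      `<document index="${i + 1}">`,
      `<source>${file.path}</source>`,
      "<document_content>",
      file.content,
      "</document_content>",
      "</document>",
    ].join("\n")
  );
  return ["<documents>", ...docs, "</documents>"].join("\n");
}

// Placing the context before the prompt is an assumption of this sketch.
function buildPrompt(prompt: string, files: SourceFile[]): string {
  return `${packFiles(files)}\n\n${prompt}`;
}
```

The combined string is what would then be token-counted and sent to the selected model.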
sage-review
- Takes an instruction for code changes and a list of file/dir paths as input
- Packs the files into a structured XML format
- Measures the token count and selects the appropriate model:
  - O3 for ≤ 200K tokens
  - Gemini 2.5 Pro for > 200K and ≤ 1M tokens
- Creates a specialized prompt instructing the model to format responses using SEARCH/REPLACE blocks
- Sends the combined context + instruction to the selected model
- Returns edit suggestions formatted as SEARCH/REPLACE blocks for easy implementation
sage-plan
- Takes a prompt requesting an implementation plan and a list of file/dir paths as input
- Packs the files into a structured XML format
- Orchestrates a multi-model debate to generate a high-quality implementation plan
- Models critique and refine each other's plans through multiple rounds
- Returns the winning implementation plan with detailed steps
sage-plan - Multi-Model & Self-Debate Workflows
The sage-plan tool doesn't ask a single model for a plan. Instead, it orchestrates a structured debate that runs for one or more rounds and then asks a separate judge model (or the same model in CoRT mode) to pick the winner.
1. Multi-Model Debate Flow
Key phases in the multi-model debate:
Setup Phase
- The system determines available models, selects a judge, and allocates token budgets
Round 1
- Generation Phase - Every available model (A, B, C, etc.) writes its own implementation plan in parallel
- Critique Phase - Each model reviews all other plans (never its own) and produces structured critiques in parallel
Rounds 2 to N (N defaults to 3)
- Synthesis Phase - Each model improves its previous plan using critiques it received (models work in parallel)
- Consensus Check - The judge model scores similarity between all current plans
  - If score ≥ 0.9, the debate stops early and jumps to Judgment
- Critique Phase - If consensus is not reached AND we're not in the final round, each model critiques all other plans again (in parallel)
Judgment Phase
- After completing all rounds (or reaching early consensus), the judge model (O3 by default):
  - Selects the single best plan OR merges multiple plans into a superior one
  - Provides a confidence score for its selection/synthesis
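The control flow of the phases above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code in debateOrchestrator.ts: the `DebateModel` and `Judge` interfaces and the `runDebate` function are hypothetical, real model calls would be asynchronous and run in parallel, and token budgeting is left out.

```typescript
// Hypothetical interfaces; real calls would be async API requests.
interface DebateModel {
  id: string;
  generate(task: string): string;
  critique(plan: string): string;
  synthesize(ownPlan: string, critiques: string[]): string;
}

interface Judge {
  consensusScore(plans: string[]): number; // similarity in the range 0..1
  pickWinner(plans: string[]): { plan: string; confidence: number };
}

// For each plan, gather critiques from every model except its author.
function collectCritiques(models: DebateModel[], plans: string[]): string[][] {
  return plans.map((plan, author) =>
    models.filter((_, reviewer) => reviewer !== author).map((m) => m.critique(plan))
  );
}

function runDebate(
  task: string,
  models: DebateModel[],
  judge: Judge,
  maxRounds = 3,
  consensusThreshold = 0.9
): { plan: string; confidence: number; roundsRun: number } {
  // Round 1: generation, then critique.
  let plans = models.map((m) => m.generate(task));
  let critiques = collectCritiques(models, plans);
  let roundsRun = 1;

  for (let round = 2; round <= maxRounds; round++) {
    roundsRun = round;
    const prevPlans = plans;
    const prevCritiques = critiques;
    // Synthesis: each model revises its own plan using the critiques it received.
    plans = models.map((m, i) => m.synthesize(prevPlans[i] ?? "", prevCritiques[i] ?? []));
    // Consensus check: stop early when the plans have converged.
    if (judge.consensusScore(plans) >= consensusThreshold) break;
    // Critique again unless this was the final round.
    if (round < maxRounds) critiques = collectCritiques(models, plans);
  }

  // Judgment: the judge selects or merges, and reports a confidence score.
  return { ...judge.pickWinner(plans), roundsRun };
}
```

Injecting the models and the judge as plain objects keeps the loop easy to exercise with stubs before wiring in real API clients.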
2. Self-Debate Flow - Single Model Available
When only one model is available, a Chain of Recursive Thoughts (CoRT) approach is used:
- Initial Burst - The model generates three distinct plans, each taking a different approach
- Refinement Rounds - For each subsequent round (2 to N, default N=3):
  - The model reviews all previous plans
  - It critiques them internally, identifying strengths and weaknesses
  - It produces one new improved plan that addresses limitations in earlier plans
- Final Selection - The last plan generated becomes the final implementation plan
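As a rough illustration of the self-debate steps above, the loop could look like the sketch below. The `SoloModel` interface and `selfDebate` function are hypothetical stand-ins; the actual prompt lives in prompts/debatePrompts.selfDebatePrompt and is not reproduced here.

```typescript
// Hypothetical single-model interface; real calls would hit an LLM API.
interface SoloModel {
  // Draft a plan using one of several distinct approaches.
  draft(task: string, approach: number): string;
  // Review all earlier plans and produce one improved plan.
  refine(task: string, previousPlans: string[]): string;
}

function selfDebate(task: string, model: SoloModel, rounds = 3): string {
  // Initial burst: three distinct plans.
  const plans: string[] = [0, 1, 2].map((approach) => model.draft(task, approach));

  // Refinement rounds 2..N: each round adds one plan informed by all earlier ones.
  for (let round = 2; round <= rounds; round++) {
    plans.push(model.refine(task, [...plans]));
  }

  // Final selection: the last plan generated wins.
  return plans[plans.length - 1] ?? "";
}
```

With the default of three rounds, the model sees three plans in round 2 and four plans in round 3.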
What Actually Happens in Code (quick reference)
| Phase / Functionality | Code Location | Notes |
|---|---|---|
| Generation Prompts | prompts/debatePrompts.generatePrompt | Adds heading "# Implementation Plan (Model X)" |
| Critique Prompts | prompts/debatePrompts.critiquePrompt | Uses "## Critique of Plan {ID}" sections |
| Synthesis Prompts | prompts/debatePrompts.synthesizePrompt | Model revises its own plan |
| Consensus Check | debateOrchestrator.checkConsensus | Judge model returns JSON with consensusScore |
| Judgment | prompts/debatePrompts.judgePrompt | Judge returns "# Final Implementation Plan" + confidence |
| Self-Debate Prompt | prompts/debatePrompts.selfDebatePrompt | Chain-of-Recursive-Thoughts loop |
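Since the consensus check relies on the judge returning JSON with a consensusScore field, a defensive parser might look like the sketch below. Only the field name and the 0.9 threshold come from this document; the function name and the choice to treat malformed output as "no consensus" are assumptions, not the behavior of debateOrchestrator.checkConsensus.

```typescript
interface ConsensusResult {
  score: number;
  reached: boolean;
}

// Parse the judge's raw reply; anything unparseable counts as no consensus
// (an assumption of this sketch, chosen so the debate simply continues).
function parseConsensus(raw: string, threshold = 0.9): ConsensusResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { score: 0, reached: false };
  }
  const value =
    typeof parsed === "object" && parsed !== null
      ? (parsed as Record<string, unknown>)["consensusScore"]
      : undefined;
  if (typeof value !== "number" || Number.isNaN(value)) {
    return { score: 0, reached: false };
  }
  // Clamp to the expected 0..1 range before comparing.
  const score = Math.min(1, Math.max(0, value));
  return { score, reached: score >= threshold };
}
```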
Performance and Cost Considerations
⚠️ Important: The sage-plan tool can:
- Take a significant amount of time to complete (5-10 minutes with multiple models)
- Consume substantial API tokens due to multiple rounds of debate
- Incur higher costs than single-model approaches
Typical resource usage:
- Multi-model debate: 2-4x more tokens than a single model approach
- Processing time: 5-10 minutes depending on complexity and model availability
- API costs: $0.30-$1.50 per plan generation (varies by models used and plan complexity)
Prerequisites
- Node.js (v18 or later)
- A Google Gemini API key (for larger contexts)
- An OpenAI API key (for smaller contexts)
Installation
Environment Variables
Set the following environment variables:
- OPENAI_API_KEY: Your OpenAI API key (for O3 model)
- GEMINI_API_KEY: Your Google Gemini API key (for Gemini 2.5 Pro)
Usage
After building with npm run build, add the following to your MCP configuration:
You can also use environment variables set elsewhere, like in your shell profile.
Prompting
To get a second opinion on something just ask for a second opinion.
To get a code review, ask for a code review or expert review.
Both of these benefit from providing the paths of the files that you want included in context, but if omitted the host LLM will probably infer what to include.
Debugging and Monitoring
The server provides detailed monitoring information via the MCP logging capability. These logs include:
- Token usage statistics and model selection
- Number of files and documents included in the request
- Request processing time metrics
- Error information when token limits are exceeded
Logs are sent via the MCP protocol's notifications/message method, ensuring they don't interfere with the JSON-RPC communication. MCP clients with logging support will display these logs appropriately.
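For readers unfamiliar with MCP logging, the sketch below shows the general shape of a notifications/message payload. The envelope (a JSON-RPC notification with no id, carrying level, logger and data) follows the MCP specification; the helper name, the logger name and the fields inside data are hypothetical and are not the server's actual log output.

```typescript
// Log levels defined by the MCP logging capability (syslog-style).
type LogLevel =
  | "debug" | "info" | "notice" | "warning"
  | "error" | "critical" | "alert" | "emergency";

interface LogNotification {
  jsonrpc: "2.0";
  method: "notifications/message";
  params: { level: LogLevel; logger: string; data: unknown };
}

// Notifications carry no "id", so the client never sends a response to them.
function makeLogNotification(
  level: LogLevel,
  data: unknown,
  logger = "mcp-sage" // hypothetical logger name
): LogNotification {
  return { jsonrpc: "2.0", method: "notifications/message", params: { level, logger, data } };
}

// Illustrative values only; the real field names may differ.
const sampleLog = makeLogNotification("info", {
  model: "o3",
  tokenCount: 150000,
  fileCount: 12,
  elapsedMs: 4200,
});
```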
Example log entries:
Using the Tools
sage-opinion Tool
The sage-opinion tool accepts the following parameters:
- prompt (string, required): The prompt to send to the selected model
- paths (array of strings, required): List of file paths to include as context
Example MCP tool call (using JSON-RPC 2.0):
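A minimal sketch of building such a request is shown here. The tools/call method and the name/arguments parameter layout come from the MCP specification, and prompt and paths are the parameters listed above; the helper name, the request id, and the example prompt and path values are illustrative assumptions.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a JSON-RPC 2.0 request that invokes one MCP tool.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: toolName, arguments: args } };
}

// Example values are placeholders.
const opinionRequest = buildToolCall(1, "sage-opinion", {
  prompt: "Explain how this code works",
  paths: ["src/index.ts"],
});

// The serialized form is what travels over the MCP transport.
const opinionRequestJson = JSON.stringify(opinionRequest, null, 2);
```

The same helper would work for sage-review (with an instruction argument instead of prompt) and for sage-plan.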
sage-review Tool
The sage-review tool accepts the following parameters:
- instruction (string, required): The specific changes or improvements needed
- paths (array of strings, required): List of file paths to include as context
Example MCP tool call (using JSON-RPC 2.0):
The response will contain SEARCH/REPLACE blocks that you can use to implement the suggested changes:
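One way to apply such blocks programmatically is sketched below. The exact delimiters are an assumption: this sketch uses the common `<<<<<<< SEARCH` / `=======` / `>>>>>>> REPLACE` markers, which may not match what sage-review emits, and the function names are hypothetical. It also does not handle blocks with an empty SEARCH or REPLACE section.

```typescript
interface Edit {
  search: string;
  replace: string;
}

// Extract every SEARCH/REPLACE pair from a model response.
function parseBlocks(response: string): Edit[] {
  const pattern = /<<<<<<< SEARCH\n([\s\S]*?)\n=======\n([\s\S]*?)\n>>>>>>> REPLACE/g;
  const edits: Edit[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(response)) !== null) {
    edits.push({ search: match[1] ?? "", replace: match[2] ?? "" });
  }
  return edits;
}

// Apply edits in order; each SEARCH text must occur in the current source.
function applyEdits(source: string, edits: Edit[]): string {
  let result = source;
  for (const edit of edits) {
    const at = result.indexOf(edit.search);
    if (at === -1) {
      throw new Error(`SEARCH text not found: ${edit.search}`);
    }
    // Only the first occurrence is replaced; slicing avoids "$" substitution patterns.
    result = result.slice(0, at) + edit.replace + result.slice(at + edit.search.length);
  }
  return result;
}
```

Failing loudly when the SEARCH text is missing is a deliberate choice here, since a silently skipped edit is harder to notice than an error.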
sage-plan Tool
The sage-plan tool accepts the following parameters:
- prompt (string, required): Description of what you need an implementation plan for
- paths (array of strings, required): List of file paths to include as context
Example MCP tool call (using JSON-RPC 2.0):
The response contains a detailed implementation plan with:
- High-level architecture overview
- Specific implementation steps
- File changes needed
- Testing strategy
- Potential challenges and mitigations
This plan benefits from the collective intelligence of multiple AI models (or thorough self-review by a single model) and typically contains more robust, thoughtful, and detailed recommendations than a single-pass approach.
Running the Tests
To test the tools:
Note: The sage-plan test may take 5-15 minutes to run as it orchestrates a multi-model debate.
Project Structure
- src/index.ts: The main MCP server implementation with tool definitions
- src/pack.ts: Tool for packing files into a structured XML format
- src/tokenCounter.ts: Utilities for counting tokens in a prompt
- src/gemini.ts: Gemini API client implementation
- src/openai.ts: OpenAI API client implementation for O3 model
- src/debateOrchestrator.ts: Multi-model debate orchestration for sage-plan
- src/prompts/debatePrompts.ts: Templates for debate prompts and instructions
- test/run-test.js: Test for the sage-opinion tool
- test/test-expert.js: Test for the sage-review tool
- test/run-sage-plan.js: Test for the sage-plan tool
- test/test-o3.js: Test for the model selection logic
License
ISC