Provides image generation capabilities using Google's Gemini AI models with customizable parameters like style and temperature
Gemini MCP Server with Smart Tool Intelligence
Welcome to the Gemini MCP Server, the first MCP server with Smart Tool Intelligence - a revolutionary self-learning system that adapts to your preferences and improves over time. This comprehensive platform provides 7 AI-powered tools with automatic prompt enhancement and context awareness.
🚀 Features Overview
🤖 7 AI-Powered Tools
- Image Generation - Create images from text prompts using Gemini 2.0 Flash
- Image Editing - Edit existing images with natural language instructions
- Chat - Interactive conversations with context-aware responses
- Audio Transcription - Convert audio to text with optional verbatim mode
- Code Execution - Run Python code in a secure sandbox environment
- Video Analysis - Analyze video content for summaries, transcripts, and insights
- Image Analysis - Extract objects, text, and detailed descriptions from images
🧠 Smart Tool Intelligence System (First in MCP Ecosystem)
- Self-Learning - Automatically learns from successful interactions
- Context Detection - Recognizes consciousness research, coding, and debugging contexts
- Pattern Recognition - Identifies usage patterns and user preferences
- Prompt Enhancement - Refines prompts for better AI model performance
- Persistent Memory - Stores learned preferences across sessions
- Automatic Migration - Seamlessly upgrades preference storage
📦 Quick Start
Installation
Configuration
- Get your Gemini API key from Google AI Studio
- Copy the environment template:
- Edit .env and add your API key:
Running the Server
Integration with Claude Desktop
Add to your Claude Desktop config (claude_desktop_config.json):
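As a rough sketch of what such an entry can look like, the snippet below builds the object and prints it as JSON. The `mcpServers` / `command` / `args` / `env` layout is the standard Claude Desktop format; the server name `gemini`, the entry-point path, and the key value are placeholders assumed for illustration, not values taken from this repository.

```javascript
// Sketch: build a claude_desktop_config.json entry for this server.
// The server name, entry-point path, and key value are placeholders (assumptions).
const config = {
  mcpServers: {
    gemini: {
      command: "node",
      // Hypothetical path: point this at wherever the server's entry file lives.
      args: ["/absolute/path/to/gemini-mcp-server/src/index.js"],
      env: { GEMINI_API_KEY: "your-api-key-here" },
    },
  },
};

// Print the JSON to paste into claude_desktop_config.json.
console.log(JSON.stringify(config, null, 2));
```

Restart Claude Desktop after editing the file so the new server entry is picked up.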
🛠️ Tool Reference
1. Image Generation (generate_image)
Generate images from text descriptions using Gemini 2.0 Flash.
Parameters:
- prompt (string, required) - Description of the image to generate
- context (string, optional) - Context for Smart Tool Intelligence enhancement
Example:
Returns:
2. Image Editing (gemini-edit-image)
Edit existing images using natural language instructions.
Parameters:
- image_path (string, required) - Path to the image file to edit
- edit_instruction (string, required) - Description of desired changes
- context (string, optional) - Context for enhancement
Example:
3. Chat (gemini-chat)
Interactive conversations with Gemini AI that learns your preferences.
Parameters:
- message (string, required) - Your message or question
- context (string, optional) - Context for Smart Tool Intelligence
Example:
4. Audio Transcription (gemini-transcribe-audio)
Convert audio files to text with Smart Tool Intelligence enhancement.
Parameters:
- file_path (string, required) - Path to audio file (MP3, WAV, FLAC, AAC, OGG, WEBM, M4A)
- language (string, optional) - Language hint for better accuracy
- context (string, optional) - Use "verbatim" for exact word-for-word transcription
- preserve_spelled_acronyms (boolean, optional) - Keep U-R-L instead of URL
Example (Standard):
Example (Verbatim Mode):
Verbatim Mode Features:
- Captures all "um", "uh", "like", repeated words
- Preserves emotional expressions: [laughs], [sighs], [clears throat]
- Maintains original punctuation and sentence structure
- No summarization or cleanup
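For readers calling the tool from their own MCP client, here is a minimal sketch of a verbatim-mode request. The JSON-RPC `tools/call` envelope comes from the Model Context Protocol; the tool name and argument names are the ones listed above. The `buildToolCall` helper and the file path are hypothetical, introduced only for illustration.

```javascript
// Hypothetical helper: wraps tool arguments in an MCP "tools/call" JSON-RPC request.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Verbatim transcription request; the file path is a placeholder.
const verbatimRequest = buildToolCall(1, "gemini-transcribe-audio", {
  file_path: "/path/to/interview.mp3",
  context: "verbatim",             // ask for exact word-for-word output
  preserve_spelled_acronyms: true, // keep "U-R-L" rather than "URL"
});

console.log(JSON.stringify(verbatimRequest, null, 2));
```

Dropping the `context` field (or setting it to something other than "verbatim") would request the standard, cleaned-up transcription instead.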
5. Code Execution (gemini-code-execute)
Execute Python code in a secure sandbox environment.
Parameters:
- code (string, required) - Python code to execute
- context (string, optional) - Context for enhancement
Example:
6. Video Analysis (gemini-analyze-video)
Analyze video content for summaries, transcripts, and detailed insights.
Parameters:
- file_path (string, required) - Path to video file (MP4, MOV, AVI, WEBM, MKV, FLV)
- analysis_type (string, optional) - "summary", "transcript", "objects", "detailed", "custom"
- context (string, optional) - Context for enhancement
Example:
7. Image Analysis (gemini-analyze-image)
Extract detailed information from images including objects, text, and descriptions.
Parameters:
- file_path (string, required) - Path to image file (JPEG, PNG, WebP, HEIC, HEIF, BMP, GIF)
- analysis_type (string, optional) - "summary", "objects", "text", "detailed", "custom"
- context (string, optional) - Context for enhancement
Example:
🧠 Smart Tool Intelligence System
How It Works
The Smart Tool Intelligence system is the first of its kind in the MCP ecosystem. It automatically:
- Detects Context - Recognizes if you're doing consciousness research, coding, debugging, etc.
- Enhances Prompts - Adds relevant instructions based on learned patterns
- Learns Patterns - Stores successful interaction patterns for future use
- Adapts Over Time - Gets better at helping you with each interaction
Context Types
The system recognizes these contexts and applies appropriate enhancements:
- consciousness - Adds academic rigor, citations, detailed explanations
- code - Includes practical examples, working code, best practices
- debugging - Focuses on root cause analysis and specific fixes
- general - Applies comprehensive, structured responses
- verbatim - For audio transcription, provides exact word-for-word output
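To make the mapping from context to enhancement concrete, here is a small keyword-based sketch. It is not the detector shipped in this repository: the keyword lists, the function names `detectContext` and `enhancePrompt`, and the exact instruction wording are all assumptions chosen for illustration. Only the five context names and the gist of each enhancement come from the list above.

```javascript
// Hypothetical keyword table; the real detector may use entirely different signals.
// Order matters: "debugging" is checked before "code" so error reports win.
const CONTEXT_KEYWORDS = {
  consciousness: ["consciousness", "awareness", "qualia"],
  debugging: ["error", "bug", "stack trace", "exception", "crash"],
  code: ["function", "class", "implement", "refactor", "api"],
};

// Instruction text appended per context (wording assumed, based on the list above).
const ENHANCEMENTS = {
  consciousness: "Respond with academic rigor, citations, and detailed explanations.",
  code: "Include practical examples, working code, and best practices.",
  debugging: "Focus on root cause analysis and specific fixes.",
  general: "Give a comprehensive, structured response.",
  verbatim: "Provide exact word-for-word output.",
};

// An explicit, known context wins; otherwise fall back to keyword matching.
function detectContext(prompt, explicitContext) {
  if (explicitContext && Object.hasOwn(ENHANCEMENTS, explicitContext)) {
    return explicitContext;
  }
  const text = prompt.toLowerCase();
  for (const [context, keywords] of Object.entries(CONTEXT_KEYWORDS)) {
    if (keywords.some((keyword) => text.includes(keyword))) return context;
  }
  return "general";
}

// Returns the detected context and the prompt with its enhancement appended.
function enhancePrompt(prompt, explicitContext) {
  const context = detectContext(prompt, explicitContext);
  return { context, prompt: `${prompt}\n\n${ENHANCEMENTS[context]}` };
}
```

Substring matching is deliberately naive here (for example, "api" also matches inside longer words); a production detector would likely need word boundaries or weighted scoring.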
Storage Location
Preferences are stored internally at ./data/tool-preferences.json with automatic migration from external storage.
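A JSON-file store of this kind can be sketched in a few lines. The file schema below (`version` plus a `patterns` map keyed by context) and the function names are assumptions for illustration; the actual layout of tool-preferences.json may differ.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Assumed schema: { version, patterns: { [context]: [{ prompt, timestamp }] } }
function loadPreferences(filePath) {
  try {
    return JSON.parse(fs.readFileSync(filePath, "utf8"));
  } catch (err) {
    // A missing file just means nothing has been learned yet.
    if (err.code === "ENOENT") return { version: 1, patterns: {} };
    throw err;
  }
}

// Appends one successful interaction and persists the whole file.
function recordPattern(filePath, context, prompt) {
  const prefs = loadPreferences(filePath);
  (prefs.patterns[context] ??= []).push({ prompt, timestamp: Date.now() });

  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  // Write to a temp file, then rename, so a crash cannot leave a half-written file.
  const tempPath = `${filePath}.tmp`;
  fs.writeFileSync(tempPath, JSON.stringify(prefs, null, 2));
  fs.renameSync(tempPath, filePath);
  return prefs;
}
```

Calling `recordPattern("./data/tool-preferences.json", "code", "write a parser")` would create the data directory if needed and append the entry under the `code` context.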
Implementing Smart Tool Intelligence in Your MCP Server
Want to add this revolutionary capability to your own MCP server? Here's how:
1. Core Architecture
2. Integration Pattern
3. Key Implementation Files
Study these files from this repository:
- src/intelligence/index.js - Main intelligence coordinator
- src/intelligence/context-detector.js - Context recognition logic
- src/intelligence/prompt-enhancer.js - Enhancement application
- src/intelligence/preference-store.js - Pattern storage and retrieval
- src/tools/base-tool.js - Integration with tool execution
🧪 Testing
Run Test Suite
Manual Testing Examples
📊 Performance & Limits
File Size Limits
- Images: 20MB (JPEG, PNG, WebP, HEIC, HEIF, BMP, GIF)
- Audio: 20MB (MP3, WAV, FLAC, AAC, OGG, WEBM, M4A)
- Video: 100MB (MP4, MOV, AVI, WEBM, MKV, FLV)
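A client can check these limits before sending a file. The sketch below encodes the table above; the `checkFile` helper is hypothetical, 1 MB is assumed to mean 1024 × 1024 bytes, and `.jpg` is assumed to be accepted alongside `.jpeg`.

```javascript
const path = require("node:path");

const MB = 1024 * 1024; // assumption: the limits use binary megabytes

// Limits and formats from the list above (".jpg" added as an assumed JPEG alias).
const LIMITS = {
  image: { maxBytes: 20 * MB, extensions: [".jpeg", ".jpg", ".png", ".webp", ".heic", ".heif", ".bmp", ".gif"] },
  audio: { maxBytes: 20 * MB, extensions: [".mp3", ".wav", ".flac", ".aac", ".ogg", ".webm", ".m4a"] },
  video: { maxBytes: 100 * MB, extensions: [".mp4", ".mov", ".avi", ".webm", ".mkv", ".flv"] },
};

// Returns { ok: true } or { ok: false, reason } without touching the file system.
function checkFile(kind, filePath, sizeBytes) {
  const limit = LIMITS[kind];
  if (!limit) return { ok: false, reason: `unknown kind: ${kind}` };

  const extension = path.extname(filePath).toLowerCase();
  if (!limit.extensions.includes(extension)) {
    return { ok: false, reason: `unsupported ${kind} extension: ${extension || "(none)"}` };
  }
  if (sizeBytes > limit.maxBytes) {
    return { ok: false, reason: `${kind} exceeds ${limit.maxBytes / MB}MB limit` };
  }
  return { ok: true };
}
```

In practice the size would come from `fs.statSync(filePath).size`; it is passed in here so the check stays a pure function.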
API Rate Limits
- Follows Google Gemini API rate limits
- Built-in error handling and retry logic
- Graceful degradation on quota exceeded
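Retry logic of this kind is commonly built on exponential backoff. The following is a generic sketch of that technique, not the server's actual implementation; `withRetry` and its options are hypothetical names.

```javascript
// Generic retry with exponential backoff.
// Makes up to `retries + 1` attempts, waiting baseDelayMs, 2x, 4x, ... between them.
async function withRetry(fn, { retries = 3, baseDelayMs = 500, isRetryable = () => true } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // Stop on the final attempt or when the error is not worth retrying.
      if (attempt === retries || !isRetryable(err)) break;
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

A caller would typically pass an `isRetryable` predicate that returns true only for rate-limit or transient network errors, so that invalid requests fail immediately instead of being retried.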
🏗️ Architecture Deep Dive
Modular Design
Intelligence System Flow
- Request Received → Tool's execute method called
- Context Detection → Analyze prompt for context clues
- Pattern Retrieval → Get relevant learned patterns
- Prompt Enhancement → Apply context-specific improvements
- API Execution → Send enhanced prompt to Gemini
- Pattern Storage → Store successful interaction pattern
- Response Return → Return enhanced result to user
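The flow above can be sketched as a single function. Every collaborator (`detectContext`, `getPatterns`, `enhance`, `callModel`, `storePattern`) is a hypothetical injected dependency, so this illustrates the order of steps rather than the repository's real module interfaces.

```javascript
// Sketch of the intelligence flow; all collaborators are injected (hypothetical names).
async function runWithIntelligence(prompt, deps) {
  const { detectContext, getPatterns, enhance, callModel, storePattern } = deps;

  const context = detectContext(prompt);               // Context detection
  const patterns = await getPatterns(context);         // Pattern retrieval
  const enhanced = enhance(prompt, context, patterns); // Prompt enhancement
  const result = await callModel(enhanced);            // API execution (throws on failure)
  await storePattern(context, prompt);                 // Pattern storage, reached only on success
  return { context, result };                          // Response return
}
```

Because `storePattern` runs after `callModel` resolves, a failed API call never gets recorded as a successful pattern.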
🔧 Customization
Adding New Contexts
Adding New Tools
- Create tool file in src/tools/my-new-tool.js
- Extend BaseTool class
- Implement execute method with intelligence integration
- Register in src/tools/index.js
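The steps above might look roughly like the sketch below. The `BaseTool` class here is a minimal stand-in written for this example; the real class in src/tools/base-tool.js may have a different constructor and hooks, and the intelligence integration is omitted. `WordCountTool`, `registry`, and `registerTool` are likewise hypothetical.

```javascript
// Stand-in for the repository's BaseTool; the real class may differ.
class BaseTool {
  constructor(name, description) {
    this.name = name;
    this.description = description;
  }

  async execute(_args) {
    throw new Error(`${this.name}: execute() not implemented`);
  }
}

// Example tool: counts whitespace-separated words in a string.
class WordCountTool extends BaseTool {
  constructor() {
    super("word-count", "Counts the words in a piece of text");
  }

  async execute({ text }) {
    if (typeof text !== "string") throw new TypeError("text must be a string");
    const words = text.trim().split(/\s+/).filter(Boolean);
    return { count: words.length };
  }
}

// Hypothetical registry, standing in for whatever src/tools/index.js exports.
const registry = new Map();
function registerTool(tool) {
  registry.set(tool.name, tool);
}

registerTool(new WordCountTool());
```

A real tool would also route its prompt through the intelligence system inside `execute` before calling the Gemini API.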
🐛 Troubleshooting
Common Issues
"Missing GEMINI_API_KEY" Error
"File not found" Errors
Intelligence System Not Learning
Debug Mode
Logs Location
- Application logs: Console output
- Intelligence patterns: ./data/tool-preferences.json
- Generated images: $OUTPUT_DIR (default: ~/Claude/gemini-images)
🤝 Contributing
We welcome contributions! This project represents a new paradigm in MCP server development.
Development Setup
Areas for Contribution
- New Contexts - Add support for specialized domains
- Enhanced Patterns - Improve learning algorithms
- New Tools - Expand Gemini AI capabilities
- Performance - Optimize intelligence system performance
- Documentation - Improve guides and examples
📈 Roadmap
- Multi-language Support - Context detection in multiple languages
- Advanced Analytics - Usage patterns and performance metrics
- Tool Chaining - Intelligent coordination between multiple tools
- Custom Models - Support for fine-tuned Gemini models
- Collaborative Learning - Share anonymized patterns across instances
- Visual Interface - Web-based configuration and monitoring
🌟 Why This Matters
This is the first MCP server that truly learns and adapts. Traditional MCP servers are static - they do the same thing every time. Our Smart Tool Intelligence system represents a paradigm shift toward AI tools that become more helpful over time.
For Users: Better results with less effort as the system learns your preferences.
For Developers: A blueprint for building truly intelligent, adaptive AI tools.
For the MCP Ecosystem: A new standard for what MCP servers can become.
📄 License
This project is licensed under the MIT License - feel free to use, modify, and distribute.
🙏 Acknowledgments
Built with:
- Google Gemini AI - Powering the core AI capabilities
- Model Context Protocol - Enabling seamless integration
- Node.js & NPM - Runtime and package management
- Claude & Rob - Human-AI collaboration at its finest
Ready to experience the future of MCP servers? Get started now and watch your AI tools become smarter with every interaction! 🚀