MCP Gemini Server
Overview
This project provides a dedicated MCP (Model Context Protocol) server that wraps the @google/genai SDK (v0.10.0). It exposes Google's Gemini model capabilities as standard MCP tools, allowing other LLMs (like Claude) or MCP-compatible systems to leverage Gemini's features as a backend workhorse.
This server aims to simplify integration with Gemini models by providing a consistent, tool-based interface managed via the MCP standard. It supports Gemini models including gemini-1.5-pro-latest, gemini-1.5-flash, and gemini-2.5-pro.
Important Note: This server does not support direct file uploads. Instead, it focuses on URL-based multimedia analysis for images and videos. For text-based content processing, use the standard content generation tools.
File Uploads vs URL-Based Analysis
❌ Not Supported: Direct File Uploads
This MCP Gemini Server does not support the following file upload operations:
Local file uploads: Cannot upload files from your local filesystem to Gemini
Base64 encoded files: Cannot process base64-encoded image or video data
Binary file data: Cannot handle raw file bytes or binary data
File references: Cannot process file IDs or references from uploaded content
Audio file uploads: Cannot upload and transcribe audio files directly
Why File Uploads Are Not Supported:
Simplified architecture focused on URL-based processing
Enhanced security by avoiding file handling complexities
Reduced storage and bandwidth requirements
Streamlined codebase maintenance
✅ Fully Supported: URL-Based Multimedia Analysis
This server fully supports analyzing multimedia content from publicly accessible URLs:
Image Analysis from URLs:
Public image URLs: Analyze images hosted on any publicly accessible web server
Supported formats: PNG, JPEG, WebP, HEIC, HEIF via direct URL access
Multiple images: Process multiple image URLs in a single request
Security validation: Automatic URL validation and security screening
YouTube Video Analysis:
Public YouTube videos: Full analysis of any public YouTube video content
Video understanding: Extract insights, summaries, and detailed analysis
Educational content: Perfect for analyzing tutorials, lectures, and educational videos
Multiple videos: Process multiple YouTube URLs (up to 10 per request with Gemini 2.5+)
Web Content Processing:
HTML content: Analyze and extract information from web pages
Mixed media: Combine text content with embedded images and videos
Contextual analysis: Process URLs alongside text prompts for comprehensive analysis
Alternatives for Local Content
If you have local files to analyze:
Host on a web server: Upload your files to a public web server and use the URL
Use cloud storage: Upload to services like Google Drive, Dropbox, or AWS S3 with public access
Use GitHub: Host images in a GitHub repository and use the raw file URLs
Use image hosting services: Upload to services like Imgur, ImgBB, or similar platforms
For audio content:
Use external transcription services (Whisper API, Google Speech-to-Text, etc.)
Upload audio to YouTube and analyze the resulting video URL
Use other MCP servers that specialize in audio processing
Features
Core Generation: Standard (gemini_generateContent) and streaming (gemini_generateContentStream) text generation with support for system instructions and cached content.
Function Calling: Enables Gemini models to request the execution of client-defined functions (gemini_functionCall).
Stateful Chat: Manages conversational context across multiple turns (gemini_startChat, gemini_sendMessage, gemini_sendFunctionResult) with support for system instructions, tools, and cached content.
URL-Based Multimedia Analysis: Analyze images from public URLs and YouTube videos without file uploads. Direct file uploads are not supported.
Caching: Create, list, retrieve, update, and delete cached content to optimize prompts with support for tools and tool configurations.
Image Generation: Generate images from text prompts using Gemini 2.0 Flash Experimental (gemini_generateImage) with control over resolution, number of images, and negative prompts. Also supports the latest Imagen 3.1 model for high-quality dedicated image generation with advanced style controls. Note that Gemini 2.5 models (Flash and Pro) do not currently support image generation.
URL Context Processing: Fetch and analyze web content directly from URLs with advanced security, caching, and content processing capabilities.
  gemini_generateContent: Enhanced with URL context support for including web content in prompts
  gemini_generateContentStream: Streaming generation with URL context integration
  gemini_url_analysis: Specialized tool for advanced URL content analysis with multiple analysis types
MCP Client: Connect to and interact with external MCP servers.
  mcpConnectToServer: Establishes a connection to an external MCP server.
  mcpListServerTools: Lists available tools on a connected MCP server.
  mcpCallServerTool: Calls a function on a connected MCP server, with an option for file output.
  mcpDisconnectFromServer: Disconnects from an external MCP server.
  writeToFile: Writes content directly to files within allowed directories.
Prerequisites
Node.js (v18 or later)
An API Key from Google AI Studio (https://aistudio.google.com/app/apikey).
Important: The Caching API is only compatible with Google AI Studio API keys and is not supported when using Vertex AI credentials. This server does not currently support Vertex AI authentication.
Installation & Setup
Installing Manually
1. Clone/Place Project: Ensure the mcp-gemini-server project directory is accessible on your system.
2. Install Dependencies: Navigate to the project directory in your terminal and run:
npm install
3. Build Project: Compile the TypeScript source code:
npm run build
This command uses the TypeScript compiler (tsc) and outputs the JavaScript files to the ./dist directory (as specified by outDir in tsconfig.json). The main server entry point will be dist/server.js.
4. Generate Connection Token: Create a strong, unique connection token for secure communication between your MCP client and the server. This is a shared secret that you generate and configure on both the server and client sides.
Generate a secure token using one of these methods:
Option A: Using Node.js crypto (Recommended)
node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"
Option B: Using OpenSSL
openssl rand -hex 32
Option C: Using PowerShell (Windows; requires PowerShell 7.2 or later, because the static RandomNumberGenerator.GetBytes(int) method was added in .NET 6)
[System.Convert]::ToBase64String([System.Security.Cryptography.RandomNumberGenerator]::GetBytes(32))
Option D: Online Generator (use with caution)
Use a reputable password generator such as 1Password or Bitwarden to generate a 64-character random string.
Important Security Notes:
The token should be at least 32 characters long and contain random characters
Never share this token or commit it to version control
Use a different token for each server instance
Store the token securely (environment variables, secrets manager, etc.)
Save this token - you'll need to use the exact same value in both server and client configurations
5. Configure MCP Client: Add the server configuration to your MCP client's settings file (e.g., cline_mcp_settings.json for Cline/VS Code, or claude_desktop_config.json for the Claude Desktop App). Replace /path/to/mcp-gemini-server with the actual absolute path on your system, YOUR_API_KEY with your Google AI Studio key, and YOUR_GENERATED_CONNECTION_TOKEN with the token you generated in step 4. JSON does not allow comments, so keep the settings file free of them.
{
  "mcpServers": {
    "gemini-server": {
      "command": "node",
      "args": ["/path/to/mcp-gemini-server/dist/server.js"],
      "env": {
        "GOOGLE_GEMINI_API_KEY": "YOUR_API_KEY",
        "MCP_SERVER_HOST": "localhost",
        "MCP_SERVER_PORT": "8080",
        "MCP_CONNECTION_TOKEN": "YOUR_GENERATED_CONNECTION_TOKEN",
        "GOOGLE_GEMINI_MODEL": "gemini-1.5-flash",
        "ALLOWED_OUTPUT_PATHS": "/var/opt/mcp-gemini-server/outputs,/tmp/mcp-gemini-outputs"
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}
Important Notes:
"gemini-server" can be replaced with any name you prefer.
The path in args must be the absolute path to the compiled dist/server.js file. Ensure the path exists and the server has been built using npm run build.
MCP_SERVER_HOST, MCP_SERVER_PORT, and MCP_CONNECTION_TOKEN are required unless NODE_ENV is set to test.
MCP_CONNECTION_TOKEN must be exactly the value you generated in step 4.
GOOGLE_GEMINI_MODEL is optional and sets a default model.
ALLOWED_OUTPUT_PATHS is optional: a comma-separated list of directories where mcpCallServerTool and writeToFile are allowed to write output.
6. Restart MCP Client: Restart your MCP client application (e.g., VS Code with the Cline extension, or the Claude Desktop App) to load the new server configuration. The MCP client manages starting and stopping the server process.
Configuration
The server uses environment variables for configuration, passed via the env object in the MCP settings:
GOOGLE_GEMINI_API_KEY (Required): Your API key obtained from Google AI Studio.
GOOGLE_GEMINI_MODEL (Optional): Specifies a default Gemini model name (e.g., gemini-1.5-flash, gemini-1.0-pro). If set, tools that require a model name (such as gemini_generateContent and gemini_startChat) use this default when the modelName parameter is omitted from the tool call. This simplifies client calls when you primarily use one model. If this environment variable is not set, the modelName parameter becomes required for those tools. See the Google AI documentation for available model names.
ALLOWED_OUTPUT_PATHS (Optional): A comma-separated list of absolute paths to directories where mcpCallServerTool (with the outputToFile parameter) and writeToFile are allowed to write files. If not set, file output is disabled for these tools. This is a security measure to prevent arbitrary file writes.
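The GOOGLE_GEMINI_MODEL fallback amounts to a small precedence rule. The sketch below is a hypothetical helper (resolveModelName is not taken from the server's source) showing that an explicit modelName wins, the environment default is used otherwise, and the call fails when neither is present.

```typescript
// Hypothetical helper illustrating the documented GOOGLE_GEMINI_MODEL fallback.
type Env = Record<string, string | undefined>;

function resolveModelName(modelName: string | undefined, env: Env): string {
  // An explicit modelName in the tool call takes precedence.
  if (modelName !== undefined && modelName.trim() !== "") {
    return modelName;
  }
  // Otherwise use the server-wide default, if one is configured.
  const fallback = env.GOOGLE_GEMINI_MODEL;
  if (fallback !== undefined && fallback.trim() !== "") {
    return fallback;
  }
  // Neither is available, so modelName is effectively required.
  throw new Error("modelName is required when GOOGLE_GEMINI_MODEL is not set");
}

// In the server process this would typically be called with process.env.
console.log(resolveModelName(undefined, { GOOGLE_GEMINI_MODEL: "gemini-1.5-flash" })); // gemini-1.5-flash
console.log(resolveModelName("gemini-2.5-pro", { GOOGLE_GEMINI_MODEL: "gemini-1.5-flash" })); // gemini-2.5-pro
```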
Available Tools
This server provides the following MCP tools. Parameter schemas are defined using Zod for validation and description.
Validation and Error Handling: All parameters are validated using Zod schemas at both the MCP tool level and service layer, providing consistent validation, detailed error messages, and type safety. The server implements comprehensive error mapping to provide clear, actionable error messages.
Retry Logic: API requests automatically use exponential backoff retry for transient errors (network issues, rate limits, timeouts), improving reliability for unstable connections. The retry mechanism includes configurable parameters for maximum attempts, delay times, and jitter to prevent thundering herd effects.
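As a rough illustration of exponential backoff with jitter, here is a minimal sketch. The names (withRetry, RetryOptions, isRetryable) and the "full jitter" strategy are assumptions chosen for the example, not the server's actual retry implementation or configuration keys.

```typescript
// Hypothetical sketch: names and defaults are illustrative, not the server's actual settings.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first call
  initialDelayMs: number; // base delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
  isRetryable: (err: unknown) => boolean; // e.g. network errors, rate limits, timeouts
}

// "Full jitter": a random delay in [0, min(maxDelayMs, initialDelayMs * 2^(attempt - 1))).
function backoffDelay(attempt: number, opts: RetryOptions, random: () => number = Math.random): number {
  const ceiling = Math.min(opts.maxDelayMs, opts.initialDelayMs * 2 ** (attempt - 1));
  return Math.floor(random() * ceiling);
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Stop immediately on non-transient errors or once attempts are exhausted.
      if (!opts.isRetryable(err) || attempt === opts.maxAttempts) {
        break;
      }
      // Randomized delays keep many clients from retrying in lockstep.
      await sleep(backoffDelay(attempt, opts));
    }
  }
  throw lastError;
}
```

Which errors count as transient is the caller's decision: a typical isRetryable would accept network failures, timeouts, and rate-limit responses, and reject validation errors, which would fail again on every attempt.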
Note on Optional Parameters: Many tools accept complex optional parameters (e.g., generationConfig, safetySettings, toolConfig, history, functionDeclarations, contents). These parameters are typically objects or arrays whose structure mirrors the types defined in the underlying @google/genai SDK (v0.10.0). For the exact structure and available fields within these complex parameters, please refer to:
1. The corresponding src/tools/*Params.ts file in this project.
2. The official Google AI JS SDK Documentation.
Core Generation
gemini_generateContent
Description: Generates non-streaming text content from a prompt, with optional URL context support.
Required Params:
prompt (string)
Optional Params:
modelName (string) - Name of the model to use
generationConfig (object) - Controls generation parameters such as temperature and topP
thinkingConfig (object) - Controls the model's reasoning process
  thinkingBudget (number) - Maximum tokens for reasoning (0-24576)
  reasoningEffort (string) - Simplified control: "none" (0 tokens), "low" (1K), "medium" (8K), "high" (24K)
safetySettings (array) - Controls content filtering by harm category
systemInstruction (string or object) - System instruction to guide model behavior
cachedContentName (string) - Identifier of cached content to use with this request
urlContext (object) - Fetches web content from URLs and includes it as context
  urls (array) - URLs to fetch and include as context (max 20)
  fetchOptions (object) - Configuration for URL fetching
    maxContentKb (number) - Maximum content size per URL in KB (default: 100)
    timeoutMs (number) - Fetch timeout per URL in milliseconds (default: 10000)
    includeMetadata (boolean) - Include URL metadata in the context (default: true)
    convertToMarkdown (boolean) - Convert HTML to Markdown (default: true)
    allowedDomains (array) - Specific domains to allow for this request
    userAgent (string) - Custom User-Agent header for URL requests
modelPreferences (object) - Model selection preferences
Note: Supports multimodal inputs, cached content, and URL context.
Thinking Budget: Controls the token budget for model reasoning. Lower values give faster responses; higher values can improve results on complex reasoning tasks.
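One plausible way to turn thinkingConfig into a concrete token budget is sketched below. It assumes "1K", "8K", and "24K" mean 1024, 8192, and 24576 tokens, and that an explicit thinkingBudget takes precedence when both fields are given; both are assumptions, so check the server's src/tools/*Params.ts for the actual rules.

```typescript
// Hypothetical helper. Assumes 1K = 1024, 8K = 8192, 24K = 24576 tokens,
// and that an explicit thinkingBudget overrides reasoningEffort.
type ReasoningEffort = "none" | "low" | "medium" | "high";

interface ThinkingConfig {
  thinkingBudget?: number;
  reasoningEffort?: ReasoningEffort;
}

const MAX_THINKING_BUDGET = 24576;

const EFFORT_BUDGETS: Record<ReasoningEffort, number> = {
  none: 0,
  low: 1024,
  medium: 8192,
  high: 24576,
};

function resolveThinkingBudget(config: ThinkingConfig): number | undefined {
  if (config.thinkingBudget !== undefined) {
    const budget = config.thinkingBudget;
    // Enforce the documented 0-24576 range.
    if (!Number.isInteger(budget) || budget < 0 || budget > MAX_THINKING_BUDGET) {
      throw new RangeError(`thinkingBudget must be an integer from 0 to ${MAX_THINKING_BUDGET}`);
    }
    return budget;
  }
  if (config.reasoningEffort !== undefined) {
    return EFFORT_BUDGETS[config.reasoningEffort];
  }
  // Neither field set: leave the budget to the model's default.
  return undefined;
}

console.log(resolveThinkingBudget({ reasoningEffort: "medium" })); // 8192
```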
gemini_generateContentStream
Description: Generates text content via streaming using Server-Sent Events (SSE) for real-time content delivery, with URL context support.
Required Params:
prompt (string)
Optional Params:
modelName (string) - Name of the model to use
generationConfig (object) - Controls generation parameters such as temperature and topP
thinkingConfig (object) - Controls the model's reasoning process
  thinkingBudget (number) - Maximum tokens for reasoning (0-24576)
  reasoningEffort (string) - Simplified control: "none" (0 tokens), "low" (1K), "medium" (8K), "high" (24K)
safetySettings (array) - Controls content filtering by harm category
systemInstruction (string or object) - System instruction to guide model behavior
cachedContentName (string) - Identifier of cached content to use with this request
urlContext (object) - Same URL context options as gemini_generateContent
modelPreferences (object) - Model selection preferences
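The following sketch fills in the documented urlContext defaults (100 KB, 10000 ms, metadata and Markdown conversion enabled) and enforces the 20-URL limit before building gemini_generateContent arguments. The helper name withUrlContextDefaults and the placeholder URL are hypothetical.

```typescript
// Hypothetical client-side types mirroring the documented urlContext options.
interface FetchOptions {
  maxContentKb?: number;
  timeoutMs?: number;
  includeMetadata?: boolean;
  convertToMarkdown?: boolean;
  allowedDomains?: string[];
  userAgent?: string;
}

interface UrlContext {
  urls: string[];
  fetchOptions?: FetchOptions;
}

const MAX_CONTEXT_URLS = 20;

// Fill in the documented defaults so every option has an explicit value.
function withUrlContextDefaults(ctx: UrlContext): UrlContext & { fetchOptions: FetchOptions } {
  if (ctx.urls.length > MAX_CONTEXT_URLS) {
    throw new RangeError(`urlContext.urls accepts at most ${MAX_CONTEXT_URLS} URLs`);
  }
  const opts = ctx.fetchOptions ?? {};
  return {
    urls: ctx.urls,
    fetchOptions: {
      ...opts, // keeps allowedDomains and userAgent if provided
      maxContentKb: opts.maxContentKb ?? 100,
      timeoutMs: opts.timeoutMs ?? 10000,
      includeMetadata: opts.includeMetadata ?? true,
      convertToMarkdown: opts.convertToMarkdown ?? true,
    },
  };
}

// Example gemini_generateContent arguments; the URL is a placeholder.
const generateArgs = {
  prompt: "Summarize the main argument of this article in three bullet points.",
  urlContext: withUrlContextDefaults({
    urls: ["https://example.com/article"],
    fetchOptions: { maxContentKb: 200 },
  }),
};
console.log(JSON.stringify(generateArgs, null, 2));
```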
Function Calling
gemini_functionCall
Description: Sends a prompt and function declarations to the model, returning either a text response or a requested function call object (as a JSON string).
Required Params:
prompt (string), functionDeclarations (array)
Optional Params:
modelName (string) - Name of the model to use
generationConfig (object) - Controls generation parameters
safetySettings (array) - Controls content filtering
toolConfig (object) - Configures function-calling behavior, such as the calling mode and which functions the model is allowed to call
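A gemini_functionCall request pairs a prompt with declarations, and the reply is either text or a JSON-encoded call. In the sketch below, the getWeather declaration is invented for illustration, its shape assumes the @google/genai FunctionDeclaration format (name, description, OpenAPI-style parameters), the toolConfig value assumes the Gemini API functionCallingConfig structure is passed through unchanged, and the parser assumes a requested call arrives as {"name": ..., "args": ...}.

```typescript
// Hypothetical declaration; the function itself does not exist in this project.
const functionDeclarations = [
  {
    name: "getWeather",
    description: "Returns the current weather for a city.",
    parameters: {
      type: "OBJECT",
      properties: {
        city: { type: "STRING", description: "City name, e.g. Paris" },
      },
      required: ["city"],
    },
  },
];

// Example gemini_functionCall arguments (toolConfig shape is an assumption).
const functionCallArgs = {
  prompt: "What's the weather in Paris right now?",
  functionDeclarations,
  toolConfig: { functionCallingConfig: { mode: "AUTO" } },
};

type ModelReply =
  | { kind: "functionCall"; name: string; args: Record<string, unknown> }
  | { kind: "text"; text: string };

// Interpret the tool's string result as either a function call or plain text.
function parseFunctionCallResult(raw: string): ModelReply {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { kind: "text", text: raw }; // not JSON, so it is a text answer
  }
  if (typeof parsed === "object" && parsed !== null && typeof (parsed as { name?: unknown }).name === "string") {
    const call = parsed as { name: string; args?: Record<string, unknown> };
    return { kind: "functionCall", name: call.name, args: call.args ?? {} };
  }
  return { kind: "text", text: raw };
}

console.log(parseFunctionCallResult('{"name":"getWeather","args":{"city":"Paris"}}'));
```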
Stateful Chat
gemini_startChat
Description: Initiates a new stateful chat session and returns a unique sessionId.
Optional Params:
modelName (string) - Name of the model to use
history (array) - Initial conversation history
tools (array) - Tool definitions, including function declarations
generationConfig (object) - Controls generation parameters
thinkingConfig (object) - Controls the model's reasoning process
  thinkingBudget (number) - Maximum tokens for reasoning (0-24576)
  reasoningEffort (string) - Simplified control: "none" (0 tokens), "low" (1K), "medium" (8K), "high" (24K)
safetySettings (array) - Controls content filtering
systemInstruction (string or object) - System instruction to guide model behavior
cachedContentName (string) - Identifier of cached content to use with this session
gemini_sendMessage
Description: Sends a message within an existing chat session.
Required Params:
sessionId (string), message (string)
Optional Params:
generationConfig (object) - Controls generation parameters
thinkingConfig (object) - Controls the model's reasoning process
  thinkingBudget (number) - Maximum tokens for reasoning (0-24576)
  reasoningEffort (string) - Simplified control: "none" (0 tokens), "low" (1K), "medium" (8K), "high" (24K)
safetySettings (array) - Controls content filtering
tools (array) - Tool definitions, including function declarations
toolConfig (object) - Configures tool behavior
cachedContentName (string) - Identifier of cached content to use with this message
gemini_sendFunctionResult
Description: Sends the result of a function execution back to a chat session.
Required Params:
sessionId (string), functionResponse (string) - The result of the function execution
Optional Params:
functionCall (object) - Reference to the original function call
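The three chat tools are typically used in sequence: start a session, send a message, and, if the model requests a function, run it and return the result with gemini_sendFunctionResult. In this sketch callTool is a hypothetical stand-in for your MCP client's tool invocation, faked here so the code runs on its own; it also assumes the session ID and the function-call reply come back as plain strings in the shapes shown.

```typescript
type ToolArgs = Record<string, unknown>;
// Hypothetical signature for "invoke an MCP tool and return its text result".
type CallTool = (name: string, args: ToolArgs) => Promise<string>;

async function chatTurnWithFunction(
  callTool: CallTool,
  runFunction: (name: string, args: ToolArgs) => unknown,
  message: string,
): Promise<string> {
  // 1. Start a session (assumes GOOGLE_GEMINI_MODEL is set; otherwise pass modelName).
  //    Assumption: the result text is the sessionId itself.
  const sessionId = await callTool("gemini_startChat", {});
  // 2. Send the user's message within that session.
  const reply = await callTool("gemini_sendMessage", { sessionId, message });
  // 3. Detect a requested function call, assumed to be {"name": ..., "args": ...} JSON.
  let parsed: unknown = null;
  try {
    parsed = JSON.parse(reply);
  } catch {
    parsed = null; // plain text reply
  }
  const call = parsed as { name?: unknown; args?: ToolArgs } | null;
  if (call !== null && typeof call === "object" && typeof call.name === "string") {
    const result = runFunction(call.name, call.args ?? {});
    // 4. Return the result to the same session so the model can finish its answer.
    return callTool("gemini_sendFunctionResult", {
      sessionId,
      functionResponse: JSON.stringify(result),
      functionCall: call,
    });
  }
  return reply;
}

// Fake transport so the sketch runs without a real MCP connection.
const fakeCallTool: CallTool = async (name, args) => {
  switch (name) {
    case "gemini_startChat":
      return "session-123";
    case "gemini_sendMessage":
      return JSON.stringify({ name: "getWeather", args: { city: "Paris" } });
    case "gemini_sendFunctionResult":
      return `Final answer based on ${String(args.functionResponse)}`;
    default:
      throw new Error(`unexpected tool: ${name}`);
  }
};

chatTurnWithFunction(fakeCallTool, () => ({ tempC: 21 }), "What's the weather in Paris?").then(console.log);
```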
gemini_routeMessage
Description: Routes a message to the most appropriate model from a provided list, based on the message content. Returns both the model's response and which model was selected.
Required Params:
message (string) - The text message to be routed to the most appropriate model
models (array) - Array of model names to consider for routing (e.g., ['gemini-1.5-flash', 'gemini-1.5-pro']). The first model in the list is used to make the routing decision.
Optional Params:
routingPrompt (string) - Custom prompt to use for routing decisions. If not provided, a default routing prompt is used.
defaultModel (string) - Model to fall back to if routing fails. If not provided and routing fails, an error is thrown.
generationConfig (object) - Generation configuration settings to apply to the selected model's response.
thinkingConfig (object) - Controls the model's reasoning process
  thinkingBudget (number) - Maximum tokens for reasoning (0-24576)
  reasoningEffort (string) - Simplified control: "none" (0 tokens), "low" (1K), "medium" (8K), "high" (24K)
safetySettings (array) - Safety settings to apply to both the routing request and the final response.
systemInstruction (string or object) - A system instruction to guide the model's behavior after routing.
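The documented fallback rules for gemini_routeMessage can be expressed as a small decision function. This is a hypothetical sketch: routerChoice stands for whatever model name the routing model (the first entry in models) suggested, and how the server actually obtains and parses that suggestion is not shown.

```typescript
// Hypothetical sketch of the documented routing fallback rules.
function selectModel(routerChoice: string | undefined, models: string[], defaultModel?: string): string {
  if (models.length === 0) {
    throw new Error("models must list at least one candidate");
  }
  // models[0] is the model that made the routing decision (per the docs above).
  const choice = routerChoice?.trim();
  // Accept the router's answer only if it names one of the candidates.
  if (choice && models.includes(choice)) {
    return choice;
  }
  // Routing failed: use defaultModel if given, otherwise surface an error.
  if (defaultModel) {
    return defaultModel;
  }
  throw new Error(`routing failed: "${routerChoice}" is not one of ${models.join(", ")}`);
}

const candidates = ["gemini-1.5-flash", "gemini-1.5-pro"];
console.log(selectModel("gemini-1.5-pro", candidates)); // gemini-1.5-pro
console.log(selectModel("unknown-model", candidates, "gemini-1.5-flash")); // gemini-1.5-flash
```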
Remote File Operations (Removed)
Note: Direct file upload operations are no longer supported by this server. The server now focuses exclusively on URL-based multimedia analysis for images and videos, and text-based content generation.
Alternative Approaches:
For Image Analysis: Use publicly accessible image URLs with the gemini_generateContent or gemini_url_analysis tools
For Video Analysis: Use publicly accessible YouTube video URLs for content analysis
For Audio Content: Audio transcription via file uploads is not supported; consider URL-based services that provide audio transcripts
For Document Analysis: Use URL-based document analysis, or publish documents at publicly accessible URLs (for example, as web pages)
Caching (Google AI Studio Key Required)
gemini_createCache
Description: Creates cached content for compatible models (e.g., gemini-1.5-flash).
Required Params:
contents (array), model (string)
Optional Params:
displayName (string) - Human-readable name for the cached content
systemInstruction (string or object) - System instruction to apply to the cached content
ttl (string, e.g., '3600s') - Time-to-live for the cached content
tools (array) - Tool definitions for use with the cached content
toolConfig (object) - Configuration for the tools
gemini_listCaches
Description: Lists existing cached content.
Required Params: None
Optional Params:
pageSize (number), pageToken (string; note: pageToken may not currently be returned reliably)
gemini_getCache
Description: Retrieves metadata for specific cached content.
Required Params:
cacheName (string, e.g., cachedContents/abc123xyz)
gemini_updateCache
Description: Updates metadata and contents for cached content.
Required Params:
cacheName (string), contents (array)
Optional Params:
displayName (string) - Updated display name
systemInstruction (string or object) - Updated system instruction
ttl (string) - Updated time-to-live
tools (array) - Updated tool definitions
toolConfig (object) - Updated tool configuration
gemini_deleteCache
Description: Deletes cached content.
Required Params:
cacheName (string, e.g., cachedContents/abc123xyz)
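Two small helpers for the caching tools follow, together with example gemini_createCache arguments. They assume the ttl format shown above ('3600s', a number of seconds followed by s) and cache names of the form cachedContents/<id>; the contents entry assumes the SDK's { role, parts: [{ text }] } content shape. All names are illustrative.

```typescript
// Hypothetical helper: format a duration in seconds as the ttl string, e.g. 3600 -> "3600s".
function toTtl(seconds: number): string {
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new RangeError("ttl must be a positive number of seconds");
  }
  return `${seconds}s`;
}

// Hypothetical check for names like "cachedContents/abc123xyz".
function isCacheName(name: string): boolean {
  return /^cachedContents\/[A-Za-z0-9_-]+$/.test(name);
}

// Example gemini_createCache arguments (contents shape is an assumption).
const createCacheArgs = {
  model: "gemini-1.5-flash",
  displayName: "product-docs",
  ttl: toTtl(60 * 60), // one hour
  contents: [
    { role: "user", parts: [{ text: "Long, frequently reused reference text goes here." }] },
  ],
};

console.log(createCacheArgs.ttl); // 3600s
console.log(isCacheName("cachedContents/abc123xyz")); // true
```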
Image Generation
gemini_generateImage
Description: Generates images from text prompts using available image generation models.
Required Params:
prompt (string) - Descriptive text prompt for image generation
Optional Params:
modelName (string) - Defaults to "imagen-3.1-generate-003" for high-quality dedicated image generation; use "gemini-2.0-flash-exp-image-generation" for Gemini models
resolution (string enum: "512x512", "1024x1024", "1536x1536")
numberOfImages (number, 1-8, default: 1)
safetySettings (array) - Controls content filtering for generated images
negativePrompt (string) - Features to avoid in the generated image
stylePreset (string enum: "photographic", "digital-art", "cinematic", "anime", "3d-render", "oil-painting", "watercolor", "pixel-art", "sketch", "comic-book", "neon", "fantasy")
seed (number) - Integer value for reproducible generation
styleStrength (number) - Strength of the style preset, 0.0-1.0
Response: Returns an array of base64-encoded images with metadata including dimensions and MIME type.
Notes: Image generation uses significant resources, especially at higher resolutions. Consider using smaller resolutions for faster responses and less resource usage.
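Before calling gemini_generateImage, a client may want to check the documented ranges, and afterwards decode the base64 results. In this hypothetical sketch, the GeneratedImage field names (data, mimeType) are assumptions about the response shape, and the file naming and temp-directory default are arbitrary choices.

```typescript
import { writeFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Hypothetical parameter shape following the documented gemini_generateImage options.
type Resolution = "512x512" | "1024x1024" | "1536x1536";

interface GenerateImageParams {
  prompt: string;
  resolution?: Resolution;
  numberOfImages?: number; // documented range: 1-8
  negativePrompt?: string;
  styleStrength?: number; // documented range: 0.0-1.0
}

// Client-side checks mirroring the documented limits.
function checkImageParams(p: GenerateImageParams): void {
  if (p.prompt.trim() === "") {
    throw new Error("prompt must not be empty");
  }
  const n = p.numberOfImages;
  if (n !== undefined && (!Number.isInteger(n) || n < 1 || n > 8)) {
    throw new RangeError("numberOfImages must be an integer from 1 to 8");
  }
  const s = p.styleStrength;
  if (s !== undefined && (s < 0 || s > 1)) {
    throw new RangeError("styleStrength must be between 0.0 and 1.0");
  }
}

// Assumed response entry: base64 image data plus its MIME type.
interface GeneratedImage {
  data: string;
  mimeType: string;
}

// Decode each base64 image and write it to `dir` (OS temp dir by default), returning the paths.
function saveImages(images: GeneratedImage[], dir: string = tmpdir()): string[] {
  return images.map((img, i) => {
    const ext = img.mimeType === "image/jpeg" ? "jpg" : img.mimeType.split("/")[1] ?? "bin";
    const file = join(dir, `image-${i + 1}.${ext}`);
    writeFileSync(file, Buffer.from(img.data, "base64"));
    return file;
  });
}

checkImageParams({ prompt: "A lighthouse at dawn, watercolor", resolution: "1024x1024", numberOfImages: 2 });
```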
Audio Transcription (Removed)
Note: Audio transcription via direct file uploads is no longer supported by this server. The server focuses on URL-based multimedia analysis for images and videos.
Alternative Approaches for Audio Content:
YouTube Videos: Use the YouTube video analysis capabilities to analyze video content that includes audio
External Services: Use dedicated audio transcription services and analyze their output as text content
URL-Based Audio: If audio content is available via public URLs in supported formats, consider using external transcription services first, then analyze the resulting text
URL Content Analysis
gemini_url_analysis
Description: Advanced URL analysis tool that fetches content from web pages and performs specialized analysis tasks, with comprehensive security and performance optimizations.
Required Params:
urls (array) - URLs to analyze (1-20 URLs supported)
analysisType (string enum) - Type of analysis to perform:
  summary - Comprehensive content summarization
  comparison - Multi-URL content comparison
  extraction - Structured information extraction
  qa - Question-based content analysis
  sentiment - Emotional tone analysis
  fact-check - Credibility assessment
  content-classification - Topic and type categorization
  readability - Accessibility and complexity analysis
  seo-analysis - Search optimization evaluation
Optional Params:
query (string) - Specific query or instruction for the analysis
extractionSchema (object) - JSON schema for structured data extraction
questions (array) - List of specific questions to answer (for qa analysis)
compareBy (array) - Specific aspects to compare when using comparison analysis
outputFormat (string enum: "text", "json", "markdown", "structured") - Desired output format
includeMetadata (boolean) - Include URL metadata in the analysis (default: true)
fetchOptions (object) - Advanced URL fetching options (same as urlContext fetchOptions)
modelName (string) - Specific Gemini model to use (auto-selected if not specified)
Security Features: Multi-layer URL validation, domain restrictions, private network protection, and rate limiting
Performance Features: Intelligent caching, concurrent processing, and optimal model selection based on content complexity
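To make the URL screening more concrete, here is a simplified sketch of checks of this kind: the 1-20 URL limit, an http(s)-only rule, rejection of localhost and private IPv4 literals, and an optional allowedDomains filter. It is an assumption about the general approach, not the server's implementation; in particular, it does not resolve DNS names or cover IPv6 ranges, which a real guard against server-side request forgery would need.

```typescript
const MAX_ANALYSIS_URLS = 20;

// True for IPv4 literals in common private, loopback, and link-local ranges.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const [a, b] = [Number(m[1]), Number(m[2])];
  return (
    a === 0 || // 0.0.0.0/8
    a === 10 || // 10.0.0.0/8
    a === 127 || // loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) || // 192.168.0.0/16
    (a === 169 && b === 254) // link-local
  );
}

// Hypothetical client-side pre-check before calling gemini_url_analysis.
function validateAnalysisUrls(urls: string[], allowedDomains?: string[]): URL[] {
  if (urls.length < 1 || urls.length > MAX_ANALYSIS_URLS) {
    throw new RangeError(`urls must contain 1-${MAX_ANALYSIS_URLS} entries`);
  }
  return urls.map((raw) => {
    const url = new URL(raw); // throws TypeError on malformed input
    if (url.protocol !== "https:" && url.protocol !== "http:") {
      throw new Error(`unsupported protocol: ${url.protocol}`);
    }
    const host = url.hostname;
    if (host === "localhost" || isPrivateIPv4(host)) {
      throw new Error(`private or local address not allowed: ${host}`);
    }
    // A domain matches itself and its subdomains.
    if (allowedDomains && !allowedDomains.some((d) => host === d || host.endsWith(`.${d}`))) {
      throw new Error(`domain not in allowedDomains: ${host}`);
    }
    return url;
  });
}

// Example arguments for a comparison; the URLs are placeholders.
const urlAnalysisArgs = {
  urls: validateAnalysisUrls(["https://example.com/pricing", "https://example.org/pricing"]).map((u) => u.href),
  analysisType: "comparison",
  compareBy: ["pricing", "features"],
  outputFormat: "markdown",
};
```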
MCP Client Tools
mcpConnectToServer
Description: Establishes a connection to an external MCP server and returns a connection ID.
Required Params:
serverId (string): A unique identifier for this server connection.
connectionType (string enum: "sse" | "stdio"): The transport protocol to use.
sseUrl (string, optional if connectionType is "stdio"): The URL for the SSE connection.
stdioCommand (string, optional if connectionType is "sse"): The command to run for the stdio connection.
stdioArgs (array of strings, optional): Arguments for the stdio command.
stdioEnv (object, optional): Environment variables for the stdio command.
Important: This tool returns a connectionId that must be used in subsequent calls to mcpListServerTools, mcpCallServerTool, and mcpDisconnectFromServer. This connectionId is generated internally and is different from the serverId parameter.
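Because sseUrl and stdioCommand are each needed only for one transport, the parameters map naturally onto a TypeScript discriminated union. The type and checks below are a hypothetical client-side sketch; the server IDs, URL, and paths are placeholders.

```typescript
// Hypothetical typing of the documented mcpConnectToServer parameters.
type ConnectParams =
  | { serverId: string; connectionType: "sse"; sseUrl: string }
  | {
      serverId: string;
      connectionType: "stdio";
      stdioCommand: string;
      stdioArgs?: string[];
      stdioEnv?: Record<string, string>;
    };

function checkConnectParams(p: ConnectParams): void {
  if (p.serverId.trim() === "") {
    throw new Error("serverId must not be empty");
  }
  if (p.connectionType === "sse") {
    void new URL(p.sseUrl); // throws TypeError if sseUrl is not a valid URL
  } else if (p.stdioCommand.trim() === "") {
    throw new Error("stdioCommand must not be empty for stdio connections");
  }
}

// Placeholder values for both transports.
const sseConnect: ConnectParams = {
  serverId: "docs-server",
  connectionType: "sse",
  sseUrl: "http://localhost:3001/sse",
};
const stdioConnect: ConnectParams = {
  serverId: "fs-server",
  connectionType: "stdio",
  stdioCommand: "node",
  stdioArgs: ["/path/to/other-server/dist/index.js"],
};

checkConnectParams(sseConnect);
checkConnectParams(stdioConnect);
```

Whatever serverId you choose, later calls to mcpListServerTools, mcpCallServerTool, and mcpDisconnectFromServer take the connectionId returned by the connect call.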
mcpListServerTools
Description: Lists available tools on a connected MCP server.
Required Params:
connectionId (string): The connection identifier returned by mcpConnectToServer.
mcpCallServerTool
Description: Calls a function on a connected MCP server.
Required Params:
connectionId (string): The connection identifier returned by mcpConnectToServer.
toolName (string): The name of the tool to call on the remote server.
toolArgs (object): The arguments to pass to the remote tool.
Optional Params:
outputToFile (string): If provided, the tool's output is written to this file path. The path must be within one of the directories specified in the ALLOWED_OUTPUT_PATHS environment variable.
mcpDisconnectFromServer
Description: Disconnects from an external MCP server.
Required Params:
connectionId (string): The connection identifier returned by mcpConnectToServer.
writeToFile
Description: Writes content directly to a file.
Required Params:
filePath (string): The absolute path of the file to write to. Must be within one of the directories specified in the ALLOWED_OUTPUT_PATHS environment variable.
content (string): The content to write to the file.
Optional Params:
overwrite (boolean, default: false): If true, overwrite the file if it already exists; otherwise, an error is thrown if the file exists.
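The ALLOWED_OUTPUT_PATHS rule for outputToFile and writeToFile can be implemented with Node's path module, as in this hypothetical sketch (the function names are illustrative, not the server's). Comparing with path.relative rather than a plain string prefix avoids accepting sibling directories such as /tmp/mcp-gemini-outputs-evil. It does not resolve symbolic links; a production check would likely also use fs.realpath.

```typescript
import * as path from "node:path";

// Parse the comma-separated env value into absolute directories; unset means "no file output".
function parseAllowedOutputPaths(value: string | undefined): string[] {
  if (!value) return [];
  return value
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .map((p) => path.resolve(p));
}

// True if filePath resolves to a location strictly inside one of the allowed directories.
function isPathAllowed(filePath: string, allowedDirs: string[]): boolean {
  const target = path.resolve(filePath); // normalizes ".." segments
  return allowedDirs.some((dir) => {
    const rel = path.relative(dir, target);
    return (
      rel !== "" && // the directory itself is not a file target
      rel !== ".." &&
      !rel.startsWith(`..${path.sep}`) && // climbs out of dir
      !path.isAbsolute(rel) // different drive on Windows
    );
  });
}

const allowedDirs = parseAllowedOutputPaths("/var/opt/mcp-gemini-server/outputs, /tmp/mcp-gemini-outputs");
console.log(isPathAllowed("/tmp/mcp-gemini-outputs/report.txt", allowedDirs)); // true
console.log(isPathAllowed("/tmp/mcp-gemini-outputs/../secret.txt", allowedDirs)); // false
```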
Usage Examples
Here are examples of how an MCP client (like Claude) might call these tools using the use_mcp_tool format:
Example 1: Simple Content Generation (Using Default Model)
Example 2: Content Generation (Specifying Model & Config)
Example 2b: Content Generation with Thinking Budget Control
Example 2c: Content Generation with Simplified Reasoning Effort
Example 3: Starting and Continuing a Chat
Start Chat:
(Assume response contains
Send Message:
Example 4: Content Generation with System Instructions (Simplified Format)
Example 5: Content Generation with System Instructions (Object Format)
Example 6: Using Cached Content with System Instruction
Example 6: Generating an Image
Example 6b: Generating a High-Quality Image with Imagen 3.1
Example 6c: Using Advanced Style Options
Example 7: Message Routing Between Models
The response will be a JSON string containing both the text response and which model was chosen:
Example 8: Using URL Context with Content Generation
Example 9: Advanced URL Analysis
Example 10: Multi-URL Content Comparison
Example 11: URL Content with Security Restrictions
URL-Based Image Analysis Examples
These examples demonstrate how to analyze images from public URLs using Gemini's native image understanding capabilities. The server processes images by fetching them from URLs and converting them to the format required by the Gemini API. Note that this server does not support direct file uploads - all image analysis must be performed using publicly accessible image URLs.
Example 17: Basic Image Description and Analysis
Example 18: Object Detection and Identification
Example 19: Chart and Data Visualization Analysis
Example 20: Comparative Image Analysis
Example 21: Text Extraction from Images (OCR)
Example 22: Technical Diagram or Flowchart Analysis
Example 23: Image Analysis with Specific Questions
Example 24: Image Analysis with Security Restrictions
Important Notes for URL-Based Image Analysis:
Supported formats: PNG, JPEG, WebP, HEIC, HEIF (as per Gemini API specifications)
Image access: Images must be accessible via public URLs without authentication
Size considerations: Large images are automatically processed in sections by Gemini
Processing: The server fetches images from URLs and converts them to the format required by Gemini API
Security: The server applies restrictions to prevent access to private networks or malicious domains
Performance: Image analysis may take longer for high-resolution images due to processing complexity
Token usage: Image dimensions affect token consumption - larger images use more tokens
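A client can screen an image URL against the supported formats before sending it for analysis by looking at its Content-Type. The helper below is a hypothetical sketch that relies only on the MIME types for the formats listed above.

```typescript
// MIME types for the formats listed above (PNG, JPEG, WebP, HEIC, HEIF).
const SUPPORTED_IMAGE_MIME_TYPES = new Set([
  "image/png",
  "image/jpeg",
  "image/webp",
  "image/heic",
  "image/heif",
]);

// Hypothetical helper: accepts a raw Content-Type header value (or null if absent).
function isSupportedImageType(contentType: string | null): boolean {
  if (!contentType) return false;
  // Drop parameters such as "; charset=binary" and normalize case.
  const mime = contentType.split(";")[0].trim().toLowerCase();
  return SUPPORTED_IMAGE_MIME_TYPES.has(mime);
}

console.log(isSupportedImageType("image/webp")); // true
console.log(isSupportedImageType("image/gif")); // false
```

For example, with the built-in fetch in Node 18 you could request the URL with method: "HEAD" and pass response.headers.get("content-type") to isSupportedImageType; some servers answer HEAD requests inaccurately, so treat the result as a hint.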
YouTube Video Analysis Examples
These examples demonstrate how to analyze YouTube videos using Gemini's video understanding capabilities. The server can process publicly accessible YouTube videos by providing their URLs. Note that only public YouTube videos are supported - private, unlisted, or region-restricted videos cannot be analyzed.
Example 25: Basic YouTube Video Analysis
Example 26: YouTube Video Content Extraction with Timestamps
Example 27: YouTube Video Analysis with Specific Questions
Example 28: Comparative Analysis of Multiple YouTube Videos
Example 29: YouTube Video Technical Analysis
Example 30: YouTube Video Sentiment and Style Analysis
Example 31: YouTube Video Educational Content Assessment
Example 32: YouTube Video with Domain Security Restrictions
Important Notes for YouTube Video Analysis:
Public videos only: Only publicly accessible YouTube videos can be analyzed
URL format: Use standard YouTube URLs (youtube.com/watch?v=VIDEO_ID or youtu.be/VIDEO_ID)
Processing time: Video analysis typically takes longer than text or image analysis
Content limitations: Very long videos may have content truncated or processed in segments
Metadata: Video metadata (title, description, duration) is included when includeMetadata: true
Language support: Gemini can analyze videos in multiple languages
Content restrictions: The server applies the same security restrictions as other URL content
Token usage: Video analysis can consume significant tokens depending on video length and complexity
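To check that a link uses one of the accepted URL forms before submitting it, a client could extract the video ID as in this hypothetical sketch. It accepts only the two documented forms and assumes YouTube's current 11-character ID format; other variants (such as mobile or embed URLs) are deliberately rejected.

```typescript
// Hypothetical helper: returns the video ID for youtube.com/watch?v=ID or youtu.be/ID, else null.
function extractYouTubeVideoId(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not a URL at all
  }
  const host = url.hostname.replace(/^www\./, "");
  let id: string | null = null;
  if (host === "youtube.com" && url.pathname === "/watch") {
    id = url.searchParams.get("v");
  } else if (host === "youtu.be") {
    id = url.pathname.slice(1) || null;
  }
  // Assumption: IDs are 11 characters drawn from [A-Za-z0-9_-].
  return id !== null && /^[A-Za-z0-9_-]{11}$/.test(id) ? id : null;
}

console.log(extractYouTubeVideoId("https://www.youtube.com/watch?v=VIDEO_ID_01")); // VIDEO_ID_01
console.log(extractYouTubeVideoId("https://youtu.be/VIDEO_ID_01")); // VIDEO_ID_01
```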
Supported Multimedia Analysis Use Cases
The MCP Gemini Server supports comprehensive multimedia analysis through URL-based processing, leveraging Google Gemini's advanced vision and video understanding capabilities. Below are the key use cases organized by content type:
Image Analysis Use Cases
Content Understanding:
Product Analysis: Analyze product images for features, design elements, and quality assessment
Document OCR: Extract and transcribe text from images of documents, receipts, and forms
Chart & Graph Analysis: Interpret data visualizations, extract key insights, and explain trends
Technical Diagrams: Understand architectural diagrams, flowcharts, and technical schematics
Medical Images: Analyze medical charts, X-rays, and diagnostic images (for educational purposes)
Art & Design: Analyze artistic compositions, color schemes, and design principles
Comparative Analysis:
Before/After Comparisons: Compare multiple images to identify changes and differences
Product Comparisons: Analyze multiple product images for feature comparison
A/B Testing: Evaluate design variations and visual differences
Security & Quality:
Content Moderation: Identify inappropriate or harmful visual content
Quality Assessment: Evaluate image quality, resolution, and technical aspects
Brand Compliance: Check images for brand guideline adherence
Video Analysis Use Cases
Educational Content:
Lecture Analysis: Extract key concepts, create summaries, and identify important timestamps
Tutorial Understanding: Break down step-by-step instructions and highlight key procedures
Training Materials: Analyze corporate training videos and extract learning objectives
Academic Research: Process research presentations and extract methodologies
Content Creation:
Video Summarization: Generate concise summaries of long-form video content
Transcript Generation: Create detailed transcripts with speaker identification
Content Categorization: Classify videos by topic, genre, or content type
Sentiment Analysis: Assess emotional tone and audience engagement indicators
Technical Analysis:
Software Demonstrations: Extract software features and usage instructions
Product Reviews: Analyze product demonstration videos and extract key insights
Troubleshooting Guides: Parse technical support videos for problem-solving steps
Code Reviews: Analyze programming tutorial videos and extract code examples
Business Intelligence:
Market Research: Analyze promotional videos and marketing content
Competitive Analysis: Study competitor video content and strategies
Customer Feedback: Process video testimonials and feedback sessions
Event Coverage: Analyze conference presentations and keynote speeches
Integration Capabilities
Multi-Modal Analysis:
Combine text prompts with image/video URLs for contextual analysis
Process multiple media types in single requests for comprehensive insights
Cross-reference visual content with textual instructions
Workflow Integration:
Chain multiple analysis operations for complex workflows
Export results to files for further processing
Integrate with external MCP servers for extended functionality
Security & Performance:
URL validation and security screening for safe content processing
Caching support for frequently analyzed content
Batch processing capabilities for multiple media items
Example 12: Connecting to an External MCP Server (SSE)
(Assume response contains a unique connection ID like:
Example 13: Calling a Tool on an External MCP Server and Writing Output to File
Important: The
Note: The outputToFile path must be within one of the directories specified in the ALLOWED_OUTPUT_PATHS environment variable. For example, if ALLOWED_OUTPUT_PATHS="/path/to/allowed/output,/another/allowed/path", then the file path must resolve to a location inside one of these directories (including their subdirectories).
Example 14: Writing Content Directly to a File
Note: As with mcpCallServerTool, the filePath must be within one of the directories specified in the ALLOWED_OUTPUT_PATHS environment variable. This is a critical security feature to prevent unauthorized file writes.
mcp-gemini-server and Gemini SDK's MCP Function Calling
The official Google Gemini API documentation includes examples (such as for function calling with MCP structure) that demonstrate how you can use the client-side Gemini SDK (e.g., in Python or Node.js) to interact with the Gemini API. In such scenarios, particularly for function calling, the client SDK itself can be used to structure requests and handle responses in a manner that aligns with MCP principles.
The mcp-gemini-server project offers a complementary approach by providing a fully implemented, standalone MCP server. Instead of your client application directly using the Gemini SDK to format MCP-style messages for the Gemini API, your client application (which could be another LLM like Claude, a custom script, or any MCP-compatible system) would:
Connect to an instance of this mcp-gemini-server.
Call the pre-defined MCP tools exposed by this server, such as gemini_functionCall, gemini_generateContent, etc.
This mcp-gemini-server then internally handles all the necessary interactions with the Google Gemini API, including structuring the requests, managing API keys, and processing responses, abstracting these details away from your MCP client.
Benefits of using mcp-gemini-server:
Abstraction & Simplicity: Client applications don't need to integrate the Gemini SDK directly or manage the specifics of its API for MCP-style interactions. They simply make standard MCP tool calls.
Centralized Configuration: API keys, default model choices, safety settings, and other configurations are managed centrally within the mcp-gemini-server.
Rich Toolset: Provides a broad set of pre-defined MCP tools for various Gemini features (text generation, chat, file handling, image generation, etc.), not just function calling.
Interoperability: Enables any MCP-compatible client to leverage Gemini's capabilities without needing native Gemini SDK support.
When to Choose Which Approach:
Direct SDK Usage (as in Google's MCP examples):
Suitable if you are building a client application (e.g., in Python or Node.js) and want fine-grained control over the Gemini API interaction directly within that client.
Useful if you prefer to manage the Gemini SDK dependencies and logic within your client application and are primarily focused on function calling structured in an MCP-like way.
Using mcp-gemini-server:
Ideal if you want to expose Gemini capabilities to an existing MCP-compatible ecosystem (e.g., another LLM, a workflow automation system).
Beneficial if you want to rapidly prototype or deploy Gemini features as tools without extensive client-side SDK integration.
Preferable if you need a wider range of Gemini features exposed as consistent MCP tools and want to centralize the Gemini API interaction point.
A Note on This Server's Own MCP Client Tools:
The mcp-gemini-server also includes tools like mcpConnectToServer, mcpListServerTools, and mcpCallServerTool. These tools allow this server to act as an MCP client to other external MCP servers. This is a distinct capability from how an MCP client would connect to mcp-gemini-server to utilize Gemini features.
Environment Variables
Required:
GOOGLE_GEMINI_API_KEY: Your Google Gemini API key (required)
Required for Production (unless NODE_ENV=test):
MCP_SERVER_HOST: Server host address (e.g., "localhost")
MCP_SERVER_PORT: Port for network transports (e.g., "8080")
MCP_CONNECTION_TOKEN: A strong, unique shared secret token that clients must provide when connecting to this server. This is NOT provided by Google or any external service; you must generate it yourself using a cryptographically secure method. See the installation instructions (step 4) for generation methods. This token must be identical on the server and on all connecting clients.
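The only requirement stated above is that MCP_CONNECTION_TOKEN come from a cryptographically secure source. As one possible approach (a sketch, not this server's code), Node's built-in crypto module can generate such a token, and a server could compare a presented token in constant time. generateConnectionToken and tokensMatch are hypothetical names introduced for illustration.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical helper: 32 random bytes rendered as 64 hex characters.
function generateConnectionToken(byteLength = 32): string {
  return randomBytes(byteLength).toString("hex");
}

// Hypothetical helper: constant-time comparison of a client-supplied token
// with the configured one. Both sides are hashed first so the buffers passed
// to timingSafeEqual always have the same length (it throws otherwise).
function tokensMatch(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented, "utf8").digest();
  const b = createHash("sha256").update(expected, "utf8").digest();
  return timingSafeEqual(a, b);
}
```

Running the generator once and copying its output into the server's and every client's configuration satisfies the "identical on both sides" requirement. Hashing before comparing is a design choice: it avoids both the length-mismatch exception and an early exit that would reveal the expected token's length.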
Optional - Gemini API Configuration:
GOOGLE_GEMINI_MODEL: Default model to use (e.g., gemini-1.5-pro-latest, gemini-1.5-flash)
GOOGLE_GEMINI_DEFAULT_THINKING_BUDGET: Default thinking budget in tokens (0-24576) for controlling model reasoning
Optional - URL Context Configuration:
GOOGLE_GEMINI_ENABLE_URL_CONTEXT: Enable URL context features (options: true, false; default: false)
GOOGLE_GEMINI_URL_MAX_COUNT: Maximum URLs per request (default: 20)
GOOGLE_GEMINI_URL_MAX_CONTENT_KB: Maximum content size per URL in KB (default: 100)
GOOGLE_GEMINI_URL_FETCH_TIMEOUT_MS: Fetch timeout per URL in milliseconds (default: 10000)
GOOGLE_GEMINI_URL_ALLOWED_DOMAINS: Comma-separated list or JSON array of allowed domains (default: * for all domains)
GOOGLE_GEMINI_URL_BLOCKLIST: Comma-separated list or JSON array of blocked domains (default: empty)
GOOGLE_GEMINI_URL_CONVERT_TO_MARKDOWN: Convert HTML content to markdown (options: true, false; default: true)
GOOGLE_GEMINI_URL_INCLUDE_METADATA: Include URL metadata in context (options: true, false; default: true)
GOOGLE_GEMINI_URL_ENABLE_CACHING: Enable URL content caching (options: true, false; default: true)
GOOGLE_GEMINI_URL_USER_AGENT: Custom User-Agent header for URL requests (default: MCP-Gemini-Server/1.0)
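GOOGLE_GEMINI_URL_ALLOWED_DOMAINS and GOOGLE_GEMINI_URL_BLOCKLIST accept either a comma-separated list or a JSON array. The sketch below shows one way to parse both forms and apply the documented defaults to the other URL context variables. The function names and the UrlContextConfig shape are hypothetical; this is not the server's own configuration loader.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface UrlContextConfig {
  enabled: boolean;
  maxUrls: number;
  maxContentKb: number;
  fetchTimeoutMs: number;
  allowedDomains: string[];
  blocklist: string[];
  convertToMarkdown: boolean;
  includeMetadata: boolean;
  enableCaching: boolean;
  userAgent: string;
}

type Env = Record<string, string | undefined>;

// Accepts "a.com,b.com" or '["a.com","b.com"]'; unset or empty yields fallback.
function parseDomainList(raw: string | undefined, fallback: string[]): string[] {
  const value = raw?.trim();
  if (!value) return fallback;
  if (value.startsWith("[")) {
    const parsed: unknown = JSON.parse(value);
    if (!Array.isArray(parsed) || !parsed.every((d) => typeof d === "string")) {
      throw new Error("Domain list JSON must be an array of strings");
    }
    return parsed.map((d: string) => d.trim()).filter((d) => d.length > 0);
  }
  return value.split(",").map((d) => d.trim()).filter((d) => d.length > 0);
}

function parseBool(raw: string | undefined, fallback: boolean): boolean {
  if (raw === undefined || raw.trim() === "") return fallback;
  return raw.trim().toLowerCase() === "true";
}

function parseIntOr(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(n) ? n : fallback;
}

// Defaults mirror the values documented in the list above.
function loadUrlContextConfig(env: Env): UrlContextConfig {
  return {
    enabled: parseBool(env.GOOGLE_GEMINI_ENABLE_URL_CONTEXT, false),
    maxUrls: parseIntOr(env.GOOGLE_GEMINI_URL_MAX_COUNT, 20),
    maxContentKb: parseIntOr(env.GOOGLE_GEMINI_URL_MAX_CONTENT_KB, 100),
    fetchTimeoutMs: parseIntOr(env.GOOGLE_GEMINI_URL_FETCH_TIMEOUT_MS, 10000),
    allowedDomains: parseDomainList(env.GOOGLE_GEMINI_URL_ALLOWED_DOMAINS, ["*"]),
    blocklist: parseDomainList(env.GOOGLE_GEMINI_URL_BLOCKLIST, []),
    convertToMarkdown: parseBool(env.GOOGLE_GEMINI_URL_CONVERT_TO_MARKDOWN, true),
    includeMetadata: parseBool(env.GOOGLE_GEMINI_URL_INCLUDE_METADATA, true),
    enableCaching: parseBool(env.GOOGLE_GEMINI_URL_ENABLE_CACHING, true),
    userAgent: env.GOOGLE_GEMINI_URL_USER_AGENT ?? "MCP-Gemini-Server/1.0",
  };
}
```

In a Node.js process this would be called as loadUrlContextConfig(process.env).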
Optional - Security Configuration:
ALLOWED_OUTPUT_PATHS: A comma-separated list of absolute paths to directories where tools like mcpCallServerTool (with the outputToFile parameter) and writeToFileTool are allowed to write files. Critical security feature to prevent unauthorized file writes. If not set, file output will be disabled for these tools.
Optional - Server Configuration:
MCP_CLIENT_ID: Default client ID used when this server acts as a client to other MCP servers (defaults to "gemini-sdk-client")
MCP_TRANSPORT: Transport to use for the MCP server (options: stdio, sse, streamable, http; default: stdio)
IMPORTANT: SSE (Server-Sent Events) is NOT deprecated and remains a critical component of the MCP protocol
SSE is particularly valuable for bidirectional communication, enabling features like dynamic tool updates and sampling
Each transport type has specific valid use cases within the MCP ecosystem
MCP_LOG_LEVEL: Log level for MCP operations (options: debug, info, warn, error; default: info)
MCP_ENABLE_STREAMING: Enable SSE streaming for HTTP transport (options: true, false; default: false)
MCP_SESSION_TIMEOUT: Session timeout in seconds for HTTP transport (default: 3600, i.e. 1 hour)
SESSION_STORE_TYPE: Session storage backend (memory or sqlite; default: memory)
SQLITE_DB_PATH: Path to the SQLite database file when using the sqlite store (default: ./data/sessions.db)
Optional - GitHub Integration:
GITHUB_API_TOKEN: Personal Access Token for GitHub API access (required for GitHub code review features). For public repos, token needs 'public_repo' and 'read:user' scopes. For private repos, token needs 'repo' scope.
Optional - Legacy Server Configuration (Deprecated):
MCP_TRANSPORT_TYPE: Deprecated - use MCP_TRANSPORT instead
MCP_WS_PORT: Deprecated - use MCP_SERVER_PORT instead
ENABLE_HEALTH_CHECK: Enable health check server (options: true, false; default: true)
HEALTH_CHECK_PORT: Port for health check HTTP server (default: 3000)
You can create a .env file in the root directory with these variables:
Security Considerations
This server implements several security measures to protect against common vulnerabilities. Understanding these security features is critical when deploying in production environments.
File System Security
Path Validation and Isolation
ALLOWED_OUTPUT_PATHS: Critical security feature that restricts where file writing tools can write files
Security Principle: Files can only be created, read, or modified within explicitly allowed directories
Production Requirement: Always use absolute paths to prevent potential directory traversal attacks
Path Traversal Protection
The FileSecurityService implements robust path traversal protection by:
Fully resolving paths to their absolute form
Normalizing paths to handle ".." and "." segments properly
Validating that normalized paths stay within allowed directories
Checking both string-based prefixes and relative path calculations for redundant security
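The resolve, normalize, and compare steps above can be illustrated with Node's path module. This is a minimal sketch assuming the allowed directories are absolute paths (as ALLOWED_OUTPUT_PATHS requires); isPathWithinAllowedDirs is a hypothetical name, and the sketch does not reproduce FileSecurityService itself or its symlink handling.

```typescript
import * as path from "node:path";

// Hypothetical helper: true if `target` resolves to a location inside one of
// `allowedDirs` (or to one of those directories itself).
function isPathWithinAllowedDirs(target: string, allowedDirs: string[]): boolean {
  // path.resolve makes the path absolute and normalizes "." and ".." segments.
  const resolvedTarget = path.resolve(target);
  return allowedDirs.some((dir) => {
    const resolvedDir = path.resolve(dir);
    // Check 1: relative-path calculation. An escaping path is ".." or starts
    // with "../"; on Windows a different drive yields an absolute result.
    const rel = path.relative(resolvedDir, resolvedTarget);
    const insideByRelative =
      rel === "" || (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel));
    // Check 2: string prefix including a trailing separator, so that
    // "/srv/output-evil" is not mistaken for a child of "/srv/output".
    const prefix = resolvedDir.endsWith(path.sep) ? resolvedDir : resolvedDir + path.sep;
    const insideByPrefix = resolvedTarget === resolvedDir || resolvedTarget.startsWith(prefix);
    // Require both checks to agree (redundant by design).
    return insideByRelative && insideByPrefix;
  });
}
```

Requiring both checks to pass mirrors the "redundant security" point above: either one alone would catch the traversal cases shown, but agreement between them guards against edge cases in one method.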
Symlink Security
Symbolic links are fully resolved and checked against allowed directories
Both the symlink itself and its target are validated
Parent directory symlinks are iteratively checked to prevent circumvention
Multi-level symlink chains are fully resolved before validation
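For the symlink checks, one common technique is to resolve the nearest existing ancestor with fs.realpathSync, which follows every link in a chain (including symlinked parent directories), then re-append the not-yet-created remainder and run the containment test on the result. The sketch below assumes that approach; resolveRealPathForWrite and isRealPathAllowed are hypothetical names, and this is not the FileSecurityService implementation. Like any check-then-write scheme, it cannot by itself rule out a link being swapped between the check and the write.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// lstat (not stat) so that a dangling symlink still counts as an existing entry.
function entryExists(p: string): boolean {
  try {
    fs.lstatSync(p);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper: resolves all symlinks in `target`, even if the final
// file does not exist yet, by walking up to the nearest existing ancestor.
function resolveRealPathForWrite(target: string): string {
  let current = path.resolve(target);
  const pending: string[] = [];
  while (!entryExists(current)) {
    const parent = path.dirname(current);
    if (parent === current) break; // reached the filesystem root
    pending.unshift(path.basename(current));
    current = parent;
  }
  // realpathSync follows multi-level chains; it throws ENOENT for a dangling link.
  return path.join(fs.realpathSync(current), ...pending);
}

function isRealPathAllowed(target: string, allowedDirs: string[]): boolean {
  let realTarget: string;
  try {
    realTarget = resolveRealPathForWrite(target);
  } catch {
    return false; // unresolved links (e.g. dangling symlinks) are rejected
  }
  return allowedDirs.some((dir) => {
    let realDir: string;
    try {
      realDir = fs.realpathSync(path.resolve(dir)); // compare resolved forms
    } catch {
      return false;
    }
    const rel = path.relative(realDir, realTarget);
    return rel === "" || (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel));
  });
}
```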
Authentication & Authorization
Connection Tokens
MCP_CONNECTION_TOKEN provides basic authentication for clients connecting to this server
Treat it as a secret and use a strong, unique value in production
API Key Security
GOOGLE_GEMINI_API_KEY grants access to Google Gemini API services
Must be kept secure and never exposed in client-side code or logs
Use environment variables or secure secret management systems to inject this value
URL Context Security
Multi-Layer URL Validation
Protocol Validation: Only HTTP/HTTPS protocols are allowed
Private Network Protection: Blocks access to localhost, private IP ranges, and internal domains
Domain Control: Configurable allowlist/blocklist with wildcard support
Suspicious Pattern Detection: Identifies potential path traversal, dangerous characters, and malicious patterns
IDN Homograph Attack Prevention: Detects potentially confusing Unicode domain names
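A simplified version of the protocol, private-network, and allowlist/blocklist checks can be written with the WHATWG URL class that Node.js exposes globally. This is an illustration under stated assumptions: validateUrl, matchesDomain, and isPrivateIPv4 are hypothetical names; the wildcard rule ("*.example.com" matches example.com and its subdomains) and the ".local"/".internal" suffixes are assumed, not taken from the server. It inspects only literal hostnames, IPv4 addresses, and [::1]; it does not resolve DNS, cover other IPv6 ranges, or detect IDN homographs, all of which a production check would also need.

```typescript
// Hypothetical wildcard rule: "*" matches any host; "*.example.com" matches
// example.com and its subdomains; anything else must match exactly.
function matchesDomain(hostname: string, pattern: string): boolean {
  const host = hostname.toLowerCase();
  const p = pattern.trim().toLowerCase();
  if (p === "*") return true;
  if (p.startsWith("*.")) {
    const base = p.slice(2);
    return host === base || host.endsWith("." + base);
  }
  return host === p;
}

function isPrivateIPv4(hostname: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(hostname);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 || // 10.0.0.0/8
    a === 127 || // loopback
    (a === 169 && b === 254) || // link-local
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) // 192.168.0.0/16
  );
}

type UrlCheck = { ok: true } | { ok: false; reason: string };

function validateUrl(raw: string, allowed: string[], blocked: string[]): UrlCheck {
  let url: URL;
  try {
    url = new URL(raw); // also normalizes forms like "127.1" to "127.0.0.1"
  } catch {
    return { ok: false, reason: "malformed URL" };
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return { ok: false, reason: `protocol ${url.protocol} not allowed` };
  }
  const host = url.hostname.toLowerCase();
  const internal =
    host === "localhost" || host.endsWith(".localhost") || host.endsWith(".local") ||
    host.endsWith(".internal") || host === "[::1]" || isPrivateIPv4(host);
  if (internal) return { ok: false, reason: "private or internal host" };
  if (blocked.some((p) => matchesDomain(host, p))) return { ok: false, reason: "blocklisted domain" };
  if (!allowed.some((p) => matchesDomain(host, p))) return { ok: false, reason: "domain not in allowlist" };
  return { ok: true };
}
```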
Rate Limiting and Resource Protection
Per-domain rate limiting: Default 10 requests per minute per domain
Content size limits: Configurable maximum content size per URL (default 100KB)
Request timeout controls: Prevents hanging requests (default 10 seconds)
Concurrent request limits: Controlled batch processing to prevent overload
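Per-domain rate limiting at the documented default of 10 requests per minute can be sketched as a sliding-window counter keyed by domain. DomainRateLimiter is a hypothetical class, not the server's limiter; the injectable clock exists only to make the behavior easy to test.

```typescript
// Hypothetical sliding-window limiter: at most `limit` requests per domain
// within any `windowMs` interval. Defaults match the documented 10/minute.
class DomainRateLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit = 10, windowMs = 60_000, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Records the request and returns true if it fits within the limit;
  // returns false (without recording) if the domain is over its limit.
  tryAcquire(domain: string): boolean {
    const key = domain.toLowerCase();
    const t = this.now();
    // Drop timestamps that have left the window.
    const recent = (this.hits.get(key) ?? []).filter((ts) => t - ts < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(t);
    this.hits.set(key, recent);
    return true;
  }
}
```

A sliding window is used here rather than a fixed per-minute bucket because it never allows a burst of up to twice the limit across a bucket boundary.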
Content Security
Content type validation: Only processes text-based content types
HTML sanitization: Removes script tags, style blocks, and dangerous content
Metadata extraction: Safely parses HTML metadata without executing code
Memory protection: Content truncation prevents memory exhaustion attacks
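Two of the content checks above, accepting only text-based content types and truncating to a byte budget, can be sketched as follows. The content-type allowlist is an assumed example rather than the server's list, isTextContentType and truncateToKb are hypothetical names, and HTML sanitization is out of scope for this sketch.

```typescript
// Assumed allowlist of text-based MIME types (illustrative, not the server's).
const TEXT_CONTENT_TYPES = [
  "text/html",
  "text/plain",
  "text/markdown",
  "text/xml",
  "application/xhtml+xml",
  "application/xml",
  "application/json",
];

// Compares only the MIME type, ignoring parameters such as "; charset=utf-8".
function isTextContentType(header: string | null): boolean {
  if (!header) return false;
  const mime = header.split(";")[0].trim().toLowerCase();
  return TEXT_CONTENT_TYPES.includes(mime);
}

// Truncates to at most maxKb * 1024 UTF-8 bytes without splitting a
// multi-byte character, so memory use per URL stays bounded.
function truncateToKb(content: string, maxKb: number): { text: string; truncated: boolean } {
  const maxBytes = maxKb * 1024;
  const bytes = Buffer.from(content, "utf8");
  if (bytes.length <= maxBytes) return { text: content, truncated: false };
  let end = maxBytes;
  // If the first excluded byte is a continuation byte (10xxxxxx), the cut falls
  // inside a character; step back to that character's lead byte and drop it.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;
  return { text: bytes.subarray(0, end).toString("utf8"), truncated: true };
}
```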
Network Security
Transport Options
stdio: Provides process isolation when used as a spawned child process
SSE/HTTP: Ensure proper network-level protection when exposing over networks
Port Configuration
Configure firewall rules appropriately when exposing server ports
Consider reverse proxies with TLS termination for production deployments
Production Deployment Recommendations
File Paths
Always use absolute paths for ALLOWED_OUTPUT_PATHS
Use paths outside the application directory to prevent source code modification
Restrict to specific, limited-purpose directories with appropriate permissions
NEVER include sensitive system directories like "/", "/etc", "/usr", "/bin", or "/home"
Process Isolation
Run the server with restricted user permissions
Consider containerization (Docker) for additional isolation
Secrets Management
Use a secure secrets management solution instead of .env files in production
Rotate API keys and connection tokens regularly
URL Context Security
Enable URL context only when needed: Set GOOGLE_GEMINI_ENABLE_URL_CONTEXT=false if not required
Use restrictive domain allowlists: Avoid GOOGLE_GEMINI_URL_ALLOWED_DOMAINS=* in production
Configure comprehensive blocklists: Add known malicious domains to GOOGLE_GEMINI_URL_BLOCKLIST
Set conservative resource limits: Use appropriate values for GOOGLE_GEMINI_URL_MAX_CONTENT_KB and GOOGLE_GEMINI_URL_MAX_COUNT
Monitor URL access patterns: Review logs for suspicious URL access attempts
Consider network-level protection: Use firewalls or proxies to add additional URL filtering
Error Handling
The server provides enhanced error handling using the MCP standard McpError type when tool execution fails. This object contains:
code: An ErrorCode enum value indicating the type of error:
InvalidParams: Parameter validation errors (wrong type, missing required field, etc.)
InvalidRequest: General request errors, including safety blocks and resources that were not found
PermissionDenied: Authentication or authorization failures
ResourceExhausted: Rate limits, quotas, or resource capacity issues
FailedPrecondition: Operations whose required preconditions are not met
InternalError: Unexpected server or API errors
message: A human-readable description of the error with specific details.
details: (Optional) An object with more specific information from the Gemini SDK error.
Implementation Details
The server uses a multi-layered approach to error handling:
Validation Layer: Zod schemas validate all parameters at both the tool level (MCP request) and service layer (before API calls).
Error Classification: A detailed error mapping system categorizes errors from the Google GenAI SDK into specific error types:
GeminiValidationError: Parameter validation failures
GeminiAuthError: Authentication issues
GeminiQuotaError: Rate limiting and quota exhaustion
GeminiContentFilterError: Content safety filtering
GeminiNetworkError: Connection and timeout issues
GeminiModelError: Model-specific problems
Retry Mechanism: Automatic retry with exponential backoff for transient errors:
Network issues, timeouts, and rate limit errors are automatically retried
Configurable retry parameters (attempts, delay, backoff factor)
Jitter randomization to prevent synchronized retry attempts
Detailed logging of retry attempts for debugging
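The retry behavior described above (exponential backoff, jitter, configurable attempts, delay, and backoff factor, plus logging) can be sketched as a generic wrapper. RetryOptions, computeDelayMs, and withRetry are hypothetical names; the "full jitter" strategy (a random delay between zero and the capped exponential value) is one common choice, assumed here rather than taken from the server's RetryService.

```typescript
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first call
  initialDelayMs: number;
  backoffFactor: number;
  maxDelayMs: number;
  isRetryable: (err: unknown) => boolean; // e.g. network, timeout, rate-limit errors
  onRetry?: (attempt: number, delayMs: number, err: unknown) => void; // logging hook
}

// Exponential backoff with full jitter: uniform in [0, min(max, initial * factor^(n-1))).
function computeDelayMs(attempt: number, opts: RetryOptions, random: () => number = Math.random): number {
  const base = Math.min(opts.maxDelayMs, opts.initialDelayMs * opts.backoffFactor ** (attempt - 1));
  return Math.floor(random() * base);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-transient errors or when attempts are exhausted.
      if (attempt >= opts.maxAttempts || !opts.isRetryable(err)) throw err;
      const delay = computeDelayMs(attempt, opts);
      opts.onRetry?.(attempt, delay, err);
      await sleep(delay);
    }
  }
}
```

Randomizing the whole delay, rather than adding a small offset, spreads out clients that failed at the same moment, which is what prevents synchronized retry attempts.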
Common Error Scenarios:
Authentication Failures: PermissionDenied - Invalid API key, expired credentials, or unauthorized access.
Parameter Validation: InvalidParams - Missing required fields, wrong data types, invalid values.
Safety Blocks: InvalidRequest - Content blocked by safety filters, with details indicating SAFETY as the block reason.
File/Cache Not Found: InvalidRequest - Resource not found, with details about the missing resource.
Rate Limits: ResourceExhausted - API quota exceeded or rate limits hit, with details about the limits.
File API Unavailable: FailedPrecondition - When attempting File API operations without a valid Google AI Studio key.
Path Traversal Security: InvalidParams - Attempts to access audio files outside the allowed directory, with details about the security validation failure.
Image/Audio Processing Errors:
InvalidParams - For format issues, size limitations, or invalid inputs
InternalError - For processing failures during analysis
ResourceExhausted - For resource-intensive operations exceeding limits
The server includes additional context in error messages to help with troubleshooting, including session IDs for chat-related errors and specific validation details for parameter errors.
Check the message and details fields of the returned McpError for specific troubleshooting information.
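The error classification and the scenarios listed in this section imply a mapping from the internal Gemini error classes to McpError codes. The sketch below is hypothetical: the class names come from this README, but the class bodies and mapErrorCode are illustrative, string literals stand in for the ErrorCode enum so the example runs without dependencies, and mapping GeminiNetworkError, GeminiModelError, and unknown errors to InternalError is an assumption.

```typescript
type McpErrorCode =
  | "InvalidParams"
  | "InvalidRequest"
  | "PermissionDenied"
  | "ResourceExhausted"
  | "FailedPrecondition"
  | "InternalError";

// Hypothetical stand-ins for the error classes named in this README.
class GeminiApiError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name; // e.g. "GeminiQuotaError"
  }
}
class GeminiValidationError extends GeminiApiError {}
class GeminiAuthError extends GeminiApiError {}
class GeminiQuotaError extends GeminiApiError {}
class GeminiContentFilterError extends GeminiApiError {}
class GeminiNetworkError extends GeminiApiError {}
class GeminiModelError extends GeminiApiError {}

function mapErrorCode(err: unknown): McpErrorCode {
  if (err instanceof GeminiValidationError) return "InvalidParams";
  if (err instanceof GeminiAuthError) return "PermissionDenied";
  if (err instanceof GeminiQuotaError) return "ResourceExhausted";
  // Safety blocks surface as InvalidRequest, per the Common Error Scenarios list.
  if (err instanceof GeminiContentFilterError) return "InvalidRequest";
  // Assumption: network/model errors and anything unrecognized map to InternalError.
  return "InternalError";
}
```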
Development and Testing
This server includes a comprehensive test suite to ensure functionality and compatibility with the Gemini API. The tests are organized into unit tests (for individual components) and integration tests (for end-to-end functionality).
Test Structure
Unit Tests: Located in tests/unit/ - Test individual components in isolation with mocked dependencies
Integration Tests: Located in tests/integration/ - Test end-to-end functionality with real server interaction
Test Utilities: Located in tests/utils/ - Helper functions and fixtures for testing
Running Tests
Testing Approach
Service Mocking: The tests use a combination of direct method replacement and mock interfaces to simulate Gemini API responses. This is particularly important for the @google/genai SDK (v0.10.0), which has a complex object structure.
Environment Variables: Tests automatically check for required environment variables and skip tests that require API keys if they are not available. This allows core functionality to be tested without credentials.
Test Server: Integration tests use a test server fixture that creates an isolated HTTP server instance with the MCP handler configured for testing.
RetryService: The retry mechanism is extensively tested to ensure proper handling of transient errors with exponential backoff, jitter, and configurable retry parameters.
Image Generation: Tests specifically address the complex interactions with the Gemini API for image generation, supporting both Gemini models and the dedicated Imagen 3.1 model.
Test Environment Setup
For running tests that require API access, create a .env.test file in the project root with the following variables:
The test suite will automatically detect available environment variables and skip tests that require missing configuration.
Contributing
We welcome contributions to improve the MCP Gemini Server! This section provides guidelines for contributing to the project.
Development Environment Setup
Fork and Clone the Repository
git clone https://github.com/yourusername/mcp-gemini-server.git
cd mcp-gemini-server
Install Dependencies
npm install
Set Up Environment Variables
Create a .env file in the project root with the necessary variables as described in the Environment Variables section.
Build and Run
npm run build
npm run dev
Development Process
Create a Feature Branch
git checkout -b feature/your-feature-name
Make Your Changes
Implement your feature or fix, following the code style guidelines.
Write Tests
Add tests for your changes to ensure functionality and prevent regressions.
Run Tests and Linting
npm run test
npm run lint
npm run format
Commit Your Changes
Use clear, descriptive commit messages that explain the purpose of your changes.
Testing Guidelines
Write unit tests for all new functionality
Update existing tests when modifying functionality
Ensure all tests pass before submitting a pull request
Include both positive and negative test cases
Mock external dependencies to ensure tests can run without external services
Pull Request Process
Update Documentation
Update the README.md and other documentation to reflect your changes.
Submit a Pull Request
Provide a clear description of the changes
Link to any related issues
Explain how to test the changes
Ensure all CI checks pass
Code Review
Address any feedback from reviewers
Make requested changes and update the PR
Coding Standards
Follow the existing code style (PascalCase for classes/interfaces/types, camelCase for functions/variables)
Use strong typing with TypeScript interfaces
Document public APIs with JSDoc comments
Handle errors properly by extending base error classes
Follow the service-based architecture with dependency injection
Use Zod for schema validation
Format code according to the project's ESLint and Prettier configuration
Code Review Tools
The MCP Gemini Server provides powerful code review capabilities leveraging Gemini's models to analyze git diffs and GitHub repositories. These tools help identify potential issues, suggest improvements, and provide comprehensive feedback on code changes.
Local Git Diff Review
Review local git changes directly from your command line:
The CLI script supports various options:
--focus=FOCUS: Focus of the review (security, performance, architecture, bugs, general)
--model=MODEL: Model to use (defaults to gemini-flash-2.0 for cost efficiency)
--reasoning=LEVEL: Reasoning effort (none, low, medium, high)
--exclude=PATTERN: Files to exclude using glob patterns
GitHub Repository Review
Review GitHub repositories, branches, and pull requests using the following tools:
GitHub PR Review Tool: Analyzes pull requests for issues and improvements
GitHub Repository Review Tool: Analyzes entire repositories or branches
Cost Optimization
By default, code review tools use the more cost-efficient gemini-flash-2.0 model, which offers a good balance between cost and capability for most code review tasks. For particularly complex code bases or when higher reasoning depth is needed, you can specify more powerful models:
Running Tests
Tests for the GitHub code review functionality can also use the cheaper model:
Server Features
Health Check Endpoint
The server provides a built-in health check HTTP endpoint that can be used for monitoring and status checks. This is separate from the MCP server transport and runs as a lightweight HTTP server.
When enabled, you can access the health check at:
The health check endpoint returns a JSON response with the following information:
You can check the health endpoint using curl:
You can configure the health check using these environment variables:
ENABLE_HEALTH_CHECK: Set to "false" to disable the health check server (default: "true")
HEALTH_CHECK_PORT: Port number for the health check server (default: 3000)
Session Persistence
The server supports persistent session storage for HTTP/SSE transports, allowing sessions to survive server restarts and enabling horizontal scaling.
Storage Backends
In-Memory Store (Default)
Sessions stored in server memory
Fast performance for development
Sessions lost on server restart
No external dependencies
SQLite Store
Sessions persisted to local SQLite database
Survives server restarts
Automatic cleanup of expired sessions
Good for single-instance production deployments
Configuration
Enable SQLite session persistence:
The SQLite database file and directory will be created automatically on first use. The database includes:
Automatic indexing for performance
Built-in cleanup of expired sessions
ACID compliance for data integrity
Session Lifecycle
Sessions are created when clients connect via HTTP/SSE transport
Each session has a configurable timeout (default: 1 hour)
Session expiration is extended on each activity
Expired sessions are automatically cleaned up every minute
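The lifecycle above (configurable timeout, expiry extended on each activity, periodic cleanup) can be sketched as a small in-memory store. InMemorySessionStore and its methods are hypothetical and only mirror the documented behavior of the default memory backend; they are not the server's implementation. A SQLite-backed store would persist the same fields (id, data, expiry) instead of holding them in a Map.

```typescript
interface Session<T> {
  id: string;
  data: T;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical store: each touch() extends expiry by the timeout (sliding
// expiration); cleanup() removes expired sessions.
class InMemorySessionStore<T> {
  private readonly sessions = new Map<string, Session<T>>();
  private readonly timeoutMs: number;
  private readonly now: () => number;

  constructor(timeoutSeconds = 3600, now: () => number = Date.now) {
    this.timeoutMs = timeoutSeconds * 1000;
    this.now = now;
  }

  create(id: string, data: T): Session<T> {
    const session: Session<T> = { id, data, expiresAt: this.now() + this.timeoutMs };
    this.sessions.set(id, session);
    return session;
  }

  // Returns the session and extends its expiry, or undefined if missing/expired.
  touch(id: string): Session<T> | undefined {
    const session = this.sessions.get(id);
    if (!session) return undefined;
    if (session.expiresAt <= this.now()) {
      this.sessions.delete(id);
      return undefined;
    }
    session.expiresAt = this.now() + this.timeoutMs;
    return session;
  }

  // Removes expired sessions and returns how many were removed.
  cleanup(): number {
    const t = this.now();
    let removed = 0;
    for (const [id, session] of this.sessions) {
      if (session.expiresAt <= t) {
        this.sessions.delete(id);
        removed++;
      }
    }
    return removed;
  }

  // Runs cleanup periodically (every minute by default).
  startCleanup(intervalMs = 60_000): ReturnType<typeof setInterval> {
    const timer = setInterval(() => this.cleanup(), intervalMs);
    timer.unref?.(); // do not keep the process alive only for cleanup
    return timer;
  }

  get size(): number {
    return this.sessions.size;
  }
}
```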
Graceful Shutdown
The server implements graceful shutdown handling for SIGTERM and SIGINT signals. When the server receives a shutdown signal:
It attempts to properly disconnect the MCP server transport
It closes the health check server if running
It logs the shutdown status
It exits with the appropriate exit code (0 for successful shutdown, 1 if errors occurred)
This ensures clean termination when the server is run in containerized environments or when stopped manually.
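The shutdown sequence above (close the transport, close the health check server, log, exit with 0 or 1) can be sketched as a signal handler factory. createShutdownHandler and the step names in the usage comment are hypothetical; the exit and log functions are injectable only so the logic can be tested without terminating the process.

```typescript
type ShutdownStep = { name: string; run: () => Promise<void> };

// Hypothetical: runs each step in order, logging the outcome, then exits with
// 0 if every step succeeded or 1 if any step failed.
function createShutdownHandler(
  steps: ShutdownStep[],
  exit: (code: number) => void = (code) => process.exit(code),
  log: (msg: string) => void = console.error,
): (signal: string) => Promise<void> {
  let shuttingDown = false;
  return async (signal: string) => {
    if (shuttingDown) return; // ignore a second SIGINT/SIGTERM during shutdown
    shuttingDown = true;
    log(`Received ${signal}, shutting down`);
    let failed = false;
    for (const step of steps) {
      try {
        await step.run();
        log(`${step.name}: closed`);
      } catch (err) {
        failed = true;
        log(`${step.name}: failed (${String(err)})`);
      }
    }
    exit(failed ? 1 : 0);
  };
}

// Usage (hypothetical close functions):
// const onSignal = createShutdownHandler([
//   { name: "MCP transport", run: () => transport.close() },
//   { name: "health check server", run: () => closeHealthServer() },
// ]);
// process.on("SIGTERM", () => void onSignal("SIGTERM"));
// process.on("SIGINT", () => void onSignal("SIGINT"));
```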
Known Issues
Pagination Issues:
gemini_listCaches may not reliably return nextPageToken due to limitations in iterating the SDK's Pager object. A workaround is implemented but has limited reliability.
Path Requirements: Audio transcription operations require absolute paths when run from the server environment. Relative paths are not supported.
File Size Limitations: Audio files for transcription are limited to 20MB (original file size, before base64 encoding). The server reads the file and converts it to base64 internally. Larger files will be rejected with an error message.
API Compatibility: Caching API is not supported with Vertex AI credentials, only Google AI Studio API keys.
Model Support: This server is primarily tested and optimized for the latest Gemini 1.5 and 2.5 models. While other models should work, these models are the primary focus for testing and feature compatibility.
TypeScript Build Issues: The TypeScript build may report errors, primarily in test files. These are type compatibility issues that don't affect runtime functionality; the server itself functions properly despite them.
Resource Usage:
Image processing can be resource-intensive, especially for high-resolution images. Consider using smaller resolutions (e.g., 512x512) for faster responses.
Generating multiple images simultaneously increases resource usage proportionally.
Audio transcription is limited to files under 20MB (original file size). The server reads files from disk and handles base64 conversion internally. Processing may take significant time and resources depending on file size and audio complexity.
Content Handling:
Base64-encoded images are streamed in chunks to handle large file sizes efficiently.
Visual content understanding may perform differently across various types of visual content (charts vs. diagrams vs. documents).
Audio transcription accuracy depends on audio quality, number of speakers, and background noise.
URL Context Features:
URL context is disabled by default and must be explicitly enabled via GOOGLE_GEMINI_ENABLE_URL_CONTEXT=true
JavaScript-rendered content is not supported; only static HTML content is processed
Some websites may block automated access or require authentication that is not currently supported
Content extraction quality may vary depending on website structure and formatting
Rate limiting per domain (10 requests/minute by default) may affect bulk processing scenarios
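Returning to the audio limits listed above: the 20MB limit applies to the original file, and base64 inflates data by a factor of 4/3 (every 3 input bytes become 4 output characters). Interpreting "20MB" as 20 × 1024 × 1024 bytes (an assumption), a file at the limit becomes 27,962,028 base64 characters, about 26.7 MiB. The sketch below checks the absolute-path requirement and the size before reading the file into memory; readAudioAsBase64 and base64Length are hypothetical helpers, not the server's code.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumption: "20MB" means 20 MiB of original (pre-base64) data.
const MAX_AUDIO_BYTES = 20 * 1024 * 1024;

// Length of standard padded base64 output for n input bytes.
function base64Length(n: number): number {
  return 4 * Math.ceil(n / 3);
}

// Hypothetical helper: enforces the absolute-path and size limits from the
// Known Issues list before loading the file and encoding it.
function readAudioAsBase64(filePath: string): string {
  if (!path.isAbsolute(filePath)) {
    throw new Error(`Audio path must be absolute: ${filePath}`);
  }
  const { size } = fs.statSync(filePath); // check size without reading the file
  if (size > MAX_AUDIO_BYTES) {
    throw new Error(`Audio file is ${size} bytes; limit is ${MAX_AUDIO_BYTES} bytes before base64 encoding`);
  }
  return fs.readFileSync(filePath).toString("base64");
}
```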