Provides tools for image, audio, and video recognition using Google's Gemini AI models, allowing analysis and description of images, transcription of audio, and description of video content.
MCP Video Recognition Server
An MCP (Model Context Protocol) server that provides tools for image, audio, and video recognition using Google's Gemini AI.
Features
- Image Recognition: Analyze and describe images using Google Gemini AI
- Audio Recognition: Analyze and transcribe audio using Google Gemini AI
- Video Recognition: Analyze and describe videos using Google Gemini AI
Prerequisites
- Node.js 18 or higher
- Google Gemini API key
Installation
Manual Installation
- Clone the repository
- Install dependencies
- Build the project
Installing in FLUJO
- Click Add Server
- Copy and paste the GitHub URL into FLUJO
- Click Parse, Clone, Install, Build and Save.
Installing via Configuration Files
To integrate this MCP server with Cline or other MCP clients via configuration files:
- Open your Cline settings:
  - In VS Code, go to File -> Preferences -> Settings
  - Search for "Cline MCP Settings"
  - Click "Edit in settings.json"
- Add the server configuration to the mcpServers object.
- Replace /path/to/mcp-video-recognition/dist/index.js with the actual path to the index.js file in your project directory. On Windows, use forward slashes (/) or double backslashes (\\) in the path.
- Save the settings file. Cline should automatically connect to the server.
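The configuration block for this step is not reproduced here. The TypeScript below is a rough sketch of its likely shape. It builds a settings object in the command/args/env layout that MCP clients such as Cline commonly use, then prints it as JSON. Two details are assumptions, not facts from this README: the server key "video-recognition" and launching the entry point with node. Adjust both to your setup.

```typescript
// Sketch of an MCP client settings fragment for this server.
// Assumptions: the key "video-recognition" and the "node" command are illustrative.

interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface McpSettings {
  mcpServers: Record<string, McpServerEntry>;
}

// entryPath should point at dist/index.js inside your checkout.
function buildClineSettings(entryPath: string, apiKey: string): McpSettings {
  // Convert Windows backslashes to forward slashes, one of the two accepted forms.
  const normalizedPath = entryPath.replace(/\\/g, "/");
  return {
    mcpServers: {
      "video-recognition": {
        command: "node",
        args: [normalizedPath],
        env: {
          GOOGLE_API_KEY: apiKey,
          TRANSPORT_TYPE: "stdio",
        },
      },
    },
  };
}

const settings = buildClineSettings(
  "/path/to/mcp-video-recognition/dist/index.js",
  "your-api-key",
);
// Merge the printed JSON into your settings file.
console.log(JSON.stringify(settings, null, 2));
```

The backslash replacement mirrors the Windows advice above. If you skip it, JSON.stringify escapes each backslash as \\, which is the other accepted form.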
Configuration
The server is configured using environment variables:
- GOOGLE_API_KEY (required): Your Google Gemini API key
- TRANSPORT_TYPE: Transport type to use (stdio or sse; defaults to stdio)
- PORT: Port number for the SSE transport (defaults to 3000)
- LOG_LEVEL: Logging level (verbose, debug, info, warn, or error; defaults to info)
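The server could read and validate the variables above at startup roughly as follows. This is a hypothetical TypeScript sketch, not the server's actual code. loadConfig and ServerConfig are illustrative names, and the 1 to 65535 range check on PORT is an added assumption.

```typescript
// Hypothetical loader for the documented environment variables.

type TransportType = "stdio" | "sse";
type LogLevel = "verbose" | "debug" | "info" | "warn" | "error";

interface ServerConfig {
  googleApiKey: string;
  transportType: TransportType;
  port: number;
  logLevel: LogLevel;
}

const TRANSPORTS: readonly string[] = ["stdio", "sse"];
const LOG_LEVELS: readonly string[] = ["verbose", "debug", "info", "warn", "error"];

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  // GOOGLE_API_KEY is the only required variable.
  const googleApiKey = env.GOOGLE_API_KEY;
  if (!googleApiKey) {
    throw new Error("GOOGLE_API_KEY is required");
  }

  // TRANSPORT_TYPE defaults to stdio; only stdio and sse are accepted.
  const transportType = env.TRANSPORT_TYPE ?? "stdio";
  if (!TRANSPORTS.includes(transportType)) {
    throw new Error(`Unsupported TRANSPORT_TYPE: ${transportType}`);
  }

  // PORT only matters for SSE and defaults to 3000.
  const port = env.PORT === undefined ? 3000 : Number(env.PORT);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  // LOG_LEVEL defaults to info.
  const logLevel = env.LOG_LEVEL ?? "info";
  if (!LOG_LEVELS.includes(logLevel)) {
    throw new Error(`Invalid LOG_LEVEL: ${logLevel}`);
  }

  return {
    googleApiKey,
    transportType: transportType as TransportType,
    port,
    logLevel: logLevel as LogLevel,
  };
}
```

In a Node.js entry point you would call this as loadConfig(process.env). Failing fast on a missing GOOGLE_API_KEY surfaces the misconfiguration before any client connects.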
Usage
Starting the Server
With stdio Transport (Default)
With SSE Transport
Using the Tools
The server provides three tools that can be called by MCP clients:
Image Recognition
Audio Recognition
Video Recognition
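MCP clients invoke these tools with a JSON-RPC 2.0 tools/call request whose arguments are the parameters listed under Tool Parameters. The sketch below builds such a message. The tool names image_recognition, audio_recognition, and video_recognition are assumptions, so confirm the real names from the server's tools/list response.

```typescript
// Sketch of a JSON-RPC 2.0 tools/call message for this server.
// Assumption: the tool names below are illustrative, not confirmed by this README.

type ToolName = "image_recognition" | "audio_recognition" | "video_recognition";

interface RecognitionArgs {
  filepath: string;
  prompt?: string;
  modelname?: string;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: RecognitionArgs };
}

function buildToolCall(id: number, name: ToolName, args: RecognitionArgs): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "video_recognition", {
  filepath: "/absolute/path/to/clip.mp4",
  prompt: "Summarize what happens in this video",
});
// Over the stdio transport, each message is sent as a single line of JSON.
console.log(JSON.stringify(request));
```

Omitting modelname leaves the server free to apply its documented default.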
Tool Parameters
All tools accept the following parameters:
- filepath (required): Path to the media file to analyze
- prompt (optional): Custom prompt for the recognition (defaults to "Describe this content")
- modelname (optional): Gemini model to use for recognition (defaults to "gemini-2.0-flash")
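The defaults above amount to a simple fallback rule. Here is a minimal sketch using a hypothetical resolveArgs helper. The server's real validation may differ.

```typescript
// Hypothetical helper applying the documented parameter defaults.

interface RecognitionParams {
  filepath?: string;
  prompt?: string;
  modelname?: string;
}

interface ResolvedParams {
  filepath: string;
  prompt: string;
  modelname: string;
}

const DEFAULT_PROMPT = "Describe this content";
const DEFAULT_MODEL = "gemini-2.0-flash";

function resolveArgs(params: RecognitionParams): ResolvedParams {
  // filepath is the only required parameter.
  if (!params.filepath) {
    throw new Error("filepath is required");
  }
  return {
    filepath: params.filepath,
    // ?? falls back only when the value is undefined or null.
    prompt: params.prompt ?? DEFAULT_PROMPT,
    modelname: params.modelname ?? DEFAULT_MODEL,
  };
}

console.log(resolveArgs({ filepath: "/tmp/photo.jpg" }));
```

Because the sketch uses ??, only a missing prompt or modelname falls back to the default. An explicit empty string is passed through unchanged.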
Development
Running in Development Mode
Project Structure
- src/index.ts: Entry point
- src/server.ts: MCP server implementation
- src/tools/: Tool implementations
- src/services/: Service implementations (Gemini API)
- src/types/: Type definitions
- src/utils/: Utility functions
License
MIT
remote-capable server
The server can be hosted and run remotely because it primarily relies on remote services or has no dependency on the local environment.