MCP Video Recognition Server
An MCP (Model Context Protocol) server that provides tools for image, audio, and video recognition using Google's Gemini AI.
Features
- Image Recognition: Analyze and describe images using Google Gemini AI
- Audio Recognition: Analyze and transcribe audio using Google Gemini AI
- Video Recognition: Analyze and describe videos using Google Gemini AI
Prerequisites
- Node.js 18 or higher
- Google Gemini API key
Installation
Manual Installation
- Clone the repository
- Install dependencies
- Build the project
Installing in FLUJO
- Click Add Server
- Copy and paste the GitHub URL into FLUJO
- Click Parse, Clone, Install, Build and Save.
Installing via Configuration Files
To integrate this MCP server with Cline or other MCP clients via configuration files:
- Open your Cline settings:
  - In VS Code, go to File -> Preferences -> Settings
  - Search for "Cline MCP Settings"
  - Click "Edit in settings.json"
- Add the server configuration to the mcpServers object.
- Replace /path/to/mcp-video-recognition/dist/index.js with the actual path to the index.js file in your project directory. Use forward slashes (/) or double backslashes (\\) for the path on Windows.
- Save the settings file. Cline should automatically connect to the server.
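As a rough illustration of the shape such an entry usually takes, the sketch below builds an mcpServers entry in TypeScript and prints it as JSON. The server name video-recognition, the helper buildServerEntry, and the placeholder API key are assumptions made for illustration. The command/args/env layout follows the common MCP client convention and is not taken from this project's own configuration snippet.

```typescript
// Shape of one entry in the mcpServers object (assumed, based on the
// common MCP client convention; not confirmed by this project).
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Hypothetical helper: builds an entry that launches the built server with Node.
function buildServerEntry(indexJsPath: string, googleApiKey: string): McpServerEntry {
  return {
    command: "node",
    // Convert Windows backslashes to forward slashes so the JSON needs no escaping.
    args: [indexJsPath.replace(/\\/g, "/")],
    env: { GOOGLE_API_KEY: googleApiKey },
  };
}

const settings = {
  mcpServers: {
    // "video-recognition" is a hypothetical server name; pick any unique key.
    "video-recognition": buildServerEntry(
      "/path/to/mcp-video-recognition/dist/index.js",
      "your-api-key"
    ),
  },
};

// Print the JSON that would be pasted into the settings file.
console.log(JSON.stringify(settings, null, 2));
```

Normalizing to forward slashes sidesteps the double-backslash escaping that JSON would otherwise require for Windows paths.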
Configuration
The server is configured using environment variables:
- GOOGLE_API_KEY (required): Your Google Gemini API key
- TRANSPORT_TYPE: Transport type to use (stdio or sse, defaults to stdio)
- PORT: Port number for SSE transport (defaults to 3000)
- LOG_LEVEL: Logging level (verbose, debug, info, warn, or error; defaults to info)
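The defaults and allowed values listed above could be applied along the following lines. This is a minimal sketch, not the server's actual code: the loadConfig function, the ServerConfig type, and the validation rules (including the port range check) are assumptions. Only the variable names, allowed values, and defaults come from the list above.

```typescript
type TransportType = "stdio" | "sse";
type LogLevel = "verbose" | "debug" | "info" | "warn" | "error";

// Hypothetical config shape; field names are illustrative.
interface ServerConfig {
  googleApiKey: string;
  transportType: TransportType;
  port: number;
  logLevel: LogLevel;
}

const TRANSPORT_TYPES: readonly string[] = ["stdio", "sse"];
const LOG_LEVELS: readonly string[] = ["verbose", "debug", "info", "warn", "error"];

// Hypothetical loader: reads the documented variables and applies the documented defaults.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const googleApiKey = env["GOOGLE_API_KEY"];
  if (!googleApiKey) {
    throw new Error("GOOGLE_API_KEY is required");
  }

  const transportType = env["TRANSPORT_TYPE"] ?? "stdio";
  if (!TRANSPORT_TYPES.includes(transportType)) {
    throw new Error(`TRANSPORT_TYPE must be one of: ${TRANSPORT_TYPES.join(", ")}`);
  }

  // The range check is an assumption; the source only states the default of 3000.
  const port = Number(env["PORT"] ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error("PORT must be an integer between 1 and 65535");
  }

  const logLevel = env["LOG_LEVEL"] ?? "info";
  if (!LOG_LEVELS.includes(logLevel)) {
    throw new Error(`LOG_LEVEL must be one of: ${LOG_LEVELS.join(", ")}`);
  }

  return {
    googleApiKey,
    transportType: transportType as TransportType,
    port,
    logLevel: logLevel as LogLevel,
  };
}
```

In a Node.js process this would typically be called as loadConfig(process.env). Taking the environment as a parameter keeps the function easy to test.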
Usage
Starting the Server
With stdio Transport (Default)
With SSE Transport
Using the Tools
The server provides three tools that can be called by MCP clients:
Image Recognition
Audio Recognition
Video Recognition
Tool Parameters
All tools accept the following parameters:
- filepath (required): Path to the media file to analyze
- prompt (optional): Custom prompt for the recognition (defaults to "Describe this content")
- modelname (optional): Gemini model to use for recognition (defaults to "gemini-2.0-flash")
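To make the parameter defaults concrete, here is a small sketch of how a tool's arguments might be typed and completed before a request is sent to Gemini. The RecognitionArgs interface and the withDefaults helper are hypothetical names introduced for illustration. Only the parameter names and default values are taken from the list above.

```typescript
// Arguments shared by all three tools, as documented above.
interface RecognitionArgs {
  filepath: string;
  prompt?: string;
  modelname?: string;
}

// Hypothetical helper: fills in the documented defaults for omitted optional parameters.
function withDefaults(args: RecognitionArgs): Required<RecognitionArgs> {
  if (!args.filepath) {
    throw new Error("filepath is required");
  }
  return {
    filepath: args.filepath,
    prompt: args.prompt ?? "Describe this content",
    modelname: args.modelname ?? "gemini-2.0-flash",
  };
}

// Example: only the required parameter is supplied (the path is a placeholder).
const resolved = withDefaults({ filepath: "/path/to/image.jpg" });
console.log(resolved);
```

A client that passes only filepath would therefore get a generic description from the default model, while prompt and modelname allow per-call overrides.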
Development
Running in Development Mode
Project Structure
- src/index.ts: Entry point
- src/server.ts: MCP server implementation
- src/tools/: Tool implementations
- src/services/: Service implementations (Gemini API)
- src/types/: Type definitions
- src/utils/: Utility functions
License
MIT