MCP Video Recognition Server

An MCP (Model Context Protocol) server that provides image, audio, and video recognition tools backed by Google's Gemini AI.

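Features

Image recognition: analyze and describe images with Google Gemini AI
Audio recognition: analyze and transcribe audio with Google Gemini AI
Video recognition: analyze and describe videos with Google Gemini AI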

Prerequisites

Node.js 18 or later
A Google Gemini API key

Installation

Manual installation

Clone the repository:
git clone https://github.com/yourusername/mcp-video-recognition.git
cd mcp-video-recognition
Install dependencies:
npm install
Build the project:
npm run build

Installing in FLUJO

Click "Add Server"
Copy the GitHub URL and paste it into FLUJO
Click Parse, Clone, Install, Build, and Save.

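Installing via a configuration file

To integrate this MCP server with Cline or another MCP client through a configuration file:

Open your Cline settings:
- In VS Code, go to File -> Preferences -> Settings
- Search for "Cline MCP Settings"
- Click "Edit in settings.json"
Add the server configuration to the mcpServers object:
{
  "mcpServers": {
    "video-recognition": {
      "command": "node",
      "args": [
        "/path/to/mcp-video-recognition/dist/index.js"
      ],
      "disabled": false,
      "autoApprove": []
    }
  }
}
Replace /path/to/mcp-video-recognition/dist/index.js with the actual path to the index.js file in your project directory. On Windows, use forward slashes (/) or double backslashes (\\) in the path.
Save the settings file. Cline should connect to the server automatically.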
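
If you generate the settings entry from a script rather than editing it by hand, the Windows path rule is easy to get wrong. The following is a minimal sketch, not part of this project: buildServerEntry and toJsonSafePath are hypothetical helper names, and the C:\tools\... path is only an illustration.

```typescript
// Hypothetical helpers for illustration; they are not part of this repository.
interface McpServerEntry {
  command: string;
  args: string[];
  disabled: boolean;
  autoApprove: string[];
}

// Replace backslashes with forward slashes so the path needs no escaping in JSON.
function toJsonSafePath(p: string): string {
  return p.replace(/\\/g, "/");
}

// Build the same "mcpServers" object shown in the configuration step.
function buildServerEntry(indexJsPath: string): { mcpServers: Record<string, McpServerEntry> } {
  return {
    mcpServers: {
      "video-recognition": {
        command: "node",
        args: [toJsonSafePath(indexJsPath)],
        disabled: false,
        autoApprove: [],
      },
    },
  };
}

// Example with an assumed Windows install location.
const settings = buildServerEntry("C:\\tools\\mcp-video-recognition\\dist\\index.js");
console.log(JSON.stringify(settings, null, 2));
```

The printed JSON can be merged into the client's settings file; the server name and field values mirror the entry above.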

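Configuration

The server is configured through environment variables:

GOOGLE_API_KEY (required): your Google Gemini API key
TRANSPORT_TYPE: the transport to use (stdio or sse; defaults to stdio)
PORT: the port number for the SSE transport (defaults to 3000)
LOG_LEVEL: the logging level (verbose, debug, info, warn, or error; defaults to info)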
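
To make the defaults concrete, here is one way such variables could be read and validated. This is a sketch under assumptions, not the code in src/: the loadConfig function and ServerConfig type are hypothetical names, and the real server may validate differently.

```typescript
// Hypothetical config loader; names and validation rules are assumptions.
type TransportType = "stdio" | "sse";
type LogLevel = "verbose" | "debug" | "info" | "warn" | "error";

interface ServerConfig {
  googleApiKey: string;
  transportType: TransportType;
  port: number;
  logLevel: LogLevel;
}

const LOG_LEVELS: readonly string[] = ["verbose", "debug", "info", "warn", "error"];

function isLogLevel(value: string): value is LogLevel {
  return LOG_LEVELS.includes(value);
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const googleApiKey = env["GOOGLE_API_KEY"];
  if (!googleApiKey) {
    throw new Error("GOOGLE_API_KEY is required");
  }

  // Documented default: stdio.
  const transportType = env["TRANSPORT_TYPE"] ?? "stdio";
  if (transportType !== "stdio" && transportType !== "sse") {
    throw new Error(`Unsupported TRANSPORT_TYPE: ${transportType}`);
  }

  // Documented default: 3000.
  const port = Number(env["PORT"] ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }

  // Documented default: info.
  const logLevel = env["LOG_LEVEL"] ?? "info";
  if (!isLogLevel(logLevel)) {
    throw new Error(`Unsupported LOG_LEVEL: ${logLevel}`);
  }

  return { googleApiKey, transportType, port, logLevel };
}
```

In a Node.js process this would typically be called as loadConfig(process.env); taking the environment as a parameter keeps the function easy to test.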

Usage

Starting the server

Using the stdio transport (default)

GOOGLE_API_KEY=your_api_key npm start

Using the SSE transport

GOOGLE_API_KEY=your_api_key TRANSPORT_TYPE=sse PORT=3000 npm start

Using the tools

The server provides three tools that MCP clients can call:

Image recognition

{
  "name": "image_recognition",
  "arguments": {
    "filepath": "/path/to/image.jpg",
    "prompt": "Describe this image in detail",
    "modelname": "gemini-2.0-flash"
  }
}

Audio recognition

{
  "name": "audio_recognition",
  "arguments": {
    "filepath": "/path/to/audio.mp3",
    "prompt": "Transcribe this audio",
    "modelname": "gemini-2.0-flash"
  }
}

Video recognition

{
  "name": "video_recognition",
  "arguments": {
    "filepath": "/path/to/video.mp4",
    "prompt": "Describe what happens in this video",
    "modelname": "gemini-2.0-flash"
  }
}
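
MCP clients normally send these calls for you, but it can help to see the wire format. In MCP, a tool invocation is a JSON-RPC 2.0 request with the method tools/call, and the name/arguments object above becomes its params. The sketch below only builds that envelope; toToolsCallRequest is a hypothetical helper name and the id counter is an arbitrary choice.

```typescript
// Hypothetical helper that wraps a tool call in a JSON-RPC 2.0 "tools/call" request.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: ToolCall;
}

let nextRequestId = 1;

function toToolsCallRequest(call: ToolCall): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++, // each request gets its own id so responses can be matched
    method: "tools/call",
    params: call,
  };
}

const videoRequest = toToolsCallRequest({
  name: "video_recognition",
  arguments: {
    filepath: "/path/to/video.mp4",
    prompt: "Describe what happens in this video",
    modelname: "gemini-2.0-flash",
  },
});

console.log(JSON.stringify(videoRequest, null, 2));
```

The same helper works for image_recognition and audio_recognition, since all three tools share one argument shape.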

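Tool parameters

All tools accept the following parameters:

filepath (required): path to the media file to analyze
prompt (optional): custom prompt for the recognition (defaults to "Describe this content")
modelname (optional): the Gemini model to use for recognition (defaults to "gemini-2.0-flash")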
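
The parameter rules can be read as a small defaulting step applied before the request reaches Gemini. The following sketch illustrates that reading; resolveArgs is a hypothetical name, and the exact default prompt string is an assumption based on the description above rather than a value taken from the source code.

```typescript
// Hypothetical illustration of how optional tool arguments fall back to defaults.
interface RecognitionArgs {
  filepath: string;
  prompt?: string;
  modelname?: string;
}

const DEFAULT_PROMPT = "Describe this content"; // assumed wording of the default prompt
const DEFAULT_MODEL = "gemini-2.0-flash";

function resolveArgs(args: RecognitionArgs): Required<RecognitionArgs> {
  // filepath is the only required parameter.
  if (args.filepath.trim() === "") {
    throw new Error("filepath is required");
  }
  return {
    filepath: args.filepath,
    prompt: args.prompt ?? DEFAULT_PROMPT,
    modelname: args.modelname ?? DEFAULT_MODEL,
  };
}

// Only filepath is given, so prompt and modelname take their defaults.
const resolved = resolveArgs({ filepath: "/path/to/image.jpg" });
console.log(resolved);
```

Passing prompt or modelname explicitly overrides the corresponding default and leaves the other untouched.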

Development

Running in development mode

GOOGLE_API_KEY=your_api_key npm run dev

Project structure

src/index.ts: entry point
src/server.ts: MCP server implementation
src/tools/: tool implementations
src/services/: service implementations (Gemini API)
src/types/: type definitions
src/utils/: utility functions

License

MIT
