MCP Video Recognition Server

An MCP (Model Context Protocol) server that provides tools for image, audio, and video recognition using Google's Gemini AI.

Features

Image recognition: analyze and describe images using Google Gemini AI
Audio recognition: analyze and transcribe audio using Google Gemini AI
Video recognition: analyze and describe videos using Google Gemini AI

Prerequisites

Node.js 18 or later
Google Gemini API key

Installation

Manual installation

Clone the repository:
git clone https://github.com/yourusername/mcp-video-recognition.git
cd mcp-video-recognition
Install the dependencies:
npm install
Build the project:
npm run build

Installing in FLUJO

Click "Add Server".
Copy the GitHub URL and paste it into FLUJO.
Click "Parse", "Clone", "Install", "Build", and "Save".

Installing via configuration file

To integrate this MCP server with Cline or another MCP client through a configuration file, follow these steps.

Open the Cline settings:
- In VS Code, go to File -> Preferences -> Settings.
- Search for "Cline MCP Settings".
- Click "Edit in settings.json".
Add the server configuration to the mcpServers object:
{
  "mcpServers": {
    "video-recognition": {
      "command": "node",
      "args": [
        "/path/to/mcp-video-recognition/dist/index.js"
      ],
      "disabled": false,
      "autoApprove": []
    }
  }
}
Replace /path/to/mcp-video-recognition/dist/index.js with the actual path to the index.js file in your project directory. On Windows, use forward slashes (/) or doubled backslashes (\\) in the path.
Save the settings file. Cline connects to the server automatically.

Configuration

The server is configured through environment variables.

GOOGLE_API_KEY (required): your Google Gemini API key
TRANSPORT_TYPE: transport type to use (stdio or sse; default: stdio)
PORT: port number for the SSE transport (default: 3000)
LOG_LEVEL: log level (verbose, debug, info, warn, or error; default: info)
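
The following is a minimal sketch of how the variables listed above could be read and validated with their documented defaults. It is not taken from the project source: the ServerConfig shape and the loadConfig function are hypothetical names used only for illustration, and the port range check is an added assumption.

```typescript
// Hypothetical config shape; field names are illustrative, not from the project.
type TransportType = "stdio" | "sse";
type LogLevel = "verbose" | "debug" | "info" | "warn" | "error";

interface ServerConfig {
  googleApiKey: string;
  transportType: TransportType;
  port: number;
  logLevel: LogLevel;
}

const TRANSPORT_TYPES: readonly string[] = ["stdio", "sse"];
const LOG_LEVELS: readonly string[] = ["verbose", "debug", "info", "warn", "error"];

// Takes the environment as a plain object so the function is easy to test.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const googleApiKey = env["GOOGLE_API_KEY"];
  if (!googleApiKey) {
    throw new Error("GOOGLE_API_KEY is required");
  }

  // Documented default: stdio
  const transportType = env["TRANSPORT_TYPE"] ?? "stdio";
  if (TRANSPORT_TYPES.indexOf(transportType) === -1) {
    throw new Error(`Unsupported TRANSPORT_TYPE: ${transportType}`);
  }

  // Documented default: 3000. The 1..65535 range check is an assumption.
  const port = Number(env["PORT"] ?? "3000");
  const isValidPort = port >= 1 && port <= 65535 && Math.floor(port) === port;
  if (!isValidPort) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }

  // Documented default: info
  const logLevel = env["LOG_LEVEL"] ?? "info";
  if (LOG_LEVELS.indexOf(logLevel) === -1) {
    throw new Error(`Unsupported LOG_LEVEL: ${logLevel}`);
  }

  return {
    googleApiKey,
    transportType: transportType as TransportType,
    port,
    logLevel: logLevel as LogLevel,
  };
}
```

In a Node.js process this could be called as loadConfig(process.env). Failing fast on a missing key or an unknown value surfaces misconfiguration at startup rather than on the first tool call.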

Usage

Starting the server

With stdio transport (default)

GOOGLE_API_KEY=your_api_key npm start

With SSE transport

GOOGLE_API_KEY=your_api_key TRANSPORT_TYPE=sse PORT=3000 npm start

Using the tools

The server provides three tools that can be called from an MCP client.

Image recognition

{
  "name": "image_recognition",
  "arguments": {
    "filepath": "/path/to/image.jpg",
    "prompt": "Describe this image in detail",
    "modelname": "gemini-2.0-flash"
  }
}

Audio recognition

{
  "name": "audio_recognition",
  "arguments": {
    "filepath": "/path/to/audio.mp3",
    "prompt": "Transcribe this audio",
    "modelname": "gemini-2.0-flash"
  }
}

Video recognition

{
  "name": "video_recognition",
  "arguments": {
    "filepath": "/path/to/video.mp4",
    "prompt": "Describe what happens in this video",
    "modelname": "gemini-2.0-flash"
  }
}
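
On the wire, MCP clients send tool invocations as JSON-RPC 2.0 requests with the method tools/call, where the name and arguments objects shown above become the params. The sketch below builds such a message by hand to show the shape; in practice a client library would normally do this for you. The buildToolCall helper and its type names are hypothetical and not part of this project.

```typescript
// The three tool names documented in this README.
type ToolName = "image_recognition" | "audio_recognition" | "video_recognition";

// Arguments accepted by every tool; only filepath is required.
interface RecognitionArguments {
  filepath: string;
  prompt?: string;
  modelname?: string;
}

// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: RecognitionArguments };
}

// Hypothetical helper: wraps a tool name and its arguments in a request.
function buildToolCall(
  id: number,
  name: ToolName,
  args: RecognitionArguments
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = buildToolCall(1, "video_recognition", {
  filepath: "/path/to/video.mp4",
  prompt: "Describe what happens in this video",
});

// Serialized form that a client would send over the chosen transport.
const wireMessage = JSON.stringify(request);
```

Here modelname is omitted, so the server would fall back to its documented default model.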

Tool parameters

All tools accept the following parameters.

filepath (required): path to the media file to analyze
prompt (optional): custom prompt for recognition (default: "Describe this content")
modelname (optional): Gemini model to use for recognition (default: "gemini-2.0-flash")
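
As an illustration of how these defaults could be applied, here is a small sketch that fills in the optional parameters and rejects a missing filepath. The resolveArgs function and both interfaces are hypothetical, and the exact wording of the default prompt string is an assumption; the project's own implementation may differ.

```typescript
// Arguments as supplied by the caller; only filepath is required.
interface RecognitionArgs {
  filepath: string;
  prompt?: string;
  modelname?: string;
}

// Arguments after defaults have been filled in.
interface ResolvedArgs {
  filepath: string;
  prompt: string;
  modelname: string;
}

// Assumed wording; the README only documents the meaning of the default.
const DEFAULT_PROMPT = "Describe this content";
const DEFAULT_MODEL = "gemini-2.0-flash";

// Hypothetical helper: validates filepath and applies the documented defaults.
function resolveArgs(args: RecognitionArgs): ResolvedArgs {
  if (typeof args.filepath !== "string" || args.filepath.trim() === "") {
    throw new Error("filepath is required");
  }
  return {
    filepath: args.filepath,
    prompt: args.prompt ?? DEFAULT_PROMPT,
    modelname: args.modelname ?? DEFAULT_MODEL,
  };
}

const resolved = resolveArgs({ filepath: "/path/to/image.jpg" });
```

With only filepath supplied, resolved carries the default prompt and the default model.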

Development

Running in development mode

GOOGLE_API_KEY=your_api_key npm run dev

Project structure

src/index.ts: entry point
src/server.ts: MCP server implementation
src/tools/: tool implementations
src/services/: service implementations (Gemini API)
src/types/: type definitions
src/utils/: utility functions

License

MIT
