MCP Video Recognition Server

An MCP (Model Context Protocol) server that provides tools for image, audio, and video recognition using Google's Gemini AI.

Features

Image recognition: analyzes and describes images using Google Gemini AI.
Audio recognition: analyzes and transcribes audio using Google Gemini AI.
Video recognition: analyzes and describes videos using Google Gemini AI.

Prerequisites

Node.js 18 or later
Google Gemini API key

Installation

Manual installation

Clone the repository.
Install the dependencies:
npm install
Build the project:
npm run build

Installing in FLUJO

Click Add Server.
Copy the GitHub URL and paste it into FLUJO.
Click Parse, Clone, Install, Build, and Save.

Installation via configuration file

To integrate this MCP server with Cline or another MCP client through a configuration file:

Open your Cline settings.
- In VS Code, go to File -> Preferences -> Settings.
- Search for "Cline MCP Settings".
- Click "Edit in settings.json".
Add the server configuration to the mcpServers object:
{
  "mcpServers": {
    "video-recognition": {
      "command": "node",
      "args": [
        "/path/to/mcp-video-recognition/dist/index.js"
      ],
      "disabled": false,
      "autoApprove": []
    }
  }
}
Replace /path/to/mcp-video-recognition/dist/index.js with the actual path to the index.js file in your project directory. On Windows, use forward slashes (/) or double backslashes (\\) in the path.
Save the settings file. Cline should then connect to the server automatically.
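As an illustration of the path advice above, the following TypeScript sketch builds the same mcpServers entry programmatically and converts a Windows path to forward slashes. The helper names buildClineConfig and normalizePath are hypothetical; they are not part of this project or of Cline.

```typescript
// Shape of one entry in the "mcpServers" object, as shown in the configuration above.
interface McpServerEntry {
  command: string;
  args: string[];
  disabled: boolean;
  autoApprove: string[];
}

// Hypothetical helper: replace backslashes with forward slashes so a
// Windows path can be written into JSON without escaping.
function normalizePath(p: string): string {
  return p.replace(/\\/g, "/");
}

// Hypothetical helper: build the configuration object for this server.
function buildClineConfig(
  indexJsPath: string
): { mcpServers: Record<string, McpServerEntry> } {
  return {
    mcpServers: {
      "video-recognition": {
        command: "node",
        args: [normalizePath(indexJsPath)],
        disabled: false,
        autoApprove: [],
      },
    },
  };
}

// The path below is a placeholder; substitute your own checkout location.
const exampleConfig = buildClineConfig(
  "C:\\projects\\mcp-video-recognition\\dist\\index.js"
);
console.log(JSON.stringify(exampleConfig, null, 2));
```

The printed JSON can be pasted into the settings file. Using forward slashes avoids having to double every backslash by hand.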

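Configuration

The server is configured through environment variables.

GOOGLE_API_KEY (required): your Google Gemini API key
TRANSPORT_TYPE: the transport type to use (stdio or sse; defaults to stdio)
PORT: the port number for the SSE transport (defaults to 3000)
LOG_LEVEL: the logging level (verbose, debug, info, warn, or error; defaults to info)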
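The following TypeScript sketch shows one way the variables above could be read and validated with their documented defaults. It is not taken from the server's source: loadConfig, ServerConfig, and the error messages are assumptions made for illustration. The function takes the environment as a plain object so it can be tried without Node.js typings.

```typescript
type TransportType = "stdio" | "sse";
type LogLevel = "verbose" | "debug" | "info" | "warn" | "error";

interface ServerConfig {
  googleApiKey: string;
  transportType: TransportType;
  port: number;
  logLevel: LogLevel;
}

const TRANSPORTS: readonly TransportType[] = ["stdio", "sse"];
const LOG_LEVELS: readonly LogLevel[] = ["verbose", "debug", "info", "warn", "error"];

// Hypothetical loader: applies the documented defaults and rejects unknown values.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const googleApiKey = env["GOOGLE_API_KEY"];
  if (!googleApiKey) {
    throw new Error("GOOGLE_API_KEY is required");
  }

  const rawTransport = env["TRANSPORT_TYPE"] ?? "stdio";
  const transportType = TRANSPORTS.find((t) => t === rawTransport);
  if (transportType === undefined) {
    throw new Error(`Unsupported TRANSPORT_TYPE: ${rawTransport}`);
  }

  const rawPort = env["PORT"] ?? "3000";
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${rawPort}`);
  }

  const rawLevel = env["LOG_LEVEL"] ?? "info";
  const logLevel = LOG_LEVELS.find((l) => l === rawLevel);
  if (logLevel === undefined) {
    throw new Error(`Unsupported LOG_LEVEL: ${rawLevel}`);
  }

  return { googleApiKey, transportType, port, logLevel };
}

// In a Node.js program you would typically pass process.env here.
console.log(loadConfig({ GOOGLE_API_KEY: "your_api_key", TRANSPORT_TYPE: "sse" }));
```

Failing fast on a missing key or an unknown value surfaces misconfiguration at startup rather than on the first tool call.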

Usage

Starting the server

With stdio transport (default)

GOOGLE_API_KEY=your_api_key npm start

With SSE transport

GOOGLE_API_KEY=your_api_key TRANSPORT_TYPE=sse PORT=3000 npm start

Using the tools

The server provides three tools that MCP clients can call.

Image recognition

{
  "name": "image_recognition",
  "arguments": {
    "filepath": "/path/to/image.jpg",
    "prompt": "Describe this image in detail",
    "modelname": "gemini-2.0-flash"
  }
}

Audio recognition

{
  "name": "audio_recognition",
  "arguments": {
    "filepath": "/path/to/audio.mp3",
    "prompt": "Transcribe this audio",
    "modelname": "gemini-2.0-flash"
  }
}

Video recognition

{
  "name": "video_recognition",
  "arguments": {
    "filepath": "/path/to/video.mp4",
    "prompt": "Describe what happens in this video",
    "modelname": "gemini-2.0-flash"
  }
}
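The name and arguments objects above are the payload of a tool call. On the wire, MCP carries them as the params of a JSON-RPC 2.0 request whose method is tools/call. The TypeScript sketch below shows that wrapping; the helper toToolsCallRequest and the interface names are hypothetical, and an MCP client library would normally build this message for you.

```typescript
// Payload in the same shape as the tool examples above.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Minimal JSON-RPC 2.0 request envelope for an MCP tool call.
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: ToolCall;
}

// Hypothetical helper: wrap a tool call in its JSON-RPC envelope.
function toToolsCallRequest(id: number, call: ToolCall): ToolsCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: call };
}

const videoRequest = toToolsCallRequest(1, {
  name: "video_recognition",
  arguments: {
    filepath: "/path/to/video.mp4",
    prompt: "Describe what happens in this video",
    modelname: "gemini-2.0-flash",
  },
});

console.log(JSON.stringify(videoRequest, null, 2));
```

The id only needs to be unique among requests in flight on the same connection, so a simple counter is usually enough.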

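Tool parameters

All tools accept the following parameters.

filepath (required): path to the media file to analyze
prompt (optional): custom prompt for recognition (defaults to "Describe this content")
modelname (optional): Gemini model to use for recognition (defaults to "gemini-2.0-flash")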
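The parameter rules above can be sketched in TypeScript as a type plus a function that fills in the defaults. This is an illustrative sketch, not the server's implementation: resolveArgs and the interface names are hypothetical, and the default prompt string is an English rendering that may not match the server's exact wording.

```typescript
// Arguments as a client may send them: only filepath is mandatory.
interface RecognitionArgs {
  filepath: string;
  prompt?: string;
  modelname?: string;
}

// Arguments after defaults have been applied.
interface ResolvedRecognitionArgs {
  filepath: string;
  prompt: string;
  modelname: string;
}

// Assumption: English rendering of the documented default prompt.
const DEFAULT_PROMPT = "Describe this content";
const DEFAULT_MODEL = "gemini-2.0-flash";

// Hypothetical helper: validate filepath and fill in omitted optional values.
function resolveArgs(args: RecognitionArgs): ResolvedRecognitionArgs {
  if (args.filepath.trim() === "") {
    throw new Error("filepath is required");
  }
  return {
    filepath: args.filepath,
    prompt: args.prompt ?? DEFAULT_PROMPT,
    modelname: args.modelname ?? DEFAULT_MODEL,
  };
}

console.log(resolveArgs({ filepath: "/path/to/audio.mp3", prompt: "Transcribe this audio" }));
```

Because the same three parameters apply to every tool, one helper of this kind could serve image, audio, and video calls alike.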

Development

Running in development mode

GOOGLE_API_KEY=your_api_key npm run dev

Project structure

src/index.ts: entry point
src/server.ts: MCP server implementation
src/tools/: tool implementations
src/services/: service implementations (Gemini API)
src/types/: type definitions
src/utils/: utility functions

License

MIT
