MCP OpenVision

Overview

MCP OpenVision is a Model Context Protocol (MCP) server that provides image analysis powered by OpenRouter vision models. It lets AI assistants analyze images through a simple interface within the MCP ecosystem.

Installation

Installing via Smithery

To install mcp-openvision for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @Nazruden/mcp-openvision --client claude

Using pip

pip install mcp-openvision

Using UV (recommended)

uv pip install mcp-openvision

Configuration

MCP OpenVision requires an OpenRouter API key, which is configured through environment variables:

OPENROUTER_API_KEY (required): your OpenRouter API key
OPENROUTER_DEFAULT_MODEL (optional): the vision model to use
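
As a rough illustration of the two variables above, the following sketch reads them from the environment and fails early when the required key is missing. The `load_config` helper and the `OpenVisionConfig` type are hypothetical names introduced here for illustration; they are not part of the mcp-openvision package.

```python
import os
from typing import Mapping, NamedTuple, Optional


class OpenVisionConfig(NamedTuple):
    """Hypothetical container for the two documented settings."""
    api_key: str
    default_model: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> OpenVisionConfig:
    """Read the documented environment variables (illustrative helper only)."""
    api_key = env.get("OPENROUTER_API_KEY", "").strip()
    if not api_key:
        # The key is documented as required, so stop before making any request.
        raise RuntimeError("OPENROUTER_API_KEY is required but not set")
    # The model variable is optional; None means "not configured here".
    model = env.get("OPENROUTER_DEFAULT_MODEL", "").strip() or None
    return OpenVisionConfig(api_key=api_key, default_model=model)
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in tests without touching the real environment.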

OpenRouter Vision Models

MCP OpenVision works with any OpenRouter model that supports vision capabilities. The default model is qwen/qwen2.5-vl-32b-instruct:free, but you can specify any other compatible model.

Popular vision models available on OpenRouter include:

qwen/qwen2.5-vl-32b-instruct:free (default)
anthropic/claude-3-5-sonnet
anthropic/claude-3-opus
anthropic/claude-3-sonnet
openai/gpt-4o

You can specify a custom model by setting the OPENROUTER_DEFAULT_MODEL environment variable or by passing the model parameter directly to the image_analysis function.
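
The two ways of choosing a model can be pictured as a simple fallback chain. The sketch below assumes that an explicit `model` argument takes priority over the environment variable, which in turn takes priority over the documented default. That ordering is an assumption, since this README does not spell it out, and `resolve_model` is a hypothetical helper rather than a function of the package.

```python
import os
from typing import Mapping, Optional

# Default model name as stated in this README.
DEFAULT_MODEL = "qwen/qwen2.5-vl-32b-instruct:free"


def resolve_model(
    model: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick a model name: explicit argument, then env var, then default.

    The precedence order is an assumption made for this sketch.
    """
    if model:
        return model
    return env.get("OPENROUTER_DEFAULT_MODEL") or DEFAULT_MODEL
```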

Usage

Testing with MCP Inspector

The easiest way to test MCP OpenVision is with the MCP Inspector tool:

npx @modelcontextprotocol/inspector uvx mcp-openvision

Integration with Claude Desktop or Cursor

Edit your MCP configuration file:
- Windows: %USERPROFILE%\.cursor\mcp.json
- macOS: ~/.cursor/mcp.json or ~/Library/Application Support/Claude/claude_desktop_config.json

Add the following configuration:

{
  "mcpServers": {
    "openvision": {
      "command": "uvx",
      "args": ["mcp-openvision"],
      "env": {
        "OPENROUTER_API_KEY": "your_openrouter_api_key_here",
        "OPENROUTER_DEFAULT_MODEL": "anthropic/claude-3-sonnet"
      }
    }
  }
}
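
If the configuration file already lists other servers, the entry can be merged in rather than pasted by hand. The sketch below is one possible way to do that with the standard library. The helper names `add_openvision_entry` and `update_config_file` are hypothetical; the JSON keys mirror the configuration shown above.

```python
import json
from pathlib import Path
from typing import Optional


def add_openvision_entry(config: dict, api_key: str, model: Optional[str] = None) -> dict:
    """Return a copy of an MCP client config with the "openvision" server added."""
    env = {"OPENROUTER_API_KEY": api_key}
    if model:
        env["OPENROUTER_DEFAULT_MODEL"] = model
    # Copy the existing server map so other entries are preserved.
    servers = dict(config.get("mcpServers", {}))
    servers["openvision"] = {"command": "uvx", "args": ["mcp-openvision"], "env": env}
    return {**config, "mcpServers": servers}


def update_config_file(path: Path, api_key: str, model: Optional[str] = None) -> None:
    """Read the config file if it exists, merge the entry, and write it back."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    existing = json.loads(text) if text.strip() else {}
    updated = add_openvision_entry(existing, api_key, model)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

For example, `update_config_file(Path.home() / ".cursor" / "mcp.json", "your_openrouter_api_key_here")` would target the Cursor path listed above.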

Running Locally for Development

# Set the required API key
export OPENROUTER_API_KEY="your_api_key"

# Run the server module directly
python -m mcp_openvision

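Features

MCP OpenVision provides the following core tool:

image_analysis: analyzes images with a vision model and supports the following parameters:
- image: can be provided as:
  - Base64-encoded image data
  - An image URL (http/https)
  - A local file path
- query: the user instruction for the image analysis task
- system_prompt: instructions that define the model's role and behavior (optional)
- model: the vision model to use
- temperature: controls randomness (0.0-1.0)
- max_tokens: maximum response length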
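
To make the parameter list concrete, here is a small client-side sketch that assembles and checks an argument dictionary before a call. It treats `image` and `query` as required and everything else as optional, which is an assumption (only `system_prompt` is explicitly marked optional above). `build_image_analysis_args` is a hypothetical helper, not part of the server.

```python
from typing import Any, Dict, Optional


def build_image_analysis_args(
    image: str,
    query: str,
    system_prompt: Optional[str] = None,
    model: Optional[str] = None,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate and collect arguments for image_analysis (illustrative only)."""
    if not image:
        raise ValueError("image must be a base64 string, a URL, or a file path")
    if not query.strip():
        raise ValueError("query must not be empty")
    # The documented range for temperature is 0.0-1.0.
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if max_tokens is not None and max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    args: Dict[str, Any] = {"image": image, "query": query}
    optional = {
        "system_prompt": system_prompt,
        "model": model,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    # Leave unset options out so the server can apply its own defaults.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args
```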

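Crafting Effective Queries

The query parameter is crucial for getting useful results from image analysis. A well-crafted query provides context about:

Purpose: why you are analyzing this image
Focus areas: specific elements or details to pay attention to
Required information: the type of information you need to extract
Format preferences: how you want the results structured

Examples of Effective Queries

Basic Query	Enhanced Query
"Describe this image"	"Identify all retail products visible in this store shelf image and estimate their price range"
"What's in this image?"	"Analyze this medical scan for abnormalities, focusing on the highlighted area and providing possible diagnoses"
"Analyze this chart"	"Extract the numerical data from this bar chart showing quarterly sales, and identify the key trends from 2022 to 2023"
"Read the text"	"Transcribe all visible text in this restaurant menu, preserving the item names, descriptions, and prices"

Providing context about why the analysis is needed and what specific information you are looking for helps the model focus on relevant details and produce more valuable insights.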
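
One way to apply the four elements consistently is to assemble the query from named parts. The sketch below is only a convenience pattern with a hypothetical `build_query` helper; the labels it emits ("Purpose:", "Focus on:", "Format:") are arbitrary choices, not something the server requires.

```python
def _sentence(text: str) -> str:
    """Trim whitespace and make sure the text ends with exactly one period."""
    return text.strip().rstrip(".") + "."


def build_query(task: str, purpose: str = "", focus: str = "", output_format: str = "") -> str:
    """Combine the required information (task) with optional context parts."""
    sentences = [_sentence(task)]
    if purpose:
        sentences.append("Purpose: " + _sentence(purpose))
    if focus:
        sentences.append("Focus on: " + _sentence(focus))
    if output_format:
        sentences.append("Format: " + _sentence(output_format))
    return " ".join(sentences)


query = build_query(
    "Transcribe all visible text in this restaurant menu",
    focus="item names, descriptions, and prices",
    output_format="one line per menu item",
)
```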

Usage Examples

# Analyze an image from a URL
result = await image_analysis(
    image="https://example.com/image.jpg",
    query="Describe this image in detail"
)

# Analyze an image from a local file with a focused query
result = await image_analysis(
    image="path/to/local/image.jpg",
    query="Identify all traffic signs in this street scene and explain their meanings for a driver education course"
)

# Analyze with a base64-encoded image and a specific analytical purpose
result = await image_analysis(
    image="SGVsbG8gV29ybGQ=...",  # base64 data
    query="Examine this product packaging design and highlight elements that could be improved for better visibility and brand recognition"
)

# Customize the system prompt for specialized analysis
result = await image_analysis(
    image="path/to/local/image.jpg",
    query="Analyze the composition and artistic techniques used in this painting, focusing on how they create emotional impact",
    system_prompt="You are an expert art historian with deep knowledge of painting techniques and art movements. Focus on formal analysis of composition, color, brushwork, and stylistic elements."
)
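
The base64 string in the third example is only a placeholder. A real value can be produced from an image file with the standard library, as in the sketch below. The helper names are hypothetical, and whether the server also accepts a `data:` URI prefix is not stated in this README, so the sketch returns the bare base64 text.

```python
import base64
from pathlib import Path
from typing import Union


def encode_image_bytes(data: bytes) -> str:
    """Return the standard base64 encoding of raw bytes as ASCII text."""
    return base64.b64encode(data).decode("ascii")


def encode_image_file(path: Union[str, Path]) -> str:
    """Read an image file and return its contents as a base64 string."""
    return encode_image_bytes(Path(path).read_bytes())
```

The result of `encode_image_file("path/to/local/image.jpg")` can then be passed as the `image` argument.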

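Image Input Types

The image_analysis tool accepts several types of image input:

A Base64-encoded string
An image URL: must start with http:// or https://
A file path:
- Absolute path: a full path starting with / (Unix) or a drive letter (Windows)
- Relative path: a path relative to the current working directory
- Relative path with project_root: use the project_root parameter to specify a base directory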
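
The rules above can be read as a rough decision procedure. The following sketch classifies an input string using only those rules plus a base64 validity check. It is a heuristic written for illustration and is not the server's actual detection logic, which this README does not describe; `classify_image_input` is a hypothetical name.

```python
import base64
import binascii
import re


def classify_image_input(image: str) -> str:
    """Guess the input type: 'url', 'absolute_path', 'base64', or 'relative_path'."""
    if not image:
        raise ValueError("image must not be empty")
    if image.startswith(("http://", "https://")):
        return "url"
    # "/" prefix (Unix) or a drive letter such as "C:\" (Windows).
    # Caveat: base64-encoded JPEG data typically begins with "/9j/", so a real
    # implementation would need a stronger check than this prefix test.
    if image.startswith("/") or re.match(r"^[A-Za-z]:[\\/]", image):
        return "absolute_path"
    try:
        base64.b64decode(image, validate=True)
        return "base64"
    except (binascii.Error, ValueError):
        # Characters such as "." are outside the base64 alphabet,
        # so typical file names like "examples/image.jpg" land here.
        return "relative_path"
```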

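Using Relative Paths

When using a relative file path (such as "examples/image.jpg"), you have two options:

The path must be relative to the current working directory where the server is running.
Alternatively, you can specify the project_root parameter: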

# Example with relative path and project_root
result = await image_analysis(
    image="examples/image.jpg",
    project_root="/path/to/your/project",
    query="What is in this image?"
)

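This is particularly useful in applications where the current working directory is unpredictable, or when you want to reference files using paths relative to a specific directory.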
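
The resolution behavior described here can be sketched with `pathlib`. This is an illustration of the documented behavior under the assumption that absolute paths are used unchanged; `resolve_image_path` is a hypothetical helper and not the server's code.

```python
from pathlib import Path
from typing import Optional


def resolve_image_path(image: str, project_root: Optional[str] = None) -> Path:
    """Resolve a file path against project_root, or the working directory."""
    path = Path(image)
    if path.is_absolute():
        # Assumed: absolute paths ignore project_root entirely.
        return path
    base = Path(project_root) if project_root else Path.cwd()
    return base / path
```

With `project_root="/path/to/your/project"`, the relative path `examples/image.jpg` resolves to `/path/to/your/project/examples/image.jpg` regardless of where the server process was started.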

Development

Setting Up the Development Environment

# Clone the repository
git clone https://github.com/modelcontextprotocol/mcp-openvision.git
cd mcp-openvision

# Install development dependencies
pip install -e ".[dev]"

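Code Formatting

This project uses Black for automatic code formatting. Formatting is enforced through GitHub Actions:

All code pushed to the repository is automatically formatted with Black
For pull requests from repository collaborators, Black formats the code and commits directly to the PR branch
For pull requests from forks, Black creates a new PR with the formatted code that can be merged into the original PR

You can also run Black locally to format your code before committing: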

# Format all Python code in the src and tests directories
black src tests

Running Tests

pytest

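Release Process

This project uses an automated release process:

Update the version in pyproject.toml following semantic versioning principles
- You can use the helper script: python scripts/bump_version.py [major|minor|patch]
Update CHANGELOG.md with details about the new version
- The script also creates a template entry in CHANGELOG.md that you can fill in
Commit and push these changes to the main branch
The GitHub Actions workflow will:
- Detect the version change
- Automatically create a new GitHub release
- Trigger the publish workflow that publishes to PyPI

This automation helps maintain a consistent release process and ensures that every release is properly versioned and documented.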
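
For readers unfamiliar with the major, minor, and patch choices, the sketch below shows the usual semantic versioning arithmetic. It is a standalone illustration and not the contents of scripts/bump_version.py, which are not shown in this README; the `bump_version` function is a hypothetical name.

```python
import re


def bump_version(version: str, part: str) -> str:
    """Increment one component of a MAJOR.MINOR.PATCH version string."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    if part == "major":
        # A major bump resets the lower components.
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError("part must be 'major', 'minor', or 'patch'")
```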

Support

If you find this project helpful, consider buying me a coffee to support ongoing development and maintenance.

License

This project is licensed under the MIT License. See the LICENSE file for details.
