MCP Image Recognition Server

An MCP server that provides image recognition capabilities using Anthropic and OpenAI vision APIs. Version 0.1.2.

Features

  • Image description using Anthropic Claude Vision or OpenAI GPT-4 Vision
  • Support for multiple image formats (JPEG, PNG, GIF, WebP)
  • Configurable primary and fallback providers
  • Base64 and file-based image input support
  • Optional text extraction using Tesseract OCR

Requirements

  • Python 3.8 or higher
  • Tesseract OCR (optional) - Required only for the text extraction feature
    • Windows: Download and install from UB-Mannheim/tesseract
    • Linux: sudo apt-get install tesseract-ocr
    • macOS: brew install tesseract
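Before enabling OCR, it can help to confirm that a Tesseract executable is actually reachable. The following is a minimal sketch, not the server's own lookup logic: the helper name find_tesseract is hypothetical, and it assumes that TESSERACT_CMD (described under Environment Configuration) should take precedence over a PATH search.

```python
import os
import shutil
from typing import Mapping, Optional


def find_tesseract(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the path to a usable Tesseract executable, or None.

    Hypothetical helper for illustration; the server may resolve the
    executable differently.
    """
    # Assumption: an explicit TESSERACT_CMD wins over the PATH lookup.
    custom = env.get("TESSERACT_CMD", "").strip()
    if custom:
        # shutil.which accepts a bare command name or a full path and
        # returns None when the target is missing or not executable.
        return shutil.which(custom)
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    print(path or "Tesseract not found; leave ENABLE_OCR set to false")
```

If the function returns None, text extraction is unlikely to work until Tesseract is installed or TESSERACT_CMD points at a valid executable.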

Installation

  1. Clone the repository:
git clone https://github.com/mario-andreschak/mcp-image-recognition.git
cd mcp-image-recognition
  2. Create and configure your environment file:
cp .env.example .env
# Edit .env with your API keys and preferences
  3. Build the project:
build.bat

Usage

Running the Server

Start the server using Python:

python -m image_recognition_server.server

Alternatively, start the server using the batch script:

run.bat server

Start the server in development mode with the MCP Inspector:

run.bat debug

Available Tools

  1. describe_image
    • Input: Base64-encoded image data and MIME type
    • Output: Detailed description of the image
  2. describe_image_from_file
    • Input: Path to an image file
    • Output: Detailed description of the image
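As an illustration of the describe_image input, the sketch below base64-encodes raw image bytes and pairs them with a MIME type. The argument names image and mime_type are assumptions (this README only states that the tool takes Base64-encoded image data and a MIME type), and the helper name encode_image is hypothetical; check the tool schema reported by the server for the real field names.

```python
import base64
from pathlib import Path
from typing import Dict

# Extensions for the formats listed under Features.
MIME_BY_EXTENSION = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def encode_image(data: bytes, filename: str) -> Dict[str, str]:
    """Build a describe_image-style argument dict from raw image bytes.

    The keys "image" and "mime_type" are hypothetical placeholders.
    """
    suffix = Path(filename).suffix.lower()
    mime_type = MIME_BY_EXTENSION.get(suffix)
    if mime_type is None:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    return {
        "image": base64.b64encode(data).decode("ascii"),
        "mime_type": mime_type,
    }


if __name__ == "__main__":
    # Tiny placeholder payload; a real call would pass actual image bytes,
    # for example Path("photo.png").read_bytes().
    print(encode_image(b"abc", "photo.png"))
```

When the image is already on disk and the server can read that path, describe_image_from_file avoids the encoding step entirely.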

Environment Configuration

  • ANTHROPIC_API_KEY: Your Anthropic API key.
  • OPENAI_API_KEY: Your OpenAI API key.
  • VISION_PROVIDER: Primary vision provider (anthropic or openai).
  • FALLBACK_PROVIDER: Optional fallback provider.
  • LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR).
  • ENABLE_OCR: Enable Tesseract OCR text extraction (true or false).
  • TESSERACT_CMD: Optional custom path to Tesseract executable.
  • OPENAI_MODEL: OpenAI model (default: gpt-4o-mini). Accepts the OpenRouter format for other models (e.g., anthropic/claude-3.5-sonnet:beta).
  • OPENAI_BASE_URL: Optional custom base URL for the OpenAI API. Set to https://openrouter.ai/api/v1 for OpenRouter.
  • OPENAI_TIMEOUT: Optional custom timeout (in seconds) for the OpenAI API.
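To make the interplay of VISION_PROVIDER, FALLBACK_PROVIDER, and ENABLE_OCR concrete, here is a small sketch of how such settings could be read and validated. It is not the server's actual configuration code: the function names are hypothetical, and it assumes that VISION_PROVIDER is mandatory, that values are case-insensitive, and that ENABLE_OCR defaults to false when unset.

```python
import os
from typing import List, Mapping

VALID_PROVIDERS = ("anthropic", "openai")


def provider_order(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the providers to try: primary first, then the optional fallback."""
    primary = env.get("VISION_PROVIDER", "").strip().lower()
    if primary not in VALID_PROVIDERS:
        raise ValueError("VISION_PROVIDER must be 'anthropic' or 'openai'")
    order = [primary]

    fallback = env.get("FALLBACK_PROVIDER", "").strip().lower()
    if fallback:
        if fallback not in VALID_PROVIDERS:
            raise ValueError("FALLBACK_PROVIDER must be 'anthropic' or 'openai'")
        # A fallback identical to the primary adds nothing, so skip it.
        if fallback != primary:
            order.append(fallback)
    return order


def ocr_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Interpret ENABLE_OCR; anything other than 'true' counts as disabled."""
    return env.get("ENABLE_OCR", "false").strip().lower() == "true"


if __name__ == "__main__":
    sample = {"VISION_PROVIDER": "anthropic", "FALLBACK_PROVIDER": "openai"}
    print(provider_order(sample), ocr_enabled(sample))
```

Passing the environment as a mapping keeps the functions easy to exercise with sample values instead of the real process environment.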

Using OpenRouter

OpenRouter allows you to access various models using the OpenAI API format. To use OpenRouter, follow these steps:

  1. Obtain an API key from OpenRouter.
  2. Set OPENAI_API_KEY in your .env file to your OpenRouter API key.
  3. Set OPENAI_BASE_URL to https://openrouter.ai/api/v1.
  4. Set OPENAI_MODEL to the desired model using the OpenRouter format (e.g., anthropic/claude-3.5-sonnet:beta).
  5. Set VISION_PROVIDER to openai.
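The steps above amount to four settings. The sketch below collects them in one place and renders them as .env lines; the helper names are hypothetical and the API key shown is a placeholder, not a real credential.

```python
from typing import Dict

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def openrouter_settings(api_key: str, model: str) -> Dict[str, str]:
    """Return the environment values described in the OpenRouter steps."""
    if not api_key or not model:
        raise ValueError("Both an API key and a model name are required")
    return {
        "OPENAI_API_KEY": api_key,
        "OPENAI_BASE_URL": OPENROUTER_BASE_URL,
        "OPENAI_MODEL": model,
        "VISION_PROVIDER": "openai",
    }


def to_dotenv(settings: Dict[str, str]) -> str:
    """Render settings as KEY=value lines suitable for a .env file."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


if __name__ == "__main__":
    # "your-openrouter-key" is a placeholder; substitute your own key.
    settings = openrouter_settings(
        "your-openrouter-key", "anthropic/claude-3.5-sonnet:beta"
    )
    print(to_dotenv(settings), end="")
```

The printed lines can be pasted into .env, replacing any existing values for the same keys.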

Default Models

  • Anthropic: claude-3.5-sonnet-beta
  • OpenAI: gpt-4o-mini
  • OpenRouter: Use the anthropic/claude-3.5-sonnet:beta format in OPENAI_MODEL.

Development

Running Tests

Run all tests:

run.bat test

Run a specific test suite:

run.bat test server
run.bat test anthropic
run.bat test openai

Docker Support

Build the Docker image:

docker build -t mcp-image-recognition .

Run the container:

docker run -it --env-file .env mcp-image-recognition

License

MIT License - see LICENSE file for details.

Release History

  • 0.1.2 (2025-02-20): Improved OCR error handling and added comprehensive test coverage for OCR functionality
  • 0.1.1 (2025-02-19): Added Tesseract OCR support for text extraction from images (optional feature)
  • 0.1.0 (2025-02-19): Initial release with Anthropic and OpenAI vision support

