AI Vision MCP Server

AGENTS.md•14 kB

# CLAUDE.md This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. Please always use context7 MCP, web search, or web fetch for additional information when fixing bugs or implementing new features. ## **CRITICAL: Documentation Maintenance Requirements** **BEFORE starting any coding work:** 1. **ALWAYS create a plan document** in the `docs/llm_logs/` folder before writing any code 2. **ALWAYS update README.md** when introducing changes that affect: - New MCP tools or parameters - Environment variables - Configuration options - Installation instructions - Breaking changes 3. **ALWAYS update docs/SPEC.md** when introducing changes that affect: - Architecture modifications - New provider implementations - API interface changes - File handling logic - Error handling patterns **Planning Process:** - Create plan documents in `docs/llm_logs/` folder (e.g., `docs/llm_logs/feature-name-plan.md`) - Include architecture decisions, implementation steps, and testing strategy - Reference this plan in your commit messages - Keep plan documents as documentation of implementation decisions **Solution Planning Best Practices:** - **ALWAYS present at least 3 options** when planning solutions to problems - Analyze trade-offs: effort vs. benefit, maintainability vs. speed, risk vs. reward - Provide clear recommendations with rationale (e.g., "Option 2 recommended because...") - Consider: quick fixes, balanced approaches, and comprehensive solutions - Include effort estimates, risk assessments, and rollback strategies for each option - Use structured format: Option 1 (Simple), Option 2 (Balanced), Option 3 (Comprehensive) **Example Planning Structure:** ``` ## Plan: [Problem Description] ### Option 1: Quick Fix (15 min) - ✅ Minimal change, fastest implementation - ❌ Technical debt, not future-proof - **When to use**: Urgent hotfixes, time pressure ### Option 2: Balanced Solution (45 min) - RECOMMENDED - ✅ Good maintainability, moderate effort - ✅ Addresses root cause, extensible - ❌ Longer implementation time - **When to use**: Most production scenarios ### Option 3: Comprehensive Refactor (2 hours) - ✅ Perfect architecture, future-proof - ❌ High effort, potential for new bugs - **When to use**: Major feature additions, architectural improvements ### Recommendation: Option 2 **Rationale**: Balances immediate needs with long-term maintainability... ``` **Documentation Synchronization:** - README.md is for **users** - installation, usage, and configuration - docs/SPEC.md is for **developers** - technical specifications and architecture - CLAUDE.md is for **AI assistants** - development patterns and constraints - All three documents must stay consistent with the actual implementation ## Development Commands ### Building and Testing - `npm run build` - Build TypeScript project to `dist/` directory - `npm run dev` - Start development server with watch mode (tsc --watch) - `npm start` - Start the built MCP server (node dist/index.js) ### Code Quality - `npm run lint` - Run ESLint on all TypeScript files - `npm run lint:fix` - Run ESLint with auto-fix - `npm run format` - Format code with Prettier ### Publishing - `npm run prepublishOnly` - Run lint before publish - `npm run preversion` - Run lint before version bump - `npm run version` - Format code and add to git before version - `npm run prepare` - Build project automatically on install ## Architecture Overview This is a Model Context Protocol (MCP) server that provides AI-powered image and video analysis using Google Gemini and Vertex AI models. ### Core Components **Server Architecture** (`src/server.ts`): - Main MCP server entry point using `@modelcontextprotocol/sdk` - Lazy-loaded services initialized on first request via `getServices()` function - Four primary tools: `analyze_image`, `compare_images`, `detect_objects_in_image`, and `analyze_video` - Comprehensive error handling with custom `VisionError` types - Graceful shutdown handling for SIGINT/SIGTERM **Configuration Hierarchy System**: The server implements a sophisticated 4-level configuration priority system: 1. **LLM-assigned values** - Parameters passed directly in tool calls (e.g., `{"temperature": 0.1}`) 2. **Function-specific variables** - `TEMPERATURE_FOR_ANALYZE_IMAGE`, `MAX_TOKENS_FOR_COMPARE_IMAGES`, etc. 3. **Task-specific variables** - `TEMPERATURE_FOR_IMAGE`, `MAX_TOKENS_FOR_VIDEO`, etc. 4. **Universal variables** - `TEMPERATURE`, `MAX_TOKENS`, etc. **Provider Factory** (`src/providers/factory/ProviderFactory.ts`): - Factory pattern for creating AI provider instances with validation (`VisionProviderFactory`) - Supports two providers: `google` (Gemini API) and `vertex_ai` (Vertex AI) - `createProviderWithValidation()` method ensures configuration validation before provider creation - Automatic provider initialization via `initializeDefaultProviders()` on module load - Configuration requirement validation and error handling with provider context - Dynamic provider registration support through `registerProvider()` method **Configuration Service** (`src/services/ConfigService.ts`): - Singleton pattern for configuration management via `ConfigService.getInstance()` - Environment variable validation with Zod schemas - Provider-specific configuration methods - Auto-derivation of related settings (e.g., project ID from credentials) - Hierarchical configuration resolution **Configuration Validation** (`src/types/Config.ts` and `src/utils/validation.ts`): - `Config.ts` defines TypeScript interfaces for all configuration options - `validation.ts` provides Zod schemas that validate environment variables against these interfaces - These files must stay synchronized - any new config field in Config.ts requires corresponding validation rules in validation.ts **Key Services**: - `FileService` - Handles file uploads, validation, and processing with support for URLs, local files, and base64, includes cross-platform path handling - `ConfigService` - Singleton pattern for environment variables and settings with validation - Vision providers in `src/providers/` - AI model implementations with consistent interfaces - Storage strategies in `src/storage/` - Google Cloud Storage integration - File upload strategies in `src/file-upload/` - Provider-specific upload handling - Image annotation utilities in `src/utils/` - Sharp-based image processing for object detection ### MCP Tools Implementation **All tools follow consistent patterns:** - Configuration hierarchy: function-specific → task-specific → universal variables - File source support: URLs, local files, base64 data - Error handling with custom `VisionError` types with provider context - Provider-agnostic interface through factory pattern - Structured output schemas for object detection **Tool-specific behaviors:** - `detect_objects_in_image`: Returns annotated images with bounding boxes, 2-step file handling (explicit path → temp file), uses structured JSON output with coordinates, includes CSS selector suggestions for web elements - `compare_images`: Supports 2-4 images with mixed source types, batch processing optimization - `analyze_image`: Special prompt handling for frontend UI comparison tasks, intelligent file processing based on size - `analyze_video`: YouTube URL and local file support, GCS integration for Vertex AI, duration and size validation ### Provider Implementation **Gemini Provider** (`src/providers/gemini/`): - Direct Google Gemini API integration using `@google/genai` - Files API for larger uploads (>10MB via `GEMINI_FILES_API_THRESHOLD`) - Base64 encoding for smaller files (inline data) - Structured output support for object detection with response schemas - Native support for both `google` and `vertex_ai` providers using same SDK **Vertex AI Provider** (`src/providers/vertexai/`): - Google Cloud Vertex AI integration using `@google/genai` SDK - Requires GCS bucket for all file uploads (configured via `VERTEX_AI_FILES_API_THRESHOLD`) - Service account authentication with auto project ID extraction - Uses same underlying SDK as Gemini provider for consistency ### File Processing Flow 1. **Input Validation**: File size, format, and duration checks using configurable limits 2. **Upload Strategy Selection**: Based on provider and file size thresholds 3. **File Processing**: MIME type detection, path resolution, cross-platform support (Windows/Unix) 4. **AI Analysis**: Provider-specific API calls with structured output schemas 5. **Response Processing**: Structured JSON responses with comprehensive error handling ## Critical Development Constraints ### Configuration Synchronization - `src/types/Config.ts` and `src/utils/validation.ts` MUST stay synchronized - Every new config field in Config.ts requires corresponding Zod validation in validation.ts - Function-specific environment variables must follow the naming pattern: `TEMPERATURE_FOR_ANALYZE_IMAGE`, etc. - When adding new configuration, always implement the 4-level hierarchy ### Error Handling Requirements - Always use custom `VisionError` types with provider context - Include error codes for proper client handling - Implement retry logic for network failures - Never expose sensitive credentials in error messages - Provider-specific error context for debugging ### TypeScript Configuration - ES2022 target with ESNext modules, strict type checking enabled - Path mapping with `@/*` pointing to `src/*` for clean imports - Declaration maps and source maps enabled for debugging - No implicit any, returns, or this allowed (strict mode) ### File Organization ``` src/ ├── providers/ # AI provider implementations │ ├── gemini/ # Google Gemini provider │ ├── vertexai/ # Vertex AI provider │ └── factory/ # Provider factory ├── services/ # Core services │ ├── ConfigService.ts │ └── FileService.ts ├── storage/ # Storage implementations ├── file-upload/ # File upload strategies ├── types/ # TypeScript type definitions ├── utils/ # Utility functions └── tools/ # MCP tool implementations ``` ## Development Patterns 1. **Lazy Loading**: Services initialized on first request via `getServices()` function 2. **Factory Pattern**: Providers created through `VisionProviderFactory` with validation 3. **Singleton Pattern**: `ConfigService.getInstance()` ensures consistency 4. **Strategy Pattern**: File upload strategies selected based on provider and size 5. **Zod Validation**: All inputs validated with Zod schemas for runtime type safety 6. **Configuration Hierarchy**: Always implement 4-level priority: LLM-assigned → function-specific → task-specific → universal 7. **Error Context**: Always include provider information in errors for debugging 8. **Cross-Platform Support**: Handle both Windows and Unix file paths correctly 9. **Config Building Pattern**: Use `buildConfigWithOptions()` helper from BaseVisionProvider for consistent config generation ### Config Building Pattern (IMPORTANT) When implementing provider methods that need AI configuration, **always use** the `buildConfigWithOptions()` helper: ```typescript // ✅ Correct - uses helper method const config = this.buildConfigWithOptions('image', options?.functionName, options); await this.client.models.generateContent({ model, contents, config, // Automatically includes responseSchema and systemInstruction if provided }); // ❌ Incorrect - manual config building (duplicates code) const config = { temperature: this.resolveTemperatureForFunction(...), topP: this.resolveTopPForFunction(...), topK: this.resolveTopKForFunction(...), maxOutputTokens: this.resolveMaxTokensForFunction(...), candidateCount: 1, }; if (options?.responseSchema) { config.responseMimeType = 'application/json'; config.responseSchema = options.responseSchema; } if (options?.systemInstruction) { config.systemInstruction = options.systemInstruction; } // ... manual config building creates maintenance burden ``` **Why use `buildConfigWithOptions()`?** 1. **DRY Principle**: Single source of truth for config generation 2. **Automatic Structured Output**: Handles `responseSchema` and `systemInstruction` automatically 3. **Consistency**: Same config format across all providers (Gemini, Vertex AI) 4. **Maintainability**: Adding new config options only requires updating one method 5. **Type Safety**: Centralized TypeScript type checking **This pattern is critical for:** - Object detection (`detect_objects_in_image`) - requires structured JSON output - Any future tools that need custom response schemas - Maintaining consistency between Gemini and Vertex AI providers **Reference Implementation:** - Helper method: `src/providers/base/VisionProvider.ts:354-395` - Usage in Gemini: `src/providers/gemini/GeminiProvider.ts:185-189, 348-352, 468-472` - Usage in Vertex AI: `src/providers/vertexai/VertexAIProvider.ts:84-88, 161-165, 246-250` ## Environment Variables **Required for Development:** - `IMAGE_PROVIDER` and `VIDEO_PROVIDER`: Set to `google` or `vertex_ai` - Provider-specific credentials (GEMINI_API_KEY or VERTEX_CREDENTIALS + GCS_BUCKET_NAME) **Common Development Overrides:** - `TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE=0` for deterministic object detection - `LOG_LEVEL=debug` for verbose logging during development - `NODE_ENV=development` for development-specific behavior ## Testing and Debugging - Use `npm run dev` for development with automatic rebuilding - Check console logs for detailed file processing information - Verify configuration hierarchy by setting different levels of environment variables - Test with multiple file sources (URLs, local files, base64) to ensure compatibility - Use structured logging patterns for consistent debugging output

Loading blob content...

Latest Blog Posts

Don't Use Large Strings as Cache Keys
By punkpeye on January 11, 2026.
markdown
node-js
cache
What are Claude Skills?
By punkpeye on January 10, 2026.
mcp
skills
How to Test MCP Streamable HTTP Endpoints Using cURL
By punkpeye on January 2, 2026.
tutorial
bash

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/honeyvig/ai-vision-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

AGENTS.md•14 kB