analyze_media
Upload local media files for server-side extraction and analysis, handling large PDFs, videos, audio, and complex images that exceed client-side capabilities.
Instructions
Upload and analyze a local file via EnriProxy (server-side extraction + model analysis).
When to use:
Large PDFs (many pages) or scanned PDFs where client-side Read may truncate or miss content.
Video/audio or other binary media your client cannot Read.
Audio files in common formats (mp3, wav, flac, m4a, aac, ogg/oga, opus, wma, weba, mka, aiff/aif/aifc, caf, m4b/m4r, mp1/mp2/mpa/mpga).
HEIC/AVIF/TIFF/APNG/SVG/Office docs when your client Read is unreliable.
Very large files where resumable uploads are required (up to 4GB).
Large PDFs/videos: set analysis_mode to 'multipass' for better coverage (auto prefers multipass for PDFs > 20 pages).
For time-specific video questions (e.g., "what happens at 12:34?"), set video.clip_start_seconds and video.clip_duration_seconds.
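As a minimal sketch of the two tuning hints above, the following Python builds argument dictionaries for a multipass PDF analysis and for a time-clipped video question. The helper names (timestamp_to_seconds, clip_arguments), the example paths, and the 30-second clip length are illustrative assumptions. Only the parameter names come from this page, and nesting the clip_* fields under a video object is inferred from the dotted names.

```python
def timestamp_to_seconds(timestamp: str) -> int:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' into a whole number of seconds."""
    parts = [int(p) for p in timestamp.split(":")]
    if not 1 <= len(parts) <= 3 or any(p < 0 for p in parts):
        raise ValueError(f"unsupported timestamp: {timestamp!r}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + part
    return seconds


def clip_arguments(path: str, timestamp: str, question: str,
                   duration_seconds: int = 30) -> dict:
    """Build arguments that focus the analysis on a clip starting at `timestamp`.

    The 30-second default is an arbitrary choice for illustration.
    """
    return {
        "path": path,
        "question": question,
        "video": {
            "clip_start_seconds": timestamp_to_seconds(timestamp),
            "clip_duration_seconds": duration_seconds,
        },
    }


# Hypothetical absolute paths on the machine running the MCP server.
pdf_args = {"path": "/home/user/docs/report.pdf", "analysis_mode": "multipass"}
video_args = clip_arguments(
    "/home/user/videos/demo.mp4", "12:34", "What happens at 12:34?"
)
print(video_args["video"])
```

Converting the timestamp on the client keeps the user's "12:34" phrasing in the question while giving the tool numeric offsets.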
Rules:
Use path for one file, or paths for multiple images (UI screenshots/photo sets).
path/paths are absolute paths on the machine running this MCP server (the client).
Requires a valid EnriProxy API key (env ENRIPROXY_API_KEY, sent as Authorization: Bearer ...).
Prefer the client's native Read tool only for small/simple text/PDF/common images when it works; prefer this tool for large PDFs.
Answer strictly from the tool output; if frames/transcript are missing, say so.
Video: frames + transcript belong to the SAME video timeline (not unrelated images).
Animated GIF/WebP/APNG/SVG inputs are converted into representative key frames.
Set language (e.g., 'es') to match the user request and avoid language drift.
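The path and API-key rules above could be checked on the client before calling the tool. This is a sketch only: is_absolute, select_input, and auth_headers are hypothetical helper names, and treating path and paths as mutually exclusive is an assumption, since this page does not say how the server handles both at once. The header function only illustrates the Authorization: Bearer shape. In practice the MCP server reads ENRIPROXY_API_KEY and sends the header itself.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath
from typing import Mapping, Optional, Sequence


def is_absolute(path: str) -> bool:
    """Accept POSIX or Windows absolute paths, since the server may run on either."""
    return PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()


def select_input(path: Optional[str] = None,
                 paths: Optional[Sequence[str]] = None) -> dict:
    """Return {'path': ...} for one file or {'paths': [...]} for an image set."""
    if (path is None) == (paths is None):
        raise ValueError("provide exactly one of path or paths")  # assumption
    candidates = [path] if path is not None else list(paths)
    if not candidates:
        raise ValueError("paths must contain at least one file")
    relative = [p for p in candidates if not is_absolute(p)]
    if relative:
        raise ValueError(f"paths must be absolute: {relative}")
    return {"path": path} if path is not None else {"paths": candidates}


def auth_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Build the bearer header from ENRIPROXY_API_KEY, failing loudly if unset."""
    key = env.get("ENRIPROXY_API_KEY")
    if not key:
        raise RuntimeError("ENRIPROXY_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}


# Hypothetical screenshot set, with the response language pinned to Spanish.
arguments = {
    **select_input(paths=["/home/user/shots/login.png", "/home/user/shots/error.png"]),
    "language": "es",
}
print(arguments)
```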
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| path | No | Absolute path to a local file on the machine running the MCP server (e.g., C:\\Users\\User\\Downloads\\video.mp4). | |
| paths | No | Absolute paths to multiple local image files (UI screenshots/photo sets). When provided, EnriVision uploads a single media-set archive for server-side batching + reduce. | |
| context | No | Optional analysis hint: ui, diagram, chart, error, code, meeting, tutorial, photo. Leave empty for auto-detection. | |
| question | No | Optional explicit question to answer about the file. | |
| language | No | Preferred response language code (ISO 639-1), e.g. 'es', 'en'. | |
| max_frames | No | Optional max frames for videos (1-20) in single-pass mode. For targeted timestamps, prefer video.clip_start_seconds + video.clip_duration_seconds. For multipass, use video.max_frames_per_segment. | |
| transcribe | No | Optional override to enable/disable audio transcription for videos. | |
| transcription_language | No | Optional Whisper language hint for audio/video transcription (e.g., 'auto', 'es', 'en'). | |
| analysis_mode | No | Optional analysis mode selector: auto, single, or multipass. | |
| video | No | Optional video multipass tuning. Used only when analyzing videos. | |
| document | No | Optional document multipass tuning (PDF). | |
| audio | No | Optional audio multipass tuning (used only when analyzing audio files). | |
| images | No | Optional image-set multipass tuning (used only with `paths`). | |
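
To make the schema concrete, here is a hedged sketch that assembles an arguments object and checks the few constraints the table states: the allowed analysis_mode and context values, and max_frames between 1 and 20. build_arguments is a hypothetical client-side helper, not part of the tool, and the server may apply different or additional validation.

```python
ANALYSIS_MODES = {"auto", "single", "multipass"}
CONTEXT_HINTS = {"ui", "diagram", "chart", "error", "code", "meeting", "tutorial", "photo"}


def build_arguments(**fields) -> dict:
    """Drop unset fields and check the constraints listed in the schema table."""
    # None and "" both mean "not provided"; False (e.g. transcribe=False) is kept.
    args = {}
    for name, value in fields.items():
        if value is None or (isinstance(value, str) and value == ""):
            continue
        args[name] = value

    mode = args.get("analysis_mode")
    if mode is not None and mode not in ANALYSIS_MODES:
        raise ValueError(f"analysis_mode must be one of {sorted(ANALYSIS_MODES)}")

    context = args.get("context")
    if context is not None and context not in CONTEXT_HINTS:
        raise ValueError(f"context must be one of {sorted(CONTEXT_HINTS)}")

    max_frames = args.get("max_frames")
    if max_frames is not None and not 1 <= max_frames <= 20:
        raise ValueError("max_frames must be between 1 and 20")

    return args


# Hypothetical single-pass video request; the path and question are made up.
args = build_arguments(
    path="/home/user/videos/standup.mp4",
    context="meeting",
    question="Which action items were agreed?",
    language="en",
    analysis_mode="single",
    max_frames=12,
    transcribe=True,
    transcription_language=None,  # unset, so omitted from the result
)
print(sorted(args))
```

Omitting unset fields, instead of sending nulls, leaves auto-detection and server defaults in effect for anything the caller did not choose.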