MCP ScreenCatch

STANDALONE-IMPLEMENTATION.md•10.4 KiB

# Standalone ScreenCatch Implementation ## Overview The standalone version of ScreenCatch is a command-line application that can be invoked independently of Claude Desktop, making it suitable for web application integration and automation. ## Architecture ``` screencatch.py (CLI entry point) | |-- enhanced_capture_standalone.py | |-- DescriptionDialog (tkinter) | |-- MultiCaptureOverlay (tkinter) | |-- merge_and_save() | |-- PIL ImageGrab (screen capture) | |-- image_merger.py (merge images) | |-- preview_and_confirm_standalone.py | |-- PreviewWindow (tkinter) | |-- image_merger.py |-- ImageMerger.merge_vertical() |-- ImageMerger.merge_horizontal() |-- ImageMerger.merge_grid() |-- ImageMerger.merge_auto() ``` ## Key Differences from MCP Version ### MCP Version - Integrated with Claude Desktop via Model Context Protocol - Uses Node.js screenshot-desktop library - Uses Sharp for image processing - Python scripts called from TypeScript - Result returned to Claude as JSON ### Standalone Version - Pure Python implementation - Direct PIL/Pillow ImageGrab for screenshots - Self-contained, no Node.js dependencies - Can be invoked from command line or subprocess - JSON output optional via `--json-output` flag ## Files ### Core Application Files 1. **screencatch.py** (192 lines) - CLI entry point - Argument parsing - Main capture loop with recapture support - JSON output formatting 2. **enhanced_capture_standalone.py** (300+ lines) - `DescriptionDialog` - tkinter description input - `MultiCaptureOverlay` - multi-region capture with draggable instructions - `merge_and_save()` - capture regions and merge images 3. **preview_and_confirm_standalone.py** (146 lines) - `PreviewWindow` - show merged result - Recapture option with same description 4. **image_merger.py** (renamed from image-merger.py) - `ImageMerger` class with merge methods - Vertical, horizontal, grid, and auto layouts ### Documentation Files 1. **STANDALONE-README.md** - Complete usage documentation - Command-line options - Integration examples - Troubleshooting 2. **STANDALONE-QUICKSTART.md** - Quick start guide - Common use cases - Keyboard shortcuts - Web integration examples 3. **STANDALONE-IMPLEMENTATION.md** (this file) - Architecture overview - Technical details - Comparison with MCP version ### Test Files 1. **test-standalone.py** - Import tests - Image merger tests - Description dialog test (interactive) ## Technical Details ### Multi-Monitor Support Uses Windows API via ctypes: ```python user32 = ctypes.windll.user32 virtual_x_min = user32.GetSystemMetrics(76) # SM_XVIRTUALSCREEN virtual_y_min = user32.GetSystemMetrics(77) # SM_YVIRTUALSCREEN virtual_width = user32.GetSystemMetrics(78) # SM_CXVIRTUALSCREEN virtual_height = user32.GetSystemMetrics(79) # SM_CYVIRTUALSCREEN ``` Overlay window geometry: ```python self.root.geometry(f"{virtual_width}x{virtual_height}+{virtual_x_min}+{virtual_y_min}") ``` ### Screen Capture Uses PIL ImageGrab for direct screen capture: ```python from PIL import ImageGrab bbox = (region['x'], region['y'], region['x'] + region['width'], region['y'] + region['height']) img = ImageGrab.grab(bbox=bbox) ``` No intermediate screenshot files needed - captures directly to memory. ### Event Handling Draggable instruction box using tag bindings: ```python # Create elements with 'instructions' tag self.canvas.create_rectangle(..., tags='instructions') self.canvas.create_text(..., tags='instructions') # Bind to tag (higher priority than canvas bindings) self.canvas.tag_bind('instructions', '<ButtonPress-1>', self.on_instructions_press) self.canvas.tag_bind('instructions', '<B1-Motion>', self.on_instructions_drag) self.canvas.tag_bind('instructions', '<ButtonRelease-1>', self.on_instructions_release) # Return "break" to stop propagation def on_instructions_press(self, event): self.is_dragging_instructions = True return "break" # Don't propagate to canvas handlers ``` ### Coordinate System Canvas coordinates are already in virtual screen space: - Primary monitor at (0, 0) - Secondary monitor left: negative x coordinates - Secondary monitor right: positive x beyond primary width - No offset addition needed when saving regions ### Image Merging Smart auto-merge logic: ```python def merge_auto(images, spacing=10): if len(images) == 2: # Check aspect ratio avg_aspect = sum(img.width / img.height for img in images) / 2 if avg_aspect > 1.5: return merge_vertical(images, spacing) # Wide images else: return merge_horizontal(images, spacing) # Tall images elif len(images) <= 4: return merge_grid(images, cols=2, spacing=spacing) # 2x2 grid else: return merge_grid(images, spacing=spacing) # Auto grid ``` ## Command-Line Interface ### Arguments ```python parser.add_argument('--output-dir', '-o', default=os.getcwd()) parser.add_argument('--no-description', action='store_true') parser.add_argument('--preview', action='store_true') parser.add_argument('--merge-method', choices=['auto', 'vertical', 'horizontal', 'grid'], default='auto') parser.add_argument('--spacing', type=int, default=10) parser.add_argument('--json-output', action='store_true') ``` ### Exit Codes - `0` - Success - `1` - Cancelled by user or no regions captured - `130` - Interrupted with Ctrl+C - Other - Error occurred ### JSON Output Format ```json { "success": true, "filepath": "capture_2024-01-15_143022.png", "description": "Setting up VS Code", "capture_count": 3, "merged": true, "recapture_iterations": 0, "metadata_file": "capture_2024-01-15_143022.json" } ``` ## Web Application Integration ### Subprocess Call ```python import subprocess import json result = subprocess.run( ['python', 'screencatch.py', '--no-description', '--json-output'], capture_output=True, text=True ) if result.returncode == 0: data = json.loads(result.stdout) filepath = data['filepath'] # Process the captured image ``` ### Environment Variables ```python os.environ['SCREENCATCH_OUTPUT_DIR'] = '/path/to/output' ``` Though command-line `--output-dir` is preferred. ## File Naming ### Image Files `capture_YYYY-MM-DD_HHMMSS.png` Example: `capture_2024-01-15_143022.png` ### Metadata Files `capture_YYYY-MM-DD_HHMMSS.json` Example: `capture_2024-01-15_143022.json` ### Temporary Files During capture, creates temporary IPC file: `temp-capture-{timestamp}.json` Deleted immediately after reading. ## Dependencies ### Python Standard Library - `tkinter` - GUI (description dialog, overlay, preview) - `json` - Data serialization - `argparse` - CLI argument parsing - `pathlib` - File path handling - `datetime` - Timestamps - `ctypes` - Windows API access ### External Libraries - `Pillow` (PIL) - Image capture and manipulation ```bash pip install pillow ``` ## Platform Support ### Windows Full support including: - Multi-monitor detection - Virtual screen coordinates - Negative coordinate handling ### macOS / Linux Basic support: - Single monitor works - Multi-monitor may have limitations - Virtual screen API not available ## Performance ### Typical Workflow (3 captures, auto-merge) 1. Description dialog: ~2s (user input) 2. Capture overlay: ~1s per region 3. Screen capture: <100ms per region 4. Image merge: ~200ms 5. File save: ~100ms **Total: ~5-7 seconds** (mostly user interaction time) ### Memory Usage - Each 1920x1080 capture: ~8MB in memory - Merged images released after save - Peak memory for 5 full-screen captures: ~50MB ### Disk Usage - PNG images: typically 100KB - 2MB depending on content - JSON metadata: ~500 bytes - 2KB ## Testing ### Run Test Suite ```bash python test-standalone.py ``` Tests: - Import verification - Image merger functionality - Description dialog (optional interactive test) ### Manual Testing ```bash # Basic capture python screencatch.py # Without description python screencatch.py --no-description # With preview python screencatch.py --preview # JSON output python screencatch.py --json-output # Custom output directory python screencatch.py --output-dir ./screenshots ``` ## Known Issues ### Windows Console Encoding Fixed by setting UTF-8 encoding: ```python if sys.platform == 'win32': sys.stdout.reconfigure(encoding='utf-8') sys.stderr.reconfigure(encoding='utf-8') ``` ### Module Import (Hyphens) Python can't import modules with hyphens in names. **Solution**: Renamed files: - `image-merger.py` → `image_merger.py` - `merge-images-cli.py` → `merge_images_cli.py` ### Preview Blocking Preview window blocks execution until user responds. This is by design for standalone use, but was problematic for MCP integration. **Solution**: Made preview optional (default: false) ## Future Enhancements ### Planned - [ ] Cross-platform multi-monitor support - [ ] OCR text extraction from captures - [ ] Annotation tools (arrows, highlights, text) - [ ] Video recording mode - [ ] Cloud upload integration - [ ] Custom merge templates ### Under Consideration - [ ] GUI mode (not just overlay) - [ ] Hotkey support for quick capture - [ ] Tray icon for background operation - [ ] Plugin system for custom processors - [ ] Database storage option ## Migration from MCP Version If you were using the MCP version: ### What Stays the Same - Multi-region capture workflow - Description input - Auto-merge functionality - Metadata JSON format - Multi-monitor support ### What Changes - Invocation: command-line instead of Claude Desktop tool - Output: files on disk instead of returned to Claude - Integration: subprocess calls instead of MCP protocol ### Example Migration **Before (MCP):** ```typescript // In Claude Desktop "Please capture screenshots of the installation process" // Claude calls enhanced_capture tool // Result returned to conversation ``` **After (Standalone):** ```python # In your web application result = subprocess.run( ['python', 'screencatch.py', '--json-output'], capture_output=True, text=True ) data = json.loads(result.stdout) filepath = data['filepath'] # Upload filepath to server, save to database, etc. ``` ## Support For issues or questions: 1. Check [STANDALONE-README.md](STANDALONE-README.md) for documentation 2. Run `python screencatch.py --help` for options 3. Run `python test-standalone.py` to verify installation 4. Check logs in stderr for debugging information ## License See main project LICENSE file.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/johnelamont/mcp-screencatch'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

STANDALONE-IMPLEMENTATION.md•10.4 KiB