Terminal Control MCP

by taskhub-sh
Apache 2.0
  • Linux
TASKS.md (5.36 kB)
# Terminal Control MCP - Implementation Tasks

## Overview

This file contains tasks for implementing the minimal viable product (MVP) of the Terminal Control MCP server using the xterm + Xvfb approach. Each task is designed to provide sufficient context for an LLM coding agent to implement independently.

## Implementation Approach

Based on `examples/example_htop.py`, the system uses:

- **Xvfb**: Virtual X11 display for headless operation
- **xterm**: Terminal emulator within the virtual display
- **xdotool**: Input simulation and window management
- **ImageMagick**: PNG screenshot capture

## Core Implementation Tasks

### 1. Project Structure & Dependencies

- [ ] Set up basic Python project structure with proper package organization
- [ ] Configure pyproject.toml with required dependencies (FastMCP, asyncio, subprocess for system calls)
- [ ] Create main module structure: `terminal_control_mcp/` with `__init__.py`, `server.py`, `xterm_session.py`
- [ ] Set up basic logging configuration for debugging and monitoring

### 2. MCP Server Foundation

- [ ] Implement basic FastMCP server setup in `server.py` with proper initialization
- [ ] Register core MCP tools: `terminal_launch`, `terminal_input`, `terminal_capture`, `terminal_close`
- [ ] Add basic error handling and logging for MCP operations
- [ ] Implement server startup and shutdown procedures with proper cleanup

### 3. XTerm Session Management

- [ ] Create `XTermSession` class in `xterm_session.py` based on example_htop.py pattern
- [ ] Implement virtual display creation using Xvfb with configurable resolution
- [ ] Add xterm process spawning with customizable commands and geometry
- [ ] Implement window ID detection using xdotool search methods
- [ ] Add session tracking with unique IDs and state management (active, closed, error)
- [ ] Implement proper cleanup with Xvfb and xterm process termination

### 4. Core MCP Tools Implementation

- [ ] Implement `terminal_launch` tool: create Xvfb display and spawn xterm with specified command
- [ ] Implement `terminal_input` tool: send keyboard input using xdotool key/type commands
- [ ] Implement `terminal_capture` tool: take PNG screenshots using ImageMagick import
- [ ] Implement `terminal_close` tool: cleanup Xvfb and xterm processes with proper termination

### 5. Screen Capture & Visual Output

- [ ] Implement PNG screenshot capture using ImageMagick import command
- [ ] Add screenshot file management (temporary files, cleanup)
- [ ] Create base64 encoding for returning images via MCP
- [ ] Add screenshot metadata (timestamp, dimensions, file size)
- [ ] Return structured screen data (image data, metadata, session info)

### 6. Input Handling

- [ ] Implement xdotool-based keyboard input for alphanumeric characters using `type` command
- [ ] Add support for special keys using xdotool `key` command (Return, Tab, Escape, arrows, function keys)
- [ ] Handle control sequences (Ctrl+C, Ctrl+D, etc.) using xdotool key combinations
- [ ] Add window focus management before sending input
- [ ] Add input validation and error handling for invalid key sequences

### 7. Error Handling & Recovery

- [ ] Implement comprehensive error handling for all MCP tools
- [ ] Add proper exception handling for subprocess failures (Xvfb, xterm, xdotool)
- [ ] Handle window ID detection failures with multiple search strategies
- [ ] Create error recovery mechanisms for display and session failures
- [ ] Return structured error responses with helpful debugging information

### 8. Basic Testing & Validation

- [ ] Create simple test script to validate MCP server functionality
- [ ] Test Xvfb display creation and xterm launching
- [ ] Validate PNG screenshot capture with simple commands (ls, echo)
- [ ] Test xdotool input simulation with various key types
- [ ] Test session cleanup and resource deallocation
- [ ] Validate error handling with invalid commands and display failures

## MVP Success Criteria

The MVP should demonstrate:

- [ ] Launch a virtual X11 display and xterm session with specified commands
- [ ] Send keyboard input using xdotool to interact with running programs
- [ ] Capture and return terminal screen content as PNG screenshots
- [ ] Handle multiple concurrent virtual display sessions
- [ ] Provide clear error messages for debugging display and input issues

## Notes for Implementation

- **Focus on PNG screenshot capture** - Visual screenshots are the primary output format
- **Use xterm + xdotool for terminal interaction** - Provides reliable X11-based terminal control
- **Follow example_htop.py pattern** - Use the proven XTermSession class structure
- **Handle virtual display lifecycle carefully** - Proper Xvfb startup/cleanup is critical
- **Window ID detection is crucial** - Implement multiple fallback strategies for finding xterm windows
- **Keep sessions simple** - No persistence or advanced state management in MVP
- **Prioritize reliability over features** - Basic functionality that works consistently

## Out of Scope for MVP

- Mouse event handling with xdotool
- Session persistence across server restarts
- OCR text extraction from screenshots
- Advanced xterm configuration and theming
- Performance optimization for many concurrent displays
- Security sandboxing of executed commands
- Multiple shell type support beyond bash
- Display sharing or remote access features
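
The tool stack named in the tasks (Xvfb, xterm, xdotool, ImageMagick `import`) could be driven by argument lists like the ones sketched below. This is a rough illustration, not the contents of `examples/example_htop.py`: every function name, the default window title `tcmcp`, and the default sizes are assumptions made for this example. The helpers only build commands; a real session would pass them to `subprocess` together with `display_env(...)`.

```python
import os
from typing import Dict, List, Optional


def display_env(display: int) -> Dict[str, str]:
    """Copy of the current environment with DISPLAY pointing at the virtual display."""
    env = dict(os.environ)
    env["DISPLAY"] = f":{display}"
    return env


def xvfb_command(display: int, width: int = 1024, height: int = 768, depth: int = 24) -> List[str]:
    """Argv that starts a virtual X11 display such as :99 with one screen."""
    return ["Xvfb", f":{display}", "-screen", "0", f"{width}x{height}x{depth}"]


def xterm_command(command: str, cols: int = 80, rows: int = 24, title: str = "tcmcp") -> List[str]:
    """Argv that runs `command` through bash inside xterm (-e must come last)."""
    return ["xterm", "-T", title, "-geometry", f"{cols}x{rows}", "-e", "bash", "-c", command]


def search_commands(title: str = "tcmcp") -> List[List[str]]:
    """Window-ID lookups to try in order: by window title, then by window class."""
    return [
        ["xdotool", "search", "--name", title],
        ["xdotool", "search", "--class", "XTerm"],
    ]


def input_commands(window_id: str, text: Optional[str] = None, key: Optional[str] = None) -> List[List[str]]:
    """Focus the window first, then type literal text and/or press a named key."""
    commands = [["xdotool", "windowfocus", window_id]]
    if text is not None:
        commands.append(["xdotool", "type", text])
    if key is not None:
        # Key names follow X keysym naming, e.g. Return, Tab, Escape, Up, F1, ctrl+c.
        commands.append(["xdotool", "key", key])
    return commands


def capture_command(window_id: str, path: str) -> List[str]:
    """Argv that saves a screenshot of one window with ImageMagick import."""
    return ["import", "-window", window_id, path]
```

Keeping command construction separate from process execution means these pieces can be checked without an X server, for example `subprocess.Popen(xvfb_command(99))` followed by `subprocess.Popen(xterm_command("htop"), env=display_env(99))` on a machine where the tools are installed.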

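
For the screen capture tasks (base64 encoding, metadata, structured screen data), one possible shape for the returned value is sketched here. The field names and the functions `png_dimensions` and `screenshot_payload` are hypothetical; the task list does not fix a response schema. Width and height are read from the PNG header itself, so no imaging library is assumed.

```python
import base64
import struct
from datetime import datetime, timezone
from typing import Any, Dict, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> Tuple[int, int]:
    """Read (width, height) from the IHDR chunk that follows the 8-byte PNG signature."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def screenshot_payload(data: bytes, session_id: str) -> Dict[str, Any]:
    """Bundle PNG bytes with metadata in a JSON-friendly dict (assumed layout)."""
    width, height = png_dimensions(data)
    return {
        "session_id": session_id,
        "image": {
            "mime_type": "image/png",
            "data": base64.b64encode(data).decode("ascii"),
        },
        "metadata": {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "width": width,
            "height": height,
            "size_bytes": len(data),
        },
    }
```

A capture tool could read the temporary file written by `import`, pass its bytes to `screenshot_payload`, and delete the file afterwards; a non-PNG file raises `ValueError`, which fits the structured error responses called for in the error handling tasks.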