Skip to main content
Glama

Terminal Control MCP

by taskhub-sh
Apache 2.0
1
  • Linux
README.md4.9 kB
# Terminal Control MCP A Model Context Protocol (MCP) server that enables AI agents to interact with terminal-based TUI applications through a virtual X11 display approach. ## Overview This project provides a comprehensive solution for controlling terminal applications programmatically using: - **Xvfb** for headless virtual X11 display - **xterm** for terminal emulation - **xdotool** for input simulation and window management - **ImageMagick** for PNG screenshot capture The system captures actual visual terminal output as PNG screenshots, making it ideal for AI agents that need to see and interact with terminal applications. ## Features - **Virtual Display Management**: Headless X11 display using Xvfb - **Input Simulation**: Send keyboard input and text to terminal applications - **Screenshot Capture**: Take PNG screenshots of terminal output - **Window Management**: Reliable window detection and focus handling - **Resource Cleanup**: Proper process management with timeout handling ## System Requirements The following system packages must be installed: ```bash # Ubuntu/Debian sudo apt-get install xvfb xterm xdotool imagemagick # CentOS/RHEL/Fedora sudo yum install xorg-x11-server-Xvfb xterm xdotool ImageMagick ``` ## Installation This project uses the `uv` package manager: ```bash # Clone the repository git clone <repository-url> cd terminal-control-mcp # Install dependencies uv sync # Activate virtual environment source .venv/bin/activate ``` ## Quick Start ### Running the Example Try the included htop example to see the system in action: ```bash python examples/example_htop.py ``` This will: 1. Launch htop in a virtual xterm session 2. Press F3 to open the search dialog 3. Type "python" as a search term 4. Capture PNG screenshots at each step 5. Clean up all processes ### Basic Usage ```python from examples.example_htop import XTermSession # Create a session session = XTermSession(width=1920, height=1080) try: # Start virtual display and terminal session.start_virtual_display() session.start_xterm("your-command-here") # Take a screenshot session.take_screenshot("output.png") # Send input session.send_key("F1") session.send_text("hello world") finally: session.cleanup() ``` ## Architecture ### Core Components 1. **XTermSession Class**: The main interface for terminal control - Manages Xvfb virtual display lifecycle - Spawns and controls xterm processes - Handles input simulation via xdotool - Captures screenshots using ImageMagick 2. **Virtual Display Approach**: Unlike direct TTY manipulation, this system: - Creates a real X11 environment with Xvfb - Launches actual xterm instances - Captures genuine visual output as PNG files - Provides reliable input simulation ### Key Methods - `start_virtual_display()`: Initialize Xvfb virtual display - `start_xterm(command)`: Launch xterm with specified command - `send_key(key)`: Send special keys (F1, Escape, etc.) - `send_text(text)`: Send alphanumeric text input - `take_screenshot(filename)`: Capture PNG screenshot - `cleanup()`: Properly terminate all processes ## Development ### Project Structure ``` terminal-control-mcp/ ├── src/terminal_control_mcp/ # Main MCP server implementation (planned) ├── examples/ │ ├── example_htop.py # Reference implementation │ └── README.md ├── tests/ # Test suite ├── pyproject.toml # Project configuration └── CLAUDE.md # Development guidelines ``` ### Development Commands ```bash # Run the main application python main.py # Run the htop example python examples/example_htop.py # Activate virtual environment source .venv/bin/activate ``` ### MCP Server Implementation (Planned) The full MCP server will provide these tools: - `terminal_launch`: Start a new terminal session - `terminal_input`: Send keyboard/text input - `terminal_capture`: Take PNG screenshot - `terminal_close`: Clean up terminal session ## Technical Details ### Window Detection The system uses multiple fallback strategies for reliable window ID detection: ```python search_methods = [ ['xdotool', 'search', '--class', 'XTerm'], ['xdotool', 'search', '--name', 'xterm'], ['xdotool', 'search', '--class', 'xterm'], ['xdotool', 'getactivewindow'] ] ``` ### Screenshot Capture Uses ImageMagick's `import` command for reliable PNG capture: ```python subprocess.run(['import', '-window', 'root', filename], env=env) ``` ### Resource Management Implements proper cleanup with timeout handling: ```python def cleanup(self): if self.xterm_proc: self.xterm_proc.terminate() try: self.xterm_proc.wait(timeout=5) except subprocess.TimeoutExpired: self.xterm_proc.kill() ``` ## Contributing [TBD]

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/taskhub-sh/terminal-driver-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server