
Terminal Control MCP

by taskhub-sh
Apache 2.0
  • Linux

A Model Context Protocol (MCP) server that lets AI agents interact with terminal (TUI) applications through a virtual X11 display.

Overview

This project controls terminal applications programmatically using:

  • Xvfb for headless virtual X11 display
  • xterm for terminal emulation
  • xdotool for input simulation and window management
  • ImageMagick for PNG screenshot capture

The system captures actual visual terminal output as PNG screenshots, making it ideal for AI agents that need to see and interact with terminal applications.
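As a rough illustration of how the first two tools fit together, the sketch below builds the command lines for a headless display and a terminal running inside it. The function names, the display number `:99`, and the 24-bit colour depth are assumptions made for this example, not values taken from the project.

```python
from typing import List


def xvfb_command(display: str = ":99", width: int = 1920,
                 height: int = 1080, depth: int = 24) -> List[str]:
    """Build the argv for a headless X server with a single screen.

    The display number and depth are illustrative defaults (assumptions).
    """
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


def xterm_command(command: str) -> List[str]:
    """Build the argv for an xterm that runs `command` through the shell."""
    # xterm treats everything after -e as the program to run and its arguments.
    return ["xterm", "-e", "sh", "-c", command]


if __name__ == "__main__":
    print(xvfb_command())
    print(xterm_command("htop"))
```

Both lists could be handed to `subprocess.Popen`, with `DISPLAY` set to the same display number for the xterm process.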

Features

  • Virtual Display Management: Headless X11 display using Xvfb
  • Input Simulation: Send keyboard input and text to terminal applications
  • Screenshot Capture: Take PNG screenshots of terminal output
  • Window Management: Reliable window detection and focus handling
  • Resource Cleanup: Proper process management with timeout handling

System Requirements

The following system packages must be installed:

    # Ubuntu/Debian
    sudo apt-get install xvfb xterm xdotool imagemagick

    # CentOS/RHEL/Fedora
    sudo yum install xorg-x11-server-Xvfb xterm xdotool ImageMagick

Installation

This project uses the uv package manager:

    # Clone the repository
    git clone <repository-url>
    cd terminal-control-mcp

    # Install dependencies
    uv sync

    # Activate virtual environment
    source .venv/bin/activate

Quick Start

Running the Example

Try the included htop example to see the system in action:

python examples/example_htop.py

This will:

  1. Launch htop in a virtual xterm session
  2. Press F3 to open the search dialog
  3. Type "python" as a search term
  4. Capture PNG screenshots at each step
  5. Clean up all processes
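Steps 2 and 3 come down to two xdotool invocations. The sketch below only builds those command lines; the helper names are hypothetical, and it assumes the xterm window already has input focus, so no window ID is passed.

```python
from typing import List


def key_command(key: str) -> List[str]:
    """argv for pressing a named key (for example F3) in the focused window."""
    return ["xdotool", "key", key]


def type_command(text: str) -> List[str]:
    """argv for typing literal text into the focused window."""
    return ["xdotool", "type", text]


def htop_search_steps(term: str = "python") -> List[List[str]]:
    """Commands for steps 2 and 3: open the search dialog, then type the term."""
    return [key_command("F3"), type_command(term)]


if __name__ == "__main__":
    for argv in htop_search_steps():
        print(" ".join(argv))
```

Each command would be run with `DISPLAY` pointing at the virtual display, with a screenshot taken after each one.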

Basic Usage

    from examples.example_htop import XTermSession

    # Create a session
    session = XTermSession(width=1920, height=1080)

    try:
        # Start virtual display and terminal
        session.start_virtual_display()
        session.start_xterm("your-command-here")

        # Take a screenshot
        session.take_screenshot("output.png")

        # Send input
        session.send_key("F1")
        session.send_text("hello world")
    finally:
        session.cleanup()
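The try/finally pattern above could also be wrapped in a context manager so that cleanup cannot be forgotten. This is a minimal sketch: `running_session` is a hypothetical helper that is not part of the project, and it assumes only the three session methods it calls.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def running_session(session: Any, command: str) -> Iterator[Any]:
    """Start the display and terminal, and always clean up on exit.

    `session` is any object with start_virtual_display(), start_xterm()
    and cleanup() methods, such as the XTermSession shown above.
    """
    try:
        session.start_virtual_display()
        session.start_xterm(command)
        yield session
    finally:
        # Runs on normal exit, and also if startup or the body raises.
        session.cleanup()
```

Usage would then read `with running_session(XTermSession(width=1920, height=1080), "htop") as s: s.take_screenshot("output.png")`.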

Architecture

Core Components

  1. XTermSession Class: The main interface for terminal control
    • Manages Xvfb virtual display lifecycle
    • Spawns and controls xterm processes
    • Handles input simulation via xdotool
    • Captures screenshots using ImageMagick
  2. Virtual Display Approach: Unlike direct TTY manipulation, this system:
    • Creates a real X11 environment with Xvfb
    • Launches actual xterm instances
    • Captures genuine visual output as PNG files
    • Provides reliable input simulation

Key Methods

  • start_virtual_display(): Initialize Xvfb virtual display
  • start_xterm(command): Launch xterm with specified command
  • send_key(key): Send special keys (F1, Escape, etc.)
  • send_text(text): Send alphanumeric text input
  • take_screenshot(filename): Capture PNG screenshot
  • cleanup(): Properly terminate all processes
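One way to read the method list is as an interface. The sketch below writes it down as a `typing.Protocol` and adds a recording stand-in that can be used where no X11 tools are installed. The return types and both class names are assumptions; the source lists only the method names and arguments.

```python
from typing import List, Protocol, Tuple


class TerminalSession(Protocol):
    """The six methods listed above; the None return types are assumed."""

    def start_virtual_display(self) -> None: ...
    def start_xterm(self, command: str) -> None: ...
    def send_key(self, key: str) -> None: ...
    def send_text(self, text: str) -> None: ...
    def take_screenshot(self, filename: str) -> None: ...
    def cleanup(self) -> None: ...


class RecordingSession:
    """Hypothetical stand-in that records calls instead of driving X11."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, ...]] = []

    def start_virtual_display(self) -> None:
        self.calls.append(("start_virtual_display",))

    def start_xterm(self, command: str) -> None:
        self.calls.append(("start_xterm", command))

    def send_key(self, key: str) -> None:
        self.calls.append(("send_key", key))

    def send_text(self, text: str) -> None:
        self.calls.append(("send_text", text))

    def take_screenshot(self, filename: str) -> None:
        self.calls.append(("take_screenshot", filename))

    def cleanup(self) -> None:
        self.calls.append(("cleanup",))
```

Code written against `TerminalSession` can then be exercised with `RecordingSession` in tests and with the real session class on a machine that has Xvfb.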

Development

Project Structure

    terminal-control-mcp/
    ├── src/terminal_control_mcp/   # Main MCP server implementation (planned)
    ├── examples/
    │   ├── example_htop.py         # Reference implementation
    │   └── README.md
    ├── tests/                      # Test suite
    ├── pyproject.toml              # Project configuration
    └── CLAUDE.md                   # Development guidelines

Development Commands

    # Run the main application
    python main.py

    # Run the htop example
    python examples/example_htop.py

    # Activate virtual environment
    source .venv/bin/activate

MCP Server Implementation (Planned)

The full MCP server will provide these tools:

  • terminal_launch: Start a new terminal session
  • terminal_input: Send keyboard/text input
  • terminal_capture: Take PNG screenshot
  • terminal_close: Clean up terminal session
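Since the server is still planned, the following is only a sketch of how the four tool names might be routed to handlers. It deliberately avoids any MCP SDK; `ToolRegistry`, the handler signature, and the argument dictionaries are all hypothetical.

```python
from typing import Any, Callable, Dict

# A handler takes the tool arguments and returns a result dictionary (assumed shape).
ToolHandler = Callable[[Dict[str, Any]], Dict[str, Any]]

# Tool names as listed in the README.
PLANNED_TOOLS = ("terminal_launch", "terminal_input",
                 "terminal_capture", "terminal_close")


class ToolRegistry:
    """Maps planned tool names to handler callables."""

    def __init__(self) -> None:
        self._handlers: Dict[str, ToolHandler] = {}

    def register(self, name: str, handler: ToolHandler) -> None:
        if name not in PLANNED_TOOLS:
            raise ValueError(f"not a planned tool: {name}")
        self._handlers[name] = handler

    def call(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        handler = self._handlers.get(name)
        if handler is None:
            raise KeyError(f"no handler registered for {name}")
        return handler(arguments)
```

A real server would expose these handlers through whichever MCP library the project adopts; the registry only illustrates the name-to-handler mapping.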

Technical Details

Window Detection

The system uses multiple fallback strategies for reliable window ID detection:

    search_methods = [
        ['xdotool', 'search', '--class', 'XTerm'],
        ['xdotool', 'search', '--name', 'xterm'],
        ['xdotool', 'search', '--class', 'xterm'],
        ['xdotool', 'getactivewindow']
    ]
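A fallback loop over that list might look like the sketch below. The function names are hypothetical, and taking the first ID when a search prints several is an assumption; the project may choose differently. The runner is injectable so the logic can be tested without xdotool installed.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

SEARCH_METHODS: List[List[str]] = [
    ["xdotool", "search", "--class", "XTerm"],
    ["xdotool", "search", "--name", "xterm"],
    ["xdotool", "search", "--class", "xterm"],
    ["xdotool", "getactivewindow"],
]


def run_xdotool(argv: Sequence[str]) -> str:
    """Run one command and return its stdout, or '' on a non-zero exit.

    DISPLAY must already point at the virtual display.
    """
    result = subprocess.run(list(argv), capture_output=True, text=True, timeout=5)
    return result.stdout if result.returncode == 0 else ""


def find_window_id(
    runner: Callable[[Sequence[str]], str] = run_xdotool,
    methods: Sequence[Sequence[str]] = SEARCH_METHODS,
) -> Optional[str]:
    """Try each method in order and return the first window ID found."""
    for argv in methods:
        try:
            output = runner(argv)
        except (OSError, subprocess.SubprocessError):
            continue  # tool missing or timed out: fall through to the next method
        ids = output.split()  # search prints one window ID per line
        if ids:
            return ids[0]  # assumption: take the first match
    return None
```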

Screenshot Capture

Uses ImageMagick's import command for reliable PNG capture:

subprocess.run(['import', '-window', 'root', filename], env=env)
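The `env` argument is what points `import` at the virtual display. Below is a minimal sketch of building that environment and sanity-checking the result; the helper names and the display number `:99` are assumptions, and the PNG check only compares the standard eight-byte file signature.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def display_env(display: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and point DISPLAY at the virtual display."""
    env = dict(os.environ if base is None else base)
    env["DISPLAY"] = display
    return env


def looks_like_png(path: str) -> bool:
    """Return True if the file starts with the PNG signature."""
    with open(path, "rb") as handle:
        return handle.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


def capture_root(filename: str, display: str = ":99") -> bool:
    """Capture the whole virtual screen and report whether a PNG was written."""
    result = subprocess.run(
        ["import", "-window", "root", filename],
        env=display_env(display),
        timeout=10,
    )
    return (result.returncode == 0
            and os.path.exists(filename)
            and looks_like_png(filename))
```

Checking the signature catches the case where the command exits cleanly but writes an empty or truncated file.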

Resource Management

Implements proper cleanup with timeout handling:

    def cleanup(self):
        if self.xterm_proc:
            self.xterm_proc.terminate()
            try:
                self.xterm_proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self.xterm_proc.kill()
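The same terminate, wait, then kill sequence can be factored into a standalone helper. `stop_process` is a hypothetical name, and the final `wait()` after `kill()` is an addition in this sketch so the child is reaped before returning.

```python
import subprocess
from typing import Optional


def stop_process(proc: Optional[subprocess.Popen], timeout: float = 5.0) -> Optional[int]:
    """Terminate a child process, escalating to kill if it ignores the request.

    Returns the exit code, or None if there was no process to stop.
    """
    if proc is None:
        return None
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()  # reap the process so it does not linger as a zombie
    return proc.returncode
```

Assuming the session also keeps a handle to the Xvfb process, the same helper could be applied to it after the xterm process has been stopped.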

Contributing

[TBD]


