ChatGPT Escalation MCP Server
An MCP (Model Context Protocol) server that enables autonomous coding agents to escalate complex questions to the ChatGPT Desktop app automatically — ToS-compliant via native UI automation.
What this does: This tool lets autonomous coding agents (Copilot, Claude, Cline, Roo, etc.) escalate hard questions to the ChatGPT Desktop app on your computer. It automates ChatGPT the same way a human would — clicking the UI, sending the question, waiting for the response, copying it — then returns the answer to your agent so it can continue working without you.
🖥️ Windows 10/11 Only
This tool supports only Windows. macOS and Linux are not supported and there are no plans to add support.
⚠️ Important Requirements
ChatGPT Desktop app (Microsoft Store version)
Automation controls your ChatGPT window — don't touch it during escalations
Only one escalation at a time (requests are queued)
UI changes in ChatGPT may break automation — open an issue if this happens
✅ ToS Compliant
This tool only automates your local ChatGPT Desktop application. It does not automate the web UI, bypass security features, or scrape data.
Features
Two MCP Tools:
`escalate_to_expert` - Send questions to ChatGPT and receive detailed responses
`list_projects` - Discover available project IDs from your configuration
100% Accurate UI Detection - Pixel-based detection for sidebar state and response completion
OCR-Based Navigation - PaddleOCR v5 for reliable text extraction and fuzzy matching
Async Model Loading - OCR models preload in background for faster response times
Project Organization - Map multiple projects to different ChatGPT conversations
How It Works
Automation Flow
Kill ChatGPT - Ensures clean state
Open ChatGPT - Fresh start
Focus Window - Bring to foreground
Open Sidebar - Click hamburger menu (pixel detection for state)
Click Project - OCR + fuzzy matching to find folder
Click Conversation - OCR + fuzzy matching to find chat (Ctrl+K fallback if not found)
Focus Input - Click text input area
Send Prompt - Paste and submit
Wait for Response - Pixel-based stop button detection
Copy Response - Robust button probing to find copy button
Automatic Retry Logic: If any step fails, the entire flow restarts (up to 4 attempts total). Each retry gets a fresh ChatGPT instance. Most failures are transient (focus lost, window minimized) and succeed on retry.
System Requirements
| Requirement | Version | Notes |
|---|---|---|
| Windows | 10 or 11 | macOS/Linux not supported |
| ChatGPT Desktop | Latest | Microsoft Store version |
| Node.js | 18+ | For the MCP server |
| Python | 3.10+ | For the UI automation driver |
| GPU | Not required | CPU-only OCR works fine |
Python Packages
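The authoritative list ships with the project; as a rough sketch based on the features described above (pywinauto for window automation, PaddleOCR for text detection), the driver needs at least:

```powershell
# Assumed package set; install the project's actual requirements file if it differs.
pip install pywinauto paddleocr paddlepaddle
```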
Why Windows Only?
ChatGPT Desktop exposes fully accessible UI elements on Windows via UI Automation APIs. The pixel-based detection and keyboard/mouse automation work reliably on Windows.
macOS has different automation APIs (Accessibility API) that would require a complete rewrite of the driver. Linux doesn't have a ChatGPT Desktop app.
Tested Environment
| Component | Version | Status |
|---|---|---|
| ChatGPT Desktop | 1.2025.112 | ✅ Tested |
| Windows 11 | 24H2 (Build 26100.2605) | ✅ Tested |
| Last Verified | December 2, 2025 | |
Robustness Features
Automatic Retries: Up to 4 attempts per escalation with intelligent failure detection
Structured Observability: Every escalation gets a unique `run_id` for correlation and debugging
Error Reason Codes: 12+ specific error codes (e.g., `focus_failed`, `project_not_found`, `empty_response`)
Chaos Tested: Passes aggressive chaos testing (random focus stealing, window minimization, mouse interference)
Smart Fallbacks: Ctrl+K search if conversation not visible in sidebar
💡 After ChatGPT Updates: UI automation may break if ChatGPT significantly changes their layout. If you encounter issues after an update, please open an issue with your ChatGPT version.
Installation
Option 1: Install from npm (Recommended)
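Something along these lines, assuming the package is published under a name similar to the repository (the package name below is hypothetical; check the npm registry or the project page for the real one):

```powershell
# Hypothetical package name; substitute the actual published name.
npm install -g chatgpt-escalation-mcp
```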
Option 2: Install from GitHub Release
Download the latest release from GitHub Releases
Extract the ZIP file
Run:
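The exact command depends on how the release archive is packaged; a plausible sketch, assuming the ZIP contains the project with a standard Node build script:

```powershell
cd chatgpt-escalation   # name of the extracted folder may differ
npm install
npm run build           # only needed if dist/ is not shipped prebuilt
```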
Option 3: Install from Source
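A typical from-source flow, assuming a standard Node project layout (the repository URL below is a placeholder, not a confirmed address):

```powershell
# Placeholder URL; use the actual repository address.
git clone https://github.com/Dazlarus/chatgpt-escalation.git
cd chatgpt-escalation
npm install
npm run build                                # produces dist/src/server.js referenced later in this README
pip install pywinauto paddleocr paddlepaddle # Python driver dependencies (assumed set)
```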
Quick Start
Step 1: Install ChatGPT Desktop
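If you prefer the command line, winget can install Store apps; the package query below is a best guess rather than a verified ID:

```powershell
# Best-guess Store query; verify the package name in the Microsoft Store if this fails.
winget install --source msstore "ChatGPT"
```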
Or install from the Microsoft Store: search "ChatGPT" by OpenAI.
Step 2: Create a Conversation in ChatGPT
Open ChatGPT Desktop and sign in
Create a new Project (folder) called `Agent Expert Help`
Inside that project, create a new conversation called `Copilot Escalations`
Send an initial message that sets the expert context; the recommended prompt is shown under "ChatGPT Conversation Setup" below
Step 3: Configure the MCP Server
Create the config file at ~/.chatgpt-escalation/config.json:
Paste this configuration:
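The authoritative schema ships with the project; the sketch below is assembled from the options referenced elsewhere in this README (`projects`, `headless`, `responseTimeout`, `logging.level`). Treat the exact field names, nesting, and timeout unit as assumptions and adjust to the real schema:

```json
{
  "projects": {
    "default": {
      "folder": "Agent Expert Help",
      "conversation": "Copilot Escalations"
    }
  },
  "headless": false,
  "responseTimeout": 180,
  "logging": { "level": "info" }
}
```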
Step 4: Add to Your MCP Client
For VS Code with GitHub Copilot (%APPDATA%\Code\User\mcp.json):
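A sketch of the shape VS Code's `mcp.json` typically uses; the server name and install path are placeholders:

```json
{
  "servers": {
    "chatgpt-escalation": {
      "command": "node",
      "args": ["C:/path/to/chatgpt-escalation/dist/src/server.js"]
    }
  }
}
```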
For Claude Desktop (%APPDATA%\Claude\claude_desktop_config.json):
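Claude Desktop uses an `mcpServers` map; again, the server name and path are placeholders:

```json
{
  "mcpServers": {
    "chatgpt-escalation": {
      "command": "node",
      "args": ["C:/path/to/chatgpt-escalation/dist/src/server.js"]
    }
  }
}
```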
⚠️ In JSON paths, use forward slashes (`/`) or escape backslashes as `\\`
Step 5: Teach Your Agent When to Escalate
Add escalation instructions to your agent. Choose the format that matches your tool:
escalate_to_expert({ project: "default", reason: "Brief explanation of the blocker", question: "Specific technical question", attempted: "What was tried and what happened", artifacts: [{type: "file_snippet", pathOrLabel: "file.py", content: "..."}] })
Example Escalation Call
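A concrete call might look like this (all values are illustrative, using the parameters from the snippet above):

```json
{
  "project": "default",
  "reason": "Stuck after three failed attempts to fix a circular import",
  "question": "How should I restructure the imports between auth.py and models.py to break the cycle without a large refactor?",
  "attempted": "Moved the import inside the function (works but the type checker complains); tried a lazy import module (broke at runtime).",
  "artifacts": [
    { "type": "file_snippet", "pathOrLabel": "auth.py", "content": "from models import User" }
  ]
}
```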
Configuration Reference
Config file location: %USERPROFILE%\.chatgpt-escalation\config.json
Project Configuration
Projects can be configured two ways:
Simple (conversation at root level in ChatGPT sidebar):
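A minimal sketch, assuming the field names used in the config example above:

```json
{
  "projects": {
    "default": {
      "conversation": "Copilot Escalations"
    }
  }
}
```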
With Folder (conversation inside a ChatGPT project folder):
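And with a project folder (same caveat about field names):

```json
{
  "projects": {
    "default": {
      "folder": "Agent Expert Help",
      "conversation": "Copilot Escalations"
    }
  }
}
```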
Multiple Projects
You can map different coding projects to different ChatGPT conversations:
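For example (field names assumed; folder and conversation titles must match your ChatGPT sidebar exactly):

```json
{
  "projects": {
    "backend-api": {
      "folder": "Agent Expert Help",
      "conversation": "Backend API Escalations"
    },
    "frontend": {
      "folder": "Agent Expert Help",
      "conversation": "Frontend Escalations"
    }
  }
}
```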
Then agents can escalate to the right context:
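For example (illustrative call):

```json
{
  "project": "backend-api",
  "reason": "Blocked on a database migration ordering problem",
  "question": "How should I order these two schema migrations so existing deployments don't break?"
}
```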
MCP Tools Reference
escalate_to_expert
Send a question to ChatGPT via the desktop app.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `project` | string | Yes | Project ID from config (use `list_projects` to discover) |
| `reason` | string | Yes | Why you're escalating (helps ChatGPT understand context) |
| `question` | string | Yes | The specific technical question |
| `attempted` | string | No | What you've already tried and the results |
| | string | No | Additional context about the codebase |
| `artifacts` | array | No | Code snippets, logs, or notes (see below) |
Artifact format:
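Based on the example call earlier in this README, each artifact is an object like the following (other `type` values may exist):

```json
{
  "type": "file_snippet",
  "pathOrLabel": "src/auth.py",
  "content": "def login(user):\n    ..."
}
```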
list_projects
Discover available project IDs from your configuration. Call this first if you don't know what projects are available.
Returns:
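The exact return shape isn't documented here; a plausible sketch is a list of the project IDs defined in your config:

```json
{
  "projects": ["default", "backend-api", "frontend"]
}
```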
Important Notes
ChatGPT Conversation Setup
For best results, start each project's ChatGPT conversation with a system prompt that establishes the expert role:
You are the dedicated expert escalation endpoint for autonomous coding agents working on this project.
Your role:
Provide clear, technically correct, implementation-ready guidance.
Assume the agent will immediately act on your instructions.
Avoid asking the agent follow-up questions unless absolutely necessary.
Be concise, direct, and practical.
Response Format:
Begin with a brief explanation of the issue and the recommended solution.
End every response with a strict JSON object in the following format:
{
"guidance": "one-sentence summary of what the agent should do next",
"action_plan": ["step 1", "step 2", "step 3"],
"priority": "low | medium | high",
"notes_for_user": "optional message for the human"
}
Important Rules:
The JSON must be the final content in your message.
Do NOT wrap the JSON in code fences.
Do NOT include any commentary after the JSON.
Do NOT use placeholders or incomplete structures.
Always return syntactically valid JSON.
During Use
Keep ChatGPT Desktop installed (it will be opened/closed automatically)
Don't interact with ChatGPT while escalation is in progress
Automation takes ~30-120 seconds depending on response length
Works best when you're AFK or focused on other tasks
Version Compatibility
| ChatGPT Desktop Version | Status | Notes |
|---|---|---|
| 1.2025.112 | ✅ Supported | Last tested Nov 30, 2025 |
| Older versions | ⚠️ Unknown | May work, not tested |
| Future versions | ⚠️ Unknown | May break if UI changes significantly |
If a ChatGPT update breaks automation, open an issue with your version number.
What Happens During Escalation
When your agent calls escalate_to_expert, the server launches ChatGPT fresh, navigates to the configured conversation, sends the question, waits for completion, copies the response, and returns structured JSON — matching the high‑level flow diagram above. Typical time: 30–120 seconds.
For implementation details (pixel detection, OCR, copy logic), see docs/internals-detection.md and docs/sidebar-selection.md.
Detection Internals
Looking for the low‑level heuristics (sidebar state, response generation, copy button)? They’re documented for contributors in:
`docs/internals-detection.md`
`docs/sidebar-selection.md`
Development
Troubleshooting
"ChatGPT window not found"
Make sure ChatGPT Desktop app is installed
The automation will start it automatically
"Conversation not found"
Verify the conversation title in config matches exactly
Check that the project folder name is correct
The conversation must exist before first use
"Response timeout"
Increase `responseTimeout` in config for longer responses
Check if ChatGPT is rate-limited or experiencing issues
OCR not working
Windows automation issues
Logs
Logs are written to stderr and can be captured by your MCP client. Set `logging.level` to `"debug"` in config for verbose output.
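For example, in `~/.chatgpt-escalation/config.json` (field name as referenced above, surrounding schema assumed):

```json
{
  "logging": { "level": "debug" }
}
```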
Common Driver Error: NoneType window rect
If you see an error that mentions a `NoneType` window rect, it typically means the Python driver could not find or access the ChatGPT Desktop window. Try the following:
Make sure ChatGPT Desktop is open and not minimized
Set `headless` to `false` in your config if it is `true` (some environments hide the window)
Move ChatGPT Desktop to your primary monitor and ensure it isn't occluded by other apps
Confirm the conversation and folder titles match your config exactly
Run `npm run doctor` to validate the configuration and dependencies
Re-run the MCP smoke test:
node tools/mcp_smoke_test.js
If the issue persists, check the backend logs (stdout/stderr) for more details and open an issue with the log snippet and your ChatGPT Desktop version.
Verification Checklist
Before your first escalation, confirm:
Windows 10 or 11
ChatGPT Desktop installed (Microsoft Store version)
ChatGPT Desktop opens and you're logged in
Created the project folder in ChatGPT (e.g., "Agent Expert Help")
Created the conversation inside that folder (e.g., "Copilot Escalations")
Conversation title in config matches exactly (case-sensitive)
Config file exists at `%USERPROFILE%\.chatgpt-escalation\config.json`
MCP client configured with correct path to `dist/src/server.js`
Node.js 18+ installed (`node --version`)
Python 3.10+ installed (`python --version`)
Python packages installed (`pip list | findstr pywinauto`)
FAQ
Can I use my computer while an escalation is running?
Yes, but don't interact with the ChatGPT window. The automation controls mouse/keyboard input to that specific window. You can use other apps normally.
Can multiple escalations run at the same time?
No. Only one escalation at a time. If you have multiple agents, they'll queue up and be processed sequentially.
Can different coding projects use different ChatGPT conversations?
Yes! Configure multiple projects in your config, each pointing to different folders/conversations. Your agent specifies which project to use.
Will macOS or Linux support be added?
Unlikely. macOS has different automation APIs (Accessibility API) that would require a complete driver rewrite. The Windows-only scope is intentional to keep the project maintainable.
Can I use a local LLM (e.g., Ollama) instead of ChatGPT?
Not with this tool — it specifically automates the ChatGPT Desktop app. For local LLMs, use a different MCP server that calls Ollama's API directly.
How long does an escalation take?
Typically 30-120 seconds:
~10s to open ChatGPT and navigate
~5-90s for ChatGPT to generate response (depends on length)
~5s to copy and return
Why is the first escalation slow?
PaddleOCR downloads its model files (~100MB) on first use. Subsequent runs are much faster, and the model preloads in the background.
Uninstall
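Roughly: remove the npm package (name hypothetical, as above) and delete the config directory:

```powershell
# Hypothetical package name; use whatever name you installed.
npm uninstall -g chatgpt-escalation-mcp
# Remove the configuration directory.
Remove-Item -Recurse -Force "$env:USERPROFILE\.chatgpt-escalation"
```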
Security
This tool never automates anything outside the ChatGPT Desktop window. It never reads unrelated windows, captures screens of other apps, or interacts with other applications. All automation is scoped to the ChatGPT process.
Author
Created by Darien Hardin (@Dazlarus)
License
MIT
Changelog
See CHANGELOG.md for version history.
Additional Docs
Protocol probe usage and troubleshooting: `docs/protocol-probe.md`
Sidebar selection internals and tuning: `docs/sidebar-selection.md`
Safety guardrails and interruption recovery: `docs/safety-guardrails.md`
Chaos / Antagonistic Testing
Test safety guardrails by running commands under an antagonist that randomly steals focus, minimizes ChatGPT, moves/clicks the mouse, opens occluding windows, and scrolls.
Quick commands:
Customize chaos parameters:
Intensities:
`gentle`: Fewer disruptions, longer delays between actions
`medium`: Balanced (default)
`aggressive`: Heavy focus stealing, frequent minimize/occlude
What the antagonist does:
Random mouse moves and clicks
Steals focus to Notepad
Opens Notepad windows on top of ChatGPT
Minimizes ChatGPT window
Random scroll events
Note: This intentionally disrupts your desktop session. Run on non-critical environments or VMs.
Chaos escalation test
Runs a full end-to-end test:
Starts antagonist (default 90s, aggressive)
Connects to MCP server
Lists projects
Calls `escalate_to_expert` with a test question
Validates response
Reports pass/fail
This verifies that safety guardrails successfully recover from interruptions during a real escalation flow.
Current Test Results:
✅ Gentle: Passes consistently
✅ Medium: Passes consistently
✅ Aggressive: Passes with retry logic (may take 2-4 attempts)
Seeded Tests: Use `--seed=12345` for reproducible chaos patterns.