Skip to main content
Glama
Dazlarus
by Dazlarus
README.mdβ€’25.9 kB
# ChatGPT Escalation MCP Server An MCP (Model Context Protocol) server that enables autonomous coding agents to escalate complex questions to the ChatGPT Desktop app automatically β€” **ToS-compliant** via native UI automation. **What this does:** This tool lets autonomous coding agents (Copilot, Claude, Cline, Roo, etc.) escalate hard questions to the *ChatGPT Desktop app* on your computer. It automates ChatGPT the same way a human would β€” clicking the UI, sending the question, waiting for the response, copying it β€” then returns the answer to your agent so it can continue working *without you*. > ### πŸ–₯️ Windows 10/11 Only > **This tool supports only Windows.** macOS and Linux are not supported and there are no plans to add support. > > ### ⚠️ Important Requirements > - **ChatGPT Desktop app** (Microsoft Store version) > - Automation controls your ChatGPT window β€” **don't touch it** during escalations > - Only **one escalation at a time** (requests are queued) > - UI changes in ChatGPT may break automation β€” [open an issue](../../issues) if this happens > > ### βœ… ToS Compliant > This tool only automates your *local* ChatGPT Desktop application. It does **not** automate the web UI, bypass security features, or scrape data. ## Features - **Two MCP Tools**: - `escalate_to_expert` - Send questions to ChatGPT and receive detailed responses - `list_projects` - Discover available project IDs from your configuration - **100% Accurate UI Detection** - Pixel-based detection for sidebar state and response completion - **OCR-Based Navigation** - PaddleOCR v5 for reliable text extraction and fuzzy matching - **Async Model Loading** - OCR models preload in background for faster response times - **Project Organization** - Map multiple projects to different ChatGPT conversations ## How It Works ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” MCP Protocol β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Coding Agent │◄──────────────────►│ MCP Server β”‚ β”‚ (Copilot/Roo) β”‚ β”‚ (This Project) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ spawn β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Python Driver β”‚ β”‚ (Windows) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ UI Automation β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ ChatGPT Desktop β”‚ β”‚ App β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` ### Automation Flow 1. **Kill ChatGPT** - Ensures clean state 2. **Open ChatGPT** - Fresh start 3. **Focus Window** - Bring to foreground 4. **Open Sidebar** - Click hamburger menu (pixel detection for state) 5. **Click Project** - OCR + fuzzy matching to find folder 6. **Click Conversation** - OCR + fuzzy matching to find chat (Ctrl+K fallback if not found) 7. **Focus Input** - Click text input area 8. **Send Prompt** - Paste and submit 9. **Wait for Response** - Pixel-based stop button detection 10. **Copy Response** - Robust button probing to find copy button **Automatic Retry Logic**: If any step fails, the entire flow restarts (up to 4 attempts total). Each retry gets a fresh ChatGPT instance. Most failures are transient (focus lost, window minimized) and succeed on retry. ## System Requirements | Requirement | Version | Notes | |-------------|---------|-------| | **Windows** | 10 or 11 | macOS/Linux not supported | | **ChatGPT Desktop** | Latest | Microsoft Store version | | **Node.js** | 18+ | For the MCP server | | **Python** | 3.10+ | For UI automation driver | | **GPU** | Not required | CPU-only OCR works fine | ### Python Packages ``` pywinauto # Windows UI automation pyperclip # Clipboard access paddleocr # Text recognition paddlepaddle # PaddleOCR backend ``` ### Why Windows Only? ChatGPT Desktop exposes fully accessible UI elements on Windows via UI Automation APIs. The pixel-based detection and keyboard/mouse automation work reliably on Windows. macOS has different automation APIs (Accessibility API) that would require a complete rewrite of the driver. Linux doesn't have a ChatGPT Desktop app. ### Tested Environment | Component | Version | Status | |-----------|---------|--------| | **ChatGPT Desktop** | 1.2025.112 | βœ… Tested | | **Windows 11** | 24H2 (Build 26100.2605) | βœ… Tested | | **Last Verified** | December 2, 2025 | | ### Robustness Features - **Automatic Retries**: Up to 4 attempts per escalation with intelligent failure detection - **Structured Observability**: Every escalation gets a unique `run_id` for correlation and debugging - **Error Reason Codes**: 12+ specific error codes (e.g., `focus_failed`, `project_not_found`, `empty_response`) - **Chaos Tested**: Passes aggressive chaos testing (random focus stealing, window minimization, mouse interference) - **Smart Fallbacks**: Ctrl+K search if conversation not visible in sidebar > πŸ’‘ **After ChatGPT Updates:** UI automation may break if ChatGPT significantly changes their layout. If you encounter issues after an update, please [open an issue](../../issues) with your ChatGPT version. ## Installation ### Option 1: Install from npm (Recommended) ```powershell # Install globally npm install -g chatgpt-escalation-mcp # Install Python dependencies pip install pywinauto pyperclip paddleocr paddlepaddle # Run setup wizard chatgpt-escalation-mcp init ``` ### Option 2: Install from GitHub Release 1. Download the latest release from [GitHub Releases](https://github.com/Dazlarus/chatgpt-escalation-mcp/releases) 2. Extract the ZIP file 3. Run: ```powershell cd chatgpt-escalation-mcp npm install npm run build pip install pywinauto pyperclip paddleocr paddlepaddle ``` ### Option 3: Install from Source ```powershell # Clone the repository git clone https://github.com/Dazlarus/chatgpt-escalation-mcp.git cd chatgpt-escalation-mcp # Install Node.js dependencies npm install # Build the project npm run build # Install Python dependencies pip install pywinauto pyperclip paddleocr paddlepaddle ``` ## Quick Start ### Step 1: Install ChatGPT Desktop ```powershell winget install --id=9NT1R1C2HH7J --source=msstore --accept-package-agreements --accept-source-agreements ``` Or install from the Microsoft Store: search "ChatGPT" by OpenAI. ### Step 2: Create a Conversation in ChatGPT 1. Open ChatGPT Desktop and sign in 2. Create a new **Project** (folder) called `Agent Expert Help` 3. Inside that project, create a new conversation called `Copilot Escalations` 4. Send this initial message to set the context: ``` You are an expert software architect. I'll send you technical questions from my coding agent (GitHub Copilot, Claude, etc.) when it gets stuck. For each question: 1. Analyze the problem thoroughly 2. Provide specific, actionable guidance 3. Include code examples when helpful 4. Explain WHY a solution works, not just what to do The questions will include context about what the agent already tried. ``` ### Step 3: Configure the MCP Server Create the config file at `~/.chatgpt-escalation/config.json`: ```powershell # Create config directory New-Item -ItemType Directory -Path "$env:USERPROFILE\.chatgpt-escalation" -Force # Create config file (edit the path in notepad) notepad "$env:USERPROFILE\.chatgpt-escalation\config.json" ``` Paste this configuration: ```json { "chatgpt": { "platform": "win", "responseTimeout": 600000, "projects": { "default": { "folder": "Agent Expert Help", "conversation": "Copilot Escalations" } } }, "logging": { "level": "info" } } ``` ### Step 4: Add to Your MCP Client **For VS Code with GitHub Copilot** (`%APPDATA%\Code\User\mcp.json`): ```json { "servers": { "chatgpt-escalation": { "command": "node", "args": ["N://AI Projects//chatgpt-escalation-mcp//dist//src//server.js"] } } } ``` **For Claude Desktop** (`%APPDATA%\Claude\claude_desktop_config.json`): ```json { "mcpServers": { "chatgpt-escalation": { "command": "node", "args": ["C://path//to//chatgpt-escalation-mcp//dist//src//server.js"] } } } ``` > ⚠️ Use double forward slashes `//` in paths for JSON, or escape backslashes as `\\\\` ### Step 5: Teach Your Agent When to Escalate Add escalation instructions to your agent. Choose the format that matches your tool: <details> <summary><strong>GitHub Copilot</strong> (.github/copilot-instructions.md)</summary> ```markdown ## Expert Escalation Protocol You have access to the `escalate_to_expert` MCP tool that sends questions to ChatGPT for expert guidance. ### When to Escalate - You've tried 3+ approaches without success - The problem requires specialized domain knowledge - You're unsure if the fundamental approach is correct - You're hitting consistent failure patterns you can't diagnose ### How to Escalate Use the `escalate_to_expert` tool with: - `project`: "default" (or specific project ID) - `reason`: Why you're stuck (be specific) - `question`: The technical question - `attempted`: What you already tried and results - `artifacts`: Relevant code snippets ### After Escalation Read the full response before implementing. ChatGPT often provides multiple approaches - pick the most appropriate one for the context. ``` </details> <details> <summary><strong>Cline / Roo Code</strong> (.clinerules or .roo/rules)</summary> ```markdown ## Expert Escalation Protocol You have access to the `escalate_to_expert` MCP tool. Use it when stuck. ### Escalation Triggers 1. **Accuracy plateau** - 3+ attempts with no improvement 2. **Consistent failures** - Same error pattern despite different approaches 3. **Domain gap** - Problem needs specialized knowledge you lack 4. **Architecture uncertainty** - Unsure if approach is fundamentally correct ### Before Escalating Stop and ask the user: "I've tried [X approaches] but I'm hitting [limitation]. Should I escalate to ChatGPT?" If yes, call `escalate_to_expert` with: - `project`: "default" - `reason`: Brief description of why you're stuck - `question`: Specific technical question - `attempted`: Numbered list of what you tried and results - `artifacts`: Relevant code snippets ### Question Format Structure your question clearly: - **Problem:** One sentence description - **Context:** Technical details, frameworks, constraints - **What I tried:** Numbered list with results - **Specific questions:** What you need answered ### After Response 1. Read the FULL response before implementing 2. Identify the recommended approach (there may be multiple) 3. Implement incrementally - test each suggestion 4. If unclear, ask user for clarification before proceeding ``` </details> <details> <summary><strong>OpenAI Codex CLI</strong> (AGENTS.md or instructions)</summary> ```markdown ## Expert Escalation via ChatGPT The `escalate_to_expert` MCP tool lets you ask ChatGPT for help on complex problems. ### When to Use - Multiple failed attempts on a problem - Need domain expertise (ML, systems, security, etc.) - Debugging issues that don't make sense - Architecture or design decisions ### Tool Usage ``` escalate_to_expert({ project: "default", reason: "Brief explanation of the blocker", question: "Specific technical question", attempted: "What was tried and what happened", artifacts: [{type: "file_snippet", pathOrLabel: "file.py", content: "..."}] }) ``` ### Best Practices - Be specific about what you tried and exact error messages - Include relevant code snippets in artifacts - Ask focused questions, not "help me fix this" - After receiving response, implement suggestions step by step ``` </details> <details> <summary><strong>Claude Desktop / Other MCP Clients</strong></summary> ```markdown ## Expert Escalation Protocol You have access to the `escalate_to_expert` MCP tool that sends questions to ChatGPT. ### When to Escalate - Tried 3+ approaches without success - Problem requires specialized domain knowledge - Unsure if fundamental approach is correct - Hitting consistent failure patterns ### Tool Parameters | Parameter | Required | Description | |-----------|----------|-------------| | project | Yes | Project ID (usually "default") | | reason | Yes | Why you're escalating | | question | Yes | The technical question | | attempted | No | What you tried and results | | artifacts | No | Code snippets [{type, pathOrLabel, content}] | ### After Response Read fully before implementing. Pick the most appropriate suggestion for the context. ``` </details> #### Example Escalation Call ```json { "project": "default", "reason": "Authentication flow failing silently, can't identify root cause", "question": "Why would JWT refresh tokens work in development but fail in production with no error messages?", "attempted": "1. Checked token expiry (valid), 2. Verified CORS (correct), 3. Tested with Postman (works)", "artifacts": [{"type": "file_snippet", "pathOrLabel": "auth.ts", "content": "..."}] } ``` ## Configuration Reference Config file location: `%USERPROFILE%\.chatgpt-escalation\config.json` ```json { "chatgpt": { "platform": "win", "responseTimeout": 120000, "projects": { "my-project": { "folder": "My Project Folder", "conversation": "Expert Help Chat" }, "simple-project": "Just a Conversation Title" } }, "logging": { "level": "info" } } ``` ### Project Configuration Projects can be configured two ways: **Simple** (conversation at root level in ChatGPT sidebar): ```json "project-id": "Conversation Title" ``` **With Folder** (conversation inside a ChatGPT project folder): ```json "project-id": { "folder": "Project Folder Name", "conversation": "Conversation Title" } ``` ### Multiple Projects You can map different coding projects to different ChatGPT conversations: ```json "projects": { "webapp": { "folder": "Web Projects", "conversation": "React App Help" }, "api": { "folder": "Backend Projects", "conversation": "API Design Help" }, "default": "General Coding Help" } ``` Then agents can escalate to the right context: ```json {"project": "webapp", "question": "How to optimize React re-renders?"} {"project": "api", "question": "Best practices for REST pagination?"} ``` ## MCP Tools Reference #### `escalate_to_expert` Send a question to ChatGPT via the desktop app. | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `project` | string | Yes | Project ID from config (use `list_projects` to discover) | | `reason` | string | Yes | Why you're escalating (helps ChatGPT understand context) | | `question` | string | Yes | The specific technical question | | `attempted` | string | No | What you've already tried and the results | | `projectContext` | string | No | Additional context about the codebase | | `artifacts` | array | No | Code snippets, logs, or notes (see below) | **Artifact format:** ```json { "type": "file_snippet" | "log" | "note", "pathOrLabel": "src/auth.ts", "content": "// the actual code or content" } ``` #### `list_projects` Discover available project IDs from your configuration. Call this first if you don't know what projects are available. **Returns:** ```json { "projects": ["default", "webapp", "api"], "count": 3 } ``` ## Important Notes ### ChatGPT Conversation Setup For best results, start each project's ChatGPT conversation with a system prompt that establishes the expert role: > You are the dedicated expert escalation endpoint for autonomous coding agents working on this project. > > Your role: > - Provide clear, technically correct, implementation-ready guidance. > - Assume the agent will immediately act on your instructions. > - Avoid asking the agent follow-up questions unless absolutely necessary. > - Be concise, direct, and practical. > > Response Format: > 1. Begin with a brief explanation of the issue and the recommended solution. > 2. End every response with a strict JSON object in the following format: > > { > "guidance": "one-sentence summary of what the agent should do next", > "action_plan": ["step 1", "step 2", "step 3"], > "priority": "low | medium | high", > "notes_for_user": "optional message for the human" > } > > Important Rules: > - The JSON must be the final content in your message. > - Do NOT wrap the JSON in code fences. > - Do NOT include any commentary after the JSON. > - Do NOT use placeholders or incomplete structures. > - Always return syntactically valid JSON. ### During Use - **Keep ChatGPT Desktop installed** (it will be opened/closed automatically) - **Don't interact with ChatGPT** while escalation is in progress - Automation takes ~30-120 seconds depending on response length - Works best when you're AFK or focused on other tasks ### Version Compatibility | ChatGPT Desktop Version | Status | Notes | |-------------------------|--------|-------| | 1.2025.112 | βœ… Supported | Last tested Nov 30, 2025 | | Older versions | ⚠️ Unknown | May work, not tested | | Future versions | ⚠️ Unknown | May break if UI changes significantly | If a ChatGPT update breaks automation, [open an issue](../../issues) with your version number. ## What Happens During Escalation When your agent calls `escalate_to_expert`, the server launches ChatGPT fresh, navigates to the configured conversation, sends the question, waits for completion, copies the response, and returns structured JSON β€” matching the high‑level flow diagram above. Typical time: **30–120 seconds**. For implementation details (pixel detection, OCR, copy logic), see `docs/internals-detection.md` and `docs/sidebar-selection.md`. ## Detection Internals Looking for the low‑level heuristics (sidebar state, response generation, copy button)? They’re documented for contributors in: - `docs/internals-detection.md` - `docs/sidebar-selection.md` ## Development ```powershell # Watch mode npm run dev # Build npm run build ``` ## Troubleshooting ### "ChatGPT window not found" - Make sure ChatGPT Desktop app is installed - The automation will start it automatically ### "Conversation not found" - Verify the conversation title in config matches exactly - Check that the project folder name is correct - The conversation must exist before first use ### "Response timeout" - Increase `responseTimeout` in config for longer responses - Check if ChatGPT is rate-limited or experiencing issues ### OCR not working ```powershell # Reinstall PaddleOCR pip install --upgrade paddleocr paddlepaddle ``` ### Windows automation issues ```powershell # Reinstall automation dependencies pip install --upgrade pywinauto pyperclip pywin32 ``` ### Logs Logs are written to stderr and can be captured by your MCP client. Set `logging.level` to `"debug"` in config for verbose output. ### Common Driver Error: NoneType window rect If you see an error like: ``` TypeError: 'NoneType' object is not subscriptable ``` This typically means the Python driver could not find or access the ChatGPT Desktop window. Try the following: - Make sure ChatGPT Desktop is open and not minimized - Set `headless` to `false` in your config if it is `true` (some environments hide the window) - Move ChatGPT Desktop to your primary monitor and ensure it isn't occluded by other apps - Confirm the conversation and folder titles match your config exactly - Run `npm run doctor` to validate the configuration and dependencies - Re-run the MCP smoke-test: `node tools/mcp_smoke_test.js` If the issue persists, check the backend logs (stdout/stderr) for more details and open an issue with the log snippet and your ChatGPT Desktop version. ## Verification Checklist Before your first escalation, confirm: - [ ] Windows 10 or 11 - [ ] ChatGPT Desktop installed (Microsoft Store version) - [ ] ChatGPT Desktop opens and you're logged in - [ ] Created the project folder in ChatGPT (e.g., "Agent Expert Help") - [ ] Created the conversation inside that folder (e.g., "Copilot Escalations") - [ ] Conversation title in config matches **exactly** (case-sensitive) - [ ] Config file exists at `%USERPROFILE%\.chatgpt-escalation\config.json` - [ ] MCP client configured with correct path to `dist/src/server.js` - [ ] Node.js 18+ installed (`node --version`) - [ ] Python 3.10+ installed (`python --version`) - [ ] Python packages installed (`pip list | findstr pywinauto`) ## FAQ <details> <summary><strong>Can I keep working while it runs?</strong></summary> Yes, but **don't interact with the ChatGPT window**. The automation controls mouse/keyboard input to that specific window. You can use other apps normally. </details> <details> <summary><strong>Can I use this for multiple agents simultaneously?</strong></summary> No. Only one escalation at a time. If you have multiple agents, they'll queue up and be processed sequentially. </details> <details> <summary><strong>Can this escalate to multiple ChatGPT conversations?</strong></summary> Yes! Configure multiple projects in your config, each pointing to different folders/conversations. Your agent specifies which project to use. </details> <details> <summary><strong>Will this work on macOS in the future?</strong></summary> Unlikely. macOS has different automation APIs (Accessibility API) that would require a complete driver rewrite. The Windows-only scope is intentional to keep the project maintainable. </details> <details> <summary><strong>Can I use Ollama or a local LLM instead?</strong></summary> Not with this tool β€” it specifically automates the ChatGPT Desktop app. For local LLMs, use a different MCP server that calls Ollama's API directly. </details> <details> <summary><strong>How long does an escalation take?</strong></summary> Typically 30-120 seconds: - ~10s to open ChatGPT and navigate - ~5-90s for ChatGPT to generate response (depends on length) - ~5s to copy and return </details> <details> <summary><strong>The first run is really slow. Why?</strong></summary> PaddleOCR downloads its model files (~100MB) on first use. Subsequent runs are much faster, and the model preloads in the background. </details> ## Uninstall ```powershell # Remove config directory Remove-Item -Recurse -Force "$env:USERPROFILE\.chatgpt-escalation" # Remove from your MCP client config # (edit your settings.json or claude_desktop_config.json) # Optionally uninstall Python dependencies pip uninstall pywinauto pyperclip paddleocr paddlepaddle ``` ## Security This tool **never** automates anything outside the ChatGPT Desktop window. It never reads unrelated windows, captures screens of other apps, or interacts with other applications. All automation is scoped to the ChatGPT process. ## Author Created by **Darien Hardin** ([@Dazlarus](https://github.com/Dazlarus)) ## License MIT ## Changelog See [CHANGELOG.md](CHANGELOG.md) for version history. ## Additional Docs - Protocol probe usage and troubleshooting: `docs/protocol-probe.md` - Sidebar selection internals and tuning: `docs/sidebar-selection.md` - Safety guardrails and interruption recovery: `docs/safety-guardrails.md` ## Chaos / Antagonistic Testing Test safety guardrails by running commands under an antagonist that randomly steals focus, minimizes ChatGPT, moves/clicks the mouse, opens occluding windows, and scrolls. **Quick commands:** ```pwsh # Run any command with chaos (60s, medium intensity) npm run chaos -- <your-command> # Run protocol probe with aggressive chaos (90s) npm run chaos:probe # Run full escalation test under chaos (90s, aggressive) npm run chaos:escalate ``` **Customize chaos parameters:** ```pwsh # Gentle chaos for 120 seconds node tools/with_antagonist.js --duration=120 --intensity=gentle -- npm run probe # Custom duration and intensity for escalation test node tools/chaos_escalation_test.js --duration=60 --intensity=medium ``` **Intensities:** - `gentle`: Fewer disruptions, longer delays between actions - `medium`: Balanced (default) - `aggressive`: Heavy focus stealing, frequent minimize/occlude **What the antagonist does:** - Random mouse moves and clicks - Steals focus to Notepad - Opens Notepad windows on top of ChatGPT - Minimizes ChatGPT window - Random scroll events **Note:** This intentionally disrupts your desktop session. Run on non-critical environments or VMs. **Chaos escalation test (`npm run chaos:escalate`):** Runs a full end-to-end test: 1. Starts antagonist (default 90s, aggressive) 2. Connects to MCP server 3. Lists projects 4. Calls `escalate_to_expert` with a test question 5. Validates response 6. Reports pass/fail This verifies that safety guardrails successfully recover from interruptions during a real escalation flow. **Current Test Results:** - βœ… **Gentle**: Passes consistently - βœ… **Medium**: Passes consistently - βœ… **Aggressive**: Passes with retry logic (may take 2-4 attempts) **Seeded Tests**: Use `--seed=12345` for reproducible chaos patterns: ```pwsh node tools/chaos_escalation_test.js aggressive --duration=120 --seed=99999 ```

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Dazlarus/chatgpt-escalation-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server