What can you do with this server?

The GLM Subagent MCP server lets you delegate coding and text tasks to a cheaper GLM model (~10x cheaper than Claude Opus), while your main agent handles orchestration and review.

* glm_agent: Run GLM as a full coding agent in your repo with file tools (read, write, edit, list directories, bash commands). Supports dry_run mode to preview diffs without writing files, returns usage stats, and prints a git-checkpoint revert line after every real run.
* glm_delegate: Pure text generation via GLM (no file access). Useful for writing code snippets, docs, or any text task. Supports optional system prompt, context injection, reasoning mode, and output format control.
* glm_recommend: Free local advisory (no API call). Recommends whether to use GLM or the main model based on task complexity, context size, sensitivity, vision input, and more. Returns a recommendation, suggested model, confidence level, and reasons.
* glm_status: Free local status check (no API call). Shows current peak window, active model, cumulative usage ledger totals (usage.jsonl), and config health.

Key features:

* Cost savings: GLM tokens are ~10x cheaper; main model only pays for orchestration and review.
* Peak-aware routing: Automatically selects cheaper models during China peak hours (14–18 UTC+8).
* Safety: Dry-run mode, git revert lines, key isolation, and data residency warnings.
* Integration: Works with Claude Code, GitHub Copilot, VS Code, Cursor, Windsurf, Claude Desktop, and any MCP client. Can be run via npm or Docker.

How do I use GLM Subagent MCP?

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@GLM Subagent MCP Ask glm to review and fix the bug in server.js"

That's it! The server will respond to your query, and you can continue using it as needed.

GLM Subagent MCP

by djerok


JavaScript

Local

glm-mcp — GLM as a cheap delegate for your AI coding agent

GLM (Zhipu / Z.ai) as a ~10x cheaper delegate for your AI coding agent. Your expensive main model — Claude Opus, Copilot's default, or Codex — orchestrates and reviews; GLM does the actual work, billed on cheap GLM tokens. GLM exposes an Anthropic-compatible /v1/messages endpoint, so it drops into anything that already speaks Anthropic. This repo wraps it as an MCP server with four tools, plus one-command installers for Claude Code, GitHub Copilot, and Codex. The same server powers every edition.

How it works

flowchart TD
    You["You"]
    Main["Main agent (Claude Opus / Copilot / Codex)<br/>orchestrates + reviews"]
    Srv["glm MCP server (stdio)<br/>4 tools"]
    Rt["Router<br/>peak-aware model pick + cost bias"]
    Zai[/"Z.ai Anthropic endpoint<br/>POST /v1/messages"/]
    Loop["glm_agent tool loop<br/>read_file / write_file / edit_file<br/>list_dir / run_bash — on your repo"]
    Led[("usage.jsonl<br/>every GLM call: model + tokens")]

    Repo[("your repo")]

    You --> Main -->|"glm_agent(task, workdir)"| Srv
    Srv --> Rt --> Zai
    Zai -->|"tool calls"| Loop
    Loop -->|"tool results"| Zai
    Loop -->|"reads / writes / runs"| Repo
    Zai --> Led
    Srv -->|"summary + GLM STATS<br/>(model, tokens, est. cost)"| Main
    Main -->|"review · diff · revert"| You

Plain-English walkthrough:

You ask the main agent for work.
The main agent delegates via glm_agent — it passes a goal plus an absolute workdir.
The server's router picks a GLM model (peak-aware) and calls the Z.ai /v1/messages endpoint; the cost bias keeps GLM the default.
GLM runs its own agent loop (read_file / write_file / edit_file / list_dir / run_bash) directly against your repo, then stops with a summary.
The server returns a concise summary + a GLM STATS block (model, tokens, est. cost) to the main agent.
The main agent reviews; every GLM call is also appended to the usage.jsonl ledger.
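
The loop in the walkthrough can be sketched as follows. This is a minimal illustration, not the server's actual code: `runAgentLoop`, `callModel`, and the in-memory `tools` map are hypothetical names, the model is a scripted stub rather than a real POST to /v1/messages (which would be asynchronous), and the message shapes follow the Anthropic `tool_use` / `tool_result` content-block convention.

```javascript
// Hypothetical sketch of a tool loop; names and structure are assumptions.
// `callModel` stands in for a request to the Anthropic-compatible endpoint.
function runAgentLoop(callModel, tools, task, maxIters = 30) {
  const messages = [{ role: "user", content: task }];
  for (let iter = 1; iter <= maxIters; iter++) {
    const reply = callModel(messages);
    messages.push({ role: "assistant", content: reply.content });

    const toolUses = reply.content.filter((block) => block.type === "tool_use");
    if (toolUses.length === 0) {
      // No tool requested: the text blocks form the final summary.
      const summary = reply.content
        .filter((block) => block.type === "text")
        .map((block) => block.text)
        .join("\n");
      return { summary, iterations: iter, stoppedEarly: false };
    }

    // Run every requested tool and send the results back as one user turn.
    const results = toolUses.map((use) => {
      const tool = tools[use.name];
      const output = tool ? tool(use.input) : `unknown tool: ${use.name}`;
      return { type: "tool_result", tool_use_id: use.id, content: String(output) };
    });
    messages.push({ role: "user", content: results });
  }
  // Iteration cap reached (compare GLM_AGENT_MAX_ITERS).
  return { summary: "", iterations: maxIters, stoppedEarly: true };
}

// Scripted stand-in for GLM: one read_file call, then a final summary.
const fakeFiles = { "server.js": "console.log('hi')" };
const tools = { read_file: ({ path }) => fakeFiles[path] ?? "ENOENT" };
const scripted = [
  { content: [{ type: "tool_use", id: "t1", name: "read_file", input: { path: "server.js" } }] },
  { content: [{ type: "text", text: "Read server.js; no changes needed." }] },
];
let turn = 0;
const result = runAgentLoop(() => scripted[turn++], tools, "Inspect server.js");
console.log(result.summary, result.iterations);
```

The iteration cap is what keeps a model that never stops requesting tools from running indefinitely.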

Token economics. Delegated work bills GLM tokens (~10x cheaper). The main model only pays for orchestration + review. A near-100% GLM share requires the full-GLM launcher (claude/glm-code.mjs), because a hybrid main agent always carries per-turn session context — that context is the floor on its token share.
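
As a rough worked example of that floor, the sketch below computes session cost relative to running everything on the main model. `relativeCost` is a hypothetical helper, the 0.1 price ratio simply mirrors the "~10x cheaper" figure above, and peak surcharges are ignored.

```javascript
// Relative cost of a session versus an all-main-model session (1.0).
// glmShare: fraction of tokens billed on GLM, between 0 and 1.
// priceRatio: GLM price divided by main-model price (assumed 0.1 here).
function relativeCost(glmShare, priceRatio = 0.1) {
  if (glmShare < 0 || glmShare > 1) {
    throw new RangeError("glmShare must be in [0, 1]");
  }
  return (1 - glmShare) + glmShare * priceRatio;
}

for (const share of [0, 0.8, 1]) {
  console.log(`GLM share ${share}: ${relativeCost(share).toFixed(2)}x`);
}
```

Under these assumptions, an 80% GLM share costs about 0.28x of the all-main-model baseline, and only a 100% share reaches 0.10x.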


The four tools

Tool	Cost	What it does
`glm_agent`	GLM tokens	GLM as a real coding agent in your repo (read/write/edit/run). `dry_run: true` previews a diff and writes nothing; after a real run a git-checkpoint revert line is printed.
`glm_delegate`	GLM tokens (opt-in)	Pure text generation — text in, text out. Hidden by default (`glm_agent` handles text-only tasks too); set `GLM_DELEGATE=on` to expose it.
`glm_recommend`	free (local)	GLM-vs-main-model advisory: which engine, which GLM model, confidence, and reasons. No GLM call.
`glm_status`	free (local)	Peak window, active model, usage-ledger totals (proof of GLM spend), and config health. No GLM call.
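
For clients that speak MCP directly, a glm_agent invocation is an ordinary JSON-RPC `tools/call` request. The sketch below builds one; `buildGlmAgentCall` is a hypothetical helper, and the argument names `task`, `workdir`, and `dry_run` are taken from this README rather than from the tool's published input schema, so check the schema before relying on them.

```javascript
// Hypothetical helper: build an MCP tools/call request for glm_agent.
function buildGlmAgentCall(id, { task, workdir, dryRun = false }) {
  // The README asks for an absolute workdir (POSIX or Windows drive path).
  const isAbsolute = workdir.startsWith("/") || /^[A-Za-z]:[\\/]/.test(workdir);
  if (!isAbsolute) throw new Error(`workdir must be absolute: ${workdir}`);
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "glm_agent",
      arguments: { task, workdir, dry_run: dryRun },
    },
  };
}

const request = buildGlmAgentCall(2, {
  task: "Fix the failing test in server.js",
  workdir: "/home/me/project",
  dryRun: true, // preview the diff, write nothing
});
console.log(JSON.stringify(request, null, 2));
```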

Live progress. glm_agent streams MCP progress notifications while it runs — current iteration, token count, and tok/s — shown live in Claude Code and mapped to tool.execution_progress in VS Code Copilot. This heartbeat also keeps long calls alive on clients that reset their timeout on progress, and cancelling a run stops GLM promptly (partial changes are shown and revertable). max_tokens defaults to auto (uncapped/generous; the orchestrating agent may pass a number to cap a call). The server uses an idle/stall timeout (GLM_STALL_TIMEOUT_MS, 2 min), so an actively-streaming turn is never cut off. If a very long run is still cancelled by your client's tool-call timeout, raise it with MCP_TOOL_TIMEOUT / CLAUDE_CODE_MCP_TOOL_IDLE_TIMEOUT.
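
The difference between an idle/stall timeout and a fixed total timeout can be sketched like this. `StallWatchdog` is a hypothetical class, not the server's implementation; the clock is passed in explicitly so the behaviour is easy to follow, and 120000 ms mirrors the 2-minute GLM_STALL_TIMEOUT_MS default.

```javascript
// Hypothetical idle-timeout tracker: the deadline moves with every chunk,
// so a turn that keeps streaming is never considered stalled.
class StallWatchdog {
  constructor(stallTimeoutMs = 120000, now = Date.now()) {
    this.stallTimeoutMs = stallTimeoutMs;
    this.lastActivity = now;
  }

  // Call whenever a streamed chunk or progress event arrives.
  touch(now = Date.now()) {
    this.lastActivity = now;
  }

  // True once no activity has been seen for the full stall window.
  isStalled(now = Date.now()) {
    return now - this.lastActivity >= this.stallTimeoutMs;
  }
}

const watchdog = new StallWatchdog(120000, 0);
watchdog.touch(100000);                  // chunk arrives at t = 100 s
console.log(watchdog.isStalled(200000)); // 100 s of silence
console.log(watchdog.isStalled(220000)); // 120 s of silence
```

Here the first check is false even though 200 seconds have elapsed in total, because only 100 seconds have passed since the last chunk; the second check is true.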

Install

(a) Claude Code

npx glm-mcp-claude --key YOUR_ZAI_KEY

Installs globally by default (user-scoped): the MCP server, a full-tool glm subagent, a PreToolUse auto-routing hook, and an optional glm-code full-GLM launcher. Restart Claude Code, then run glm_status to confirm api_key_loaded: true. Full details: claude/README.md.

(b) GitHub Copilot / VS Code

npx glm-mcp-copilot --key YOUR_ZAI_KEY            # current workspace
npx glm-mcp-copilot --global --key YOUR_ZAI_KEY   # every workspace

Installs the MCP server in agent mode, a GLM custom agent (subagent), a PreToolUse auto-routing hook, and delegation instructions files. Reload the VS Code window, open Copilot Chat in Agent mode, and start the glm server. Full details: copilot/README.md.

(c) Codex

Install the published Codex package:

npx glm-mcp-codex --key YOUR_ZAI_KEY

Installs a Codex MCP registration, a glm custom agent, the glm-delegate skill, and an advisory UserPromptSubmit/PreToolUse hook. The config gives GLM tools a 30-minute timeout and prompts before mutating calls. Restart Codex, review the hook with /hooks, and run glm_status. Full details: codex/README.md.

(d) Any MCP client / Glama / Docker

The standalone glm-mcp package — no installer needed for Cursor, Windsurf, Claude Desktop, Glama, etc.:

{
  "mcpServers": {
    "glm": {
      "command": "npx",
      "args": ["-y", "glm-mcp"],
      "env": { "GLM_API_KEY": "YOUR_ZAI_KEY" }
    }
  }
}

For containers, the repo-root Dockerfile runs the same server:

docker build -t glm-mcp .
docker run --rm -i -e GLM_API_KEY=YOUR_ZAI_KEY glm-mcp

The server boots and answers MCP introspection without a key — set GLM_API_KEY only for actual GLM calls.
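
Keyless introspection can be tried by sending an MCP `initialize` request over stdio. The sketch below only builds the newline-delimited JSON-RPC line; `buildInitializeLine` and the client name are hypothetical, and the protocol version shown is one published MCP revision that the server may negotiate differently.

```javascript
// Hypothetical helper: one newline-terminated JSON-RPC "initialize" request,
// as used by MCP's stdio transport.
function buildInitializeLine(clientName = "example-client") {
  const request = {
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05", // assumption: adjust to your client
      capabilities: {},
      clientInfo: { name: clientName, version: "0.0.0" },
    },
  };
  return JSON.stringify(request) + "\n";
}

const line = buildInitializeLine();
process.stdout.write(line);
```

Saved under a name of your choosing (say `init.mjs`), the output could be piped into the container with `node init.mjs | docker run --rm -i glm-mcp`; the exact response depends on the server version.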

Editions at a glance

	Claude Code -> `claude/`	GitHub Copilot (VS Code) -> `copilot/`	Codex -> `codex/`
npm package	`glm-mcp-claude`	`glm-mcp-copilot`	`glm-mcp-codex`
Install	`npx glm-mcp-claude --key ...`	`npx glm-mcp-copilot --key ...` (+ `--global`)	`npx glm-mcp-codex --key ...`
MCP server	user-scoped (`claude mcp add glm -s user`)	VS Code agent mode (`mcp.json`)	`~/.codex/config.toml` (or trusted project config)
Subagent	`glm` subagent (`~/.claude/agents/glm.md`)	`GLM` custom agent (`glm.agent.md`)	`glm` custom agent (`~/.codex/agents/glm.toml`)
Auto-routing hook	PreToolUse, `Task` matcher (`glm_subagent_router.mjs`)	PreToolUse, fires on all calls (`glm_router_hook.mjs`)	UserPromptSubmit + PreToolUse, advisory only
Delegation policy	appended to `~/.claude/CLAUDE.md`	`.instructions.md` files	`glm-delegate` skill + optional `AGENTS.md` snippet
Full-GLM launcher	`glm-code.mjs` (Claude only)	—	—
Docs	claude/README.md	copilot/README.md	codex/README.md

Parity. All three editions expose the same four tools and a subagent while using the same server underneath. Codex uses its native custom-agent, skill, and hook surfaces; its hook is advisory only and must be trusted by the user. Only Claude ships the standalone glm-code full-GLM launcher.

Configuration

All knobs live in .env (git-ignored). Location per edition: Claude ~/.claude/glm-mcp/.env (set during install); Copilot ~/.glm-mcp/glm-mcp/.env; Codex ~/.codex/glm-mcp/.env. Codex sets tool_timeout_sec = 1800 because its default MCP tool timeout is 60 seconds. Full reference with comments: claude/glm-mcp/.env.example.

Var	Default	Meaning
`GLM_API_KEY`	—	Your Z.ai / Zhipu GLM Coding Plan key. Required for GLM calls.
`GLM_BASE_URL`	`https://api.z.ai/api/anthropic`	Anthropic-compatible endpoint (`/v1/messages`).
`GLM_USE_HAIKU`	`off`	`off` calls GLM directly so all tokens stay on GLM; `on` allows the Haiku-orchestrated subagent (spends some Claude tokens).
`GLM_COST_BIAS`	`7`	How hard to favor GLM. `7` ≈ 98–100% of eligible tasks to GLM. Lower (e.g. `1.5`) to send more hard tasks to the main model; `0` = capability only.
`GLM_MAX_CONCURRENT`	`1`	GLM caps in-flight requests (~1); keep at 1 unless your tier allows more.
`GLM_CAP`	`off`	`off` = generous (up to `GLM_MAX_TOKENS_CEILING`); `on` enforces `GLM_MAX_TOKENS`.
`GLM_MAX_TOKENS`	`32768`	Hard per-call limit applied only when `GLM_CAP=on`.
`GLM_MAX_TOKENS_CEILING`	`131072`	Generous default used when the cap is off.
`GLM_MAX_RETRIES`	`4`	Retries on 429 / concurrency / 5xx with exponential backoff.
`GLM_TIMEOUT_MS`	`300000`	Per GLM HTTP request timeout (5 min).
`GLM_AGENT_MAX_ITERS`	`30`	Max tool-loop turns for `glm_agent` before it stops.
`GLM_AGENT_BASH_TIMEOUT_MS`	`120000`	Per-`run_bash` command timeout inside `glm_agent`.
`GLM_OFFPEAK_MODEL`	`glm-5.2`	Candidate model(s) for `auto` off-peak. Comma list allowed; router auto-picks.
`GLM_PEAK_MODEL`	`glm-5.2`	Candidate model(s) for `auto` at peak. Comma list allowed; include a no-surcharge model (e.g. `glm-4.7`) to dodge the peak tax.
`GLM_CHEAP_MODEL`	`glm-4.5-air`	The cheap model (used in the full-GLM launcher's Haiku slot).
`GLM_PEAK_START_CN`	`14`	Peak window start, China hour (UTC+8).
`GLM_PEAK_END_CN`	`18`	Peak window end (exclusive), China hour (UTC+8).
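
How a few of these knobs interact can be sketched as below. `resolveConfig` is a hypothetical function that only mirrors the defaults in the table; it is not the server's own config loader and covers just the cap logic, the comma-separated model lists, and the retry count.

```javascript
// Hypothetical config resolver based on the defaults documented above.
function resolveConfig(env = {}) {
  // Split a comma list into trimmed, non-empty candidates.
  const list = (value, fallback) =>
    (value ?? fallback).split(",").map((s) => s.trim()).filter(Boolean);
  // Parse an integer, falling back when unset or malformed.
  const int = (value, fallback) => {
    const n = Number.parseInt(value ?? "", 10);
    return Number.isNaN(n) ? fallback : n;
  };

  const capOn = (env.GLM_CAP ?? "off").toLowerCase() === "on";
  const maxTokens = int(env.GLM_MAX_TOKENS, 32768);
  const ceiling = int(env.GLM_MAX_TOKENS_CEILING, 131072);

  return {
    capOn,
    // GLM_MAX_TOKENS applies only when the cap is on; otherwise the ceiling.
    maxTokensLimit: capOn ? maxTokens : ceiling,
    offpeakModels: list(env.GLM_OFFPEAK_MODEL, "glm-5.2"),
    peakModels: list(env.GLM_PEAK_MODEL, "glm-5.2"),
    maxRetries: int(env.GLM_MAX_RETRIES, 4),
  };
}

console.log(resolveConfig({ GLM_CAP: "on", GLM_PEAK_MODEL: "glm-5.2, glm-4.7" }));
```

In a real process the argument would typically be `process.env` after the .env file has been loaded.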

Peak-aware routing & cost

The China peak window is 14:00–18:00 (UTC+8). The glm-5.x family carries a surcharge (~3x at peak, ~2x off-peak), so when auto lands on a glm-5.x model at peak the router routes less work to GLM. If you list a no-surcharge model (e.g. GLM_PEAK_MODEL=glm-5.2,glm-4.7), the router prefers it at peak and GLM stays fine to use. The cost bias keeps GLM the default either way: even at peak it is cheaper than the main model.
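
The peak check and model preference described here might look like the following. This is a sketch under stated assumptions: `chinaHour`, `isPeak`, and `pickModel` are hypothetical names, the surcharge test is a simple "name starts with glm-5" check, and the real router also weighs the cost bias and other signals not modelled here.

```javascript
// Hour of day in China (UTC+8), derived from the UTC hour.
function chinaHour(date) {
  return (date.getUTCHours() + 8) % 24;
}

// Peak window is [startCn, endCn): the end hour is exclusive.
function isPeak(date, startCn = 14, endCn = 18) {
  const hour = chinaHour(date);
  return hour >= startCn && hour < endCn;
}

// Assumption: any model named glm-5 or glm-5.x carries the surcharge.
function pickModel(date, { peakModels, offpeakModels }) {
  if (!isPeak(date)) return offpeakModels[0];
  const noSurcharge = peakModels.find((m) => !/^glm-5(\.|$)/.test(m));
  return noSurcharge ?? peakModels[0];
}

const models = { peakModels: ["glm-5.2", "glm-4.7"], offpeakModels: ["glm-5.2"] };
const atPeak = new Date(Date.UTC(2025, 0, 1, 6, 0));   // 14:00 in UTC+8
const offPeak = new Date(Date.UTC(2025, 0, 1, 10, 0)); // 18:00 in UTC+8
console.log(pickModel(atPeak, models));  // glm-4.7
console.log(pickModel(offPeak, models)); // glm-5.2
```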

What stays on the main model: sensitive / secret code, vision input, parallel fan-out, 128K context, latency-tight loops, and heavy dependent tool-loops (the router's hard overrides).

Proof it's really GLM

usage.jsonl ledger — every GLM call is appended on disk with model + input_tokens + output_tokens. Claude: ~/.claude/glm-mcp/usage.jsonl; Copilot: ~/.glm-mcp/glm-mcp/usage.jsonl. Independent of the Z.ai dashboard.
glm_status — prints the cumulative ledger totals (calls, tokens, per-model counts).
=== GLM STATS === block — printed after every glm_agent run: model, tokens delegated, iterations, files changed, est. cost vs Opus.

If the ledger is empty, GLM was never called — the work ran on the main model.
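
To check the ledger yourself, totals can be computed from the file's text. The sketch below assumes one JSON object per line with the `model`, `input_tokens`, and `output_tokens` fields named above; `summarizeLedger` is a hypothetical helper and the sample rows are made up for illustration.

```javascript
// Hypothetical ledger summary: counts calls and tokens, per model and total.
function summarizeLedger(jsonlText) {
  const totals = { calls: 0, inputTokens: 0, outputTokens: 0, perModel: {} };
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip a truncated or malformed line
    }
    totals.calls += 1;
    totals.inputTokens += entry.input_tokens ?? 0;
    totals.outputTokens += entry.output_tokens ?? 0;
    totals.perModel[entry.model] = (totals.perModel[entry.model] ?? 0) + 1;
  }
  return totals;
}

// Made-up sample rows, not real usage data.
const sample = [
  '{"model":"glm-5.2","input_tokens":1200,"output_tokens":300}',
  '{"model":"glm-4.7","input_tokens":800,"output_tokens":150}',
  '{"model":"glm-5.2","input_tokens":500,"output_tokens":50}',
].join("\n");
const ledger = summarizeLedger(sample);
console.log(ledger);
```

With a real file, the text would come from reading usage.jsonl at the path listed for your edition.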

Oversight & safety

dry_run: true on glm_agent — GLM proposes a full diff and writes nothing; approve before applying.
Git checkpoint revert line — printed after every real glm_agent run (when the workdir is a git repo), so you can undo in one command.
Key isolation — GLM_API_KEY lives only in the git-ignored .env; it is never baked into the npm packages (scripts/publish-server.mjs scans every pack for .env / usage.jsonl / node_modules and fails loudly).
Data residency — GLM traffic goes to Z.ai servers in China. Keep secrets and regulated code on the main model; the router's sensitive flag forces it there.
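
The kind of pack scan mentioned under key isolation can be illustrated with a small filter over a package's file list. This is not the contents of scripts/publish-server.mjs; `findForbiddenEntries` is a hypothetical helper that flags the three patterns the README names and assumes forward-slash paths.

```javascript
// Hypothetical check: return packed paths that must never be published.
function findForbiddenEntries(paths) {
  const isForbidden = (p) => {
    const parts = p.split("/");
    const base = parts[parts.length - 1];
    return (
      base === ".env" ||              // the real key file (not .env.example)
      base === "usage.jsonl" ||       // the local usage ledger
      parts.includes("node_modules")  // vendored dependencies
    );
  };
  return paths.filter(isForbidden);
}

const packed = [
  "package/index.mjs",
  "package/.env",
  "package/.env.example",
  "package/node_modules/x/index.js",
];
const offenders = findForbiddenEntries(packed);
if (offenders.length > 0) {
  console.error("refusing to publish:", offenders);
}
```

A publish script would typically exit with a non-zero status when the offender list is non-empty.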

Development / CI

CI (see .github/workflows/ci.yml) runs: syntax checks on the server and every installer/hook/script, the keyless stdio smoke (scripts/smoke-stdio.mjs) that asserts the four-tool MCP handshake with no key on disk, a Docker introspection test (initialize piped into the built image), and an npm-pack secret scan (scripts/publish-server.mjs). PRs welcome — see CONTRIBUTING.md.

License

MIT © djerok · Canonical repo: https://github.com/djerok/glm-mcp
