
zh-dict-mcp

MCP server for Chinese figurative language lookup, backed by CC-CEDICT.

What it does: given a Chinese word or phrase, it reports whether the figurative usage has been lexicalized (recorded in the dictionary as an independent sense) or is a one-off creative expression.

Why it exists: LLMs writing Chinese dialogue, fiction, or roleplay tend to invent purple-prose figurative expressions that no real person would say (e.g., "他把心锁进铁盒里" / "墙比夜更厚"). This tool gives you an objective dictionary-backed check.


Install

Pick your MCP-aware client. Across all of them the runtime command is the same — uvx zh-dict-mcp — but the wrapping config differs.

Claude Code

claude mcp add zh-dict-mcp -- uvx zh-dict-mcp

Codex CLI

codex mcp add zh-dict-mcp -- uvx zh-dict-mcp

Or edit ~/.codex/config.toml directly:

[mcp_servers.zh-dict-mcp]
command = "uvx"
args = ["zh-dict-mcp"]

Cursor

In Cursor: Settings → MCP → Add new server (UI), or edit ~/.cursor/mcp.json:

{
  "mcpServers": {
    "zh-dict-mcp": {
      "command": "uvx",
      "args": ["zh-dict-mcp"]
    }
  }
}

Claude Desktop

Edit claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/, Windows: %APPDATA%\Claude\):

{
  "mcpServers": {
    "zh-dict-mcp": {
      "command": "uvx",
      "args": ["zh-dict-mcp"]
    }
  }
}

Restart Claude Desktop to load the server.

Windsurf / Zed / other MCP-aware clients

The JSON block above works in most clients. Find your client's MCP config file (search for "MCP" in its settings docs) and paste it in, adjusting the top-level key if your client names it differently.

With an optional project whitelist

If you have a project-level whitelist of "approved dead metaphors" the dictionary happens to miss, point the server at it:

"args": ["zh-dict-mcp", "--whitelist", "/abs/path/to/your_whitelist.yaml"]

Or set the environment variable ZH_DICT_WHITELIST=/abs/path/to/your_whitelist.yaml.


After install, the lookup_dictionary tool is exposed to your AI client. uvx pulls the package from PyPI on first run, caches it locally, then launches the stdio MCP server. No pip install needed.



What you get

A single MCP tool:

lookup_dictionary(word: string) → JSON

Example: lookup_dictionary("看见") returns:

{
  "word": "看见",
  "found_in_cedict": true,
  "simplified": "看见",
  "traditional": "看見",
  "pinyin": "kan4 jian4",
  "definitions": ["to see", "to catch sight of"],
  "tags": {
    "has_figurative": false,
    "is_neologism": false,
    "is_slang": false,
    "has_idiom_marker": false
  }
}

Example: lookup_dictionary("内卷") returns:

{
  "word": "内卷",
  "found_in_cedict": true,
  "definitions": [
    "(embryology) to involute; involution",
    "(neologism, attested by 2017) (of a society) to become more and more involuted..."
  ],
  "tags": { "is_neologism": true, ... }
}

Example: lookup_dictionary("锁进铁盒里") (a creative one-off) returns:

{
  "word": "锁进铁盒里",
  "found_in_cedict": false,
  "found_in_whitelist": false,
  "definitions": []
}
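The three responses above differ mainly in found_in_cedict, found_in_whitelist, and tags.has_figurative. As a minimal sketch of how a caller might turn a response into a review label, the helper below reads only those fields. The function name and the three label strings are illustrative assumptions, not part of zh-dict-mcp.

```python
import json


def classify_result(raw: str) -> str:
    """Map a lookup_dictionary JSON response to a coarse review label.

    The labels are hypothetical; only the field names come from the
    example responses shown above.
    """
    result = json.loads(raw)
    tags = result.get("tags", {})
    found = result.get("found_in_cedict") or result.get("found_in_whitelist")
    if not found:
        # Neither the dictionary nor the whitelist records the expression.
        return "unrecorded"
    if tags.get("has_figurative"):
        # At least one recorded sense is marked as figurative.
        return "lexicalized-figurative"
    return "recorded"


if __name__ == "__main__":
    one_off = (
        '{"word": "锁进铁盒里", "found_in_cedict": false, '
        '"found_in_whitelist": false, "definitions": []}'
    )
    print(classify_result(one_off))
```

A label of "recorded" or "lexicalized-figurative" only says that the dictionary knows the word. Whether a recorded sense matches what the speaker meant still has to be judged by the caller.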

Use cases

  • AI-generated dialogue review: catch live metaphors that an LLM invents but no real speaker would use

  • AI writing lint: pipeline filter for game NPC dialogue / interactive fiction / chatbot scripts

  • Lexicalization research: check whether a figurative expression has been recorded in standard dictionaries

  • New word verification: confirm neologisms / slang with (neologism, attested by YEAR) attribution

  • Idiom / allusion (典故) lookup: get the figurative sense of idioms such as "滑铁卢" → "(fig.) a defeat"


Data source

CC-CEDICT: an open, community-maintained Chinese-English dictionary with about 125,000 entries, updated weekly.

License: CC BY-SA 4.0. Bundled in package. See LICENSE-CC-CEDICT.

Why CC-CEDICT rather than 现代汉语词典 (XDHYCD) or other sources:

| Source | Coverage on AI-writing test set | Notes |
| --- | --- | --- |
| chinese-xinhua (GitHub data) | 46% | Heavy bias toward classical Chinese |
| 现代汉语词典, 7th edition (XDHYCD7th) | 56% | Doesn't list literal compound words (放下, 抓住, etc.) |
| CC-CEDICT | ~95% | Modern usage + neologisms + (fig.) / (slang) / (neologism) markers |

CC-CEDICT explicitly tags figurative senses, neologisms with attestation years, slang, and idioms — exactly the structure needed for figurative-language analysis.
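To illustrate how these markers can be read out of a definition string, here is a small sketch that splits a leading marker from the gloss. It assumes the marker sits at the start of the definition, as in the "(fig.) a defeat" and "(neologism, attested by 2017) ..." examples in this README. Real CC-CEDICT entries may place markers elsewhere or use other forms, and the tool's own tag detection may work differently. The names MARKER and split_marker are hypothetical.

```python
import re

# Leading parenthesised markers in the shapes shown in this README:
#   "(fig.) a defeat"
#   "(neologism, attested by 2017) (of a society) ..."
MARKER = re.compile(r"^\((fig\.|slang|neologism)(?:, attested by (\d{4}))?\)\s*")


def split_marker(definition: str):
    """Return (marker, year, gloss); marker and year are None when absent."""
    match = MARKER.match(definition)
    if match is None:
        return None, None, definition
    year = int(match.group(2)) if match.group(2) else None
    # Everything after the marker (and any following whitespace) is the gloss.
    return match.group(1), year, definition[match.end():]


if __name__ == "__main__":
    for text in ["Waterloo (Belgium)", "(fig.) a defeat"]:
        print(split_marker(text))
```

Domain labels such as "(embryology)" are deliberately left untouched, so a definition that starts with one comes back with no marker and its full text as the gloss.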


Optional: project whitelist

For project-specific overrides (e.g., words CC-CEDICT happens to miss):

# my_whitelist.yaml
allowed:
  - word: 凛然
    note: Standard literary usage, CC-CEDICT misses it
  - word: 头疼
    note: Override to include "annoyance" figurative sense

Pass it with the --whitelist flag in your client config:

{
  "mcpServers": {
    "zh-dict-mcp": {
      "command": "uvx",
      "args": ["zh-dict-mcp", "--whitelist", "/abs/path/to/my_whitelist.yaml"]
    }
  }
}

Or via env var ZH_DICT_WHITELIST=/path/to/file.yaml.

When a word is in the whitelist, the result includes "found_in_whitelist": true and the note.
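If you maintain configs for several clients, the server entry can be generated rather than hand-edited. The sketch below builds the same mcpServers structure shown in the install section, adding the --whitelist argument only when a path is given. The function name server_entry is an assumption for illustration. Where the output should be written depends on your client.

```python
import json


def server_entry(whitelist=None):
    """Build the mcpServers block for zh-dict-mcp, with an optional whitelist."""
    args = ["zh-dict-mcp"]
    if whitelist:
        # Same flag as in the hand-written config above.
        args += ["--whitelist", str(whitelist)]
    return {"mcpServers": {"zh-dict-mcp": {"command": "uvx", "args": args}}}


if __name__ == "__main__":
    # Print a config fragment ready to paste into the client's config file.
    print(json.dumps(server_entry("/abs/path/to/my_whitelist.yaml"), indent=2))
```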


Python API (no MCP needed)

Use the lookup library directly without launching a server:

from zh_dict_mcp import DictionaryLookup

lookup = DictionaryLookup()  # bundled CC-CEDICT loads in ~200ms
result = lookup.lookup("滑铁卢")

print(result.found)              # True
print(result.definitions)        # ['Waterloo (Belgium)', 'Battle of Waterloo (1815)', '(fig.) a defeat']
print(result.tags.has_figurative)  # True
print(result.pinyin)             # 'Hua2 tie3 lu2'

With custom whitelist:

from pathlib import Path
lookup = DictionaryLookup(whitelist_path=Path("my_whitelist.yaml"))

lookup.py has zero external dependencies (stdlib only). The mcp dependency is only needed for the MCP server.


Install standalone (no MCP, just Python library)

pip install zh-dict-mcp

Or with uv:

uv add zh-dict-mcp

Limitations

  • English-language definitions (CC-CEDICT is a Chinese-English dictionary). Works well with LLMs that handle cross-lingual judgment (Claude, GPT-4+, Gemini). For monolingual Chinese consumers you'd need a translation layer.

  • Sense matching is on the caller — this tool returns all senses; deciding whether the speaker's intended sense matches a returned sense is left to the LLM or human reviewer.

  • Single-word / single-phrase lookup — doesn't parse full sentences. Wrap with your own NLP layer for sentence-level work.

  • 9.4 MB data bundle — CC-CEDICT data is included in the wheel for offline use.


How it fits with broader writing-quality pipelines

This tool is one piece of a larger "AI-generated text quality" framework. Typical usage flow:

LLM generates Chinese dialogue
   ↓
Scan for figurative expressions (metaphor / metonymy / euphemism / ...)
   ↓
For each: lookup_dictionary(expression)
   ↓
  ├── found + sense matches intent → pass
  └── not found or sense mismatch → flag for rewrite

A reference review prompt for this flow is documented in Forgewright (the project that spawned this tool).
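The flow in the diagram can be sketched as a small loop. This is a minimal sketch under stated assumptions: lookup is any callable that returns a dict shaped like the lookup_dictionary responses, and sense_matches stands in for the LLM or human judgment of whether a recorded sense fits the intended meaning. Both callables, the function name review, and the demo stubs are hypothetical and not part of zh-dict-mcp.

```python
from typing import Callable, Dict, Iterable, Sequence


def review(
    expressions: Iterable[str],
    lookup: Callable[[str], dict],
    sense_matches: Callable[[str, Sequence[str]], bool],
) -> Dict[str, str]:
    """Label each expression "pass" or "flag", following the flow above."""
    verdicts = {}
    for expr in expressions:
        result = lookup(expr)
        found = result.get("found_in_cedict") or result.get("found_in_whitelist")
        definitions = result.get("definitions", [])
        if found and sense_matches(expr, definitions):
            verdicts[expr] = "pass"
        else:
            # Not recorded, or recorded with a different sense.
            verdicts[expr] = "flag"
    return verdicts


# Demo stubs so the sketch runs without a server. The data mirrors
# examples from this README; a real caller would invoke the MCP tool.
_DEMO = {
    "滑铁卢": {
        "found_in_cedict": True,
        "definitions": [
            "Waterloo (Belgium)",
            "Battle of Waterloo (1815)",
            "(fig.) a defeat",
        ],
    }
}


def demo_lookup(word: str) -> dict:
    return _DEMO.get(word, {"found_in_cedict": False, "definitions": []})


def demo_sense_matches(expr: str, definitions: Sequence[str]) -> bool:
    # Placeholder judgment: accept when any sense carries a "(fig.)" marker.
    return any("(fig.)" in d for d in definitions)


if __name__ == "__main__":
    print(review(["滑铁卢", "锁进铁盒里"], demo_lookup, demo_sense_matches))
```

With the demo stubs, "滑铁卢" passes and "锁进铁盒里" is flagged. In practice the sense check is the hard part, and the placeholder above should be replaced with a real judgment step.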


Project status

v0.1.0 — initial release. Validated on a 39-case test set covering 6 categories (dead metaphors / live metaphors / literal words / boundary cases / idioms / neologisms) with 100% accuracy.

Bug reports and PRs welcome.

License

  • Code: MIT (see LICENSE)

  • CC-CEDICT data: CC BY-SA 4.0 (see LICENSE-CC-CEDICT)
