mcp-arabic-toolkit
A small Model Context Protocol (MCP) server
exposing practical Arabic text utilities. Built with the official mcp Python
SDK (FastMCP).
Demonstrates: MCP server authoring / tool development
All tools are implemented for real -- deterministic string processing plus one
clearly-labelled heuristic. The pure logic lives in
arabic_tools.py (no mcp dependency), so it is
independently unit-tested; server.py is a thin MCP wrapper.
Tools
| Tool | Description | Example input | Example output |
|---|---|---|---|
| | NFC-normalises, removes diacritics (harakat/tashkil) and tatweel, and optionally unifies letter variants (alef/yeh/teh-marbuta). | | |
| | Removes only the diacritics (and, by default, the tatweel); leaves letters as-is. | | |
| transliterate | Documented, deterministic Arabic→Latin romanisation (simplified DIN 31635 / ALA-LC, ASCII digraphs). | | |
| detect_dialect | Heuristic dialect guess (Egyptian/Levantine/Gulf/Maghrebi/MSA) from marker words. Not a trained classifier; see limits below. | | |
| | Whitespace-token count plus character and Arabic-character statistics. | | |
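To make the normalisation and counting rows concrete, here is a minimal standard-library sketch of that kind of processing. It is not the code in arabic_tools.py: the function names, the exact diacritic range, the unification table, and the result keys are all assumptions chosen for illustration.

```python
import re
import unicodedata

# Assumption: harakat are taken to be U+064B..U+0652 plus superscript alef U+0670.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"

# Hypothetical unification table: alef variants -> bare alef,
# alef maqsura -> yeh, teh marbuta -> heh.
UNIFY = str.maketrans({
    "\u0623": "\u0627",
    "\u0625": "\u0627",
    "\u0622": "\u0627",
    "\u0649": "\u064A",
    "\u0629": "\u0647",
})


def sketch_normalise(text: str, unify: bool = False) -> str:
    """NFC-normalise, drop diacritics and tatweel, optionally unify letters."""
    text = unicodedata.normalize("NFC", text)
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    return text.translate(UNIFY) if unify else text


def sketch_stats(text: str) -> dict:
    """Whitespace-token count plus character statistics (key names are made up)."""
    return {
        "tokens": len(text.split()),
        "characters": len(text),
        # Assumption: "Arabic character" means the basic Arabic block U+0600..U+06FF.
        "arabic_characters": sum(1 for ch in text if "\u0600" <= ch <= "\u06FF"),
    }
```

With unification left off, `sketch_normalise("الْعَرَبِيَّةُ")` returns `العربية`, since only the harakat are removed and the teh marbuta is kept.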
About detect_dialect (read this)
detect_dialect is an honest heuristic, not a machine-learning model. It
counts hand-picked marker words/particles per dialect and returns the highest
scorer. Known limits:
Only five coarse groups (Egyptian, Levantine, Gulf, Maghrebi, MSA).
Unreliable on short input, mixed-dialect text, and code-switching.
confidence is a crude ratio (winning hits / total hits), not a calibrated probability.
Falls back to MSA with confidence: 0.0 when no markers are found.
For production-grade detection, train a supervised classifier (e.g. fastText or a fine-tuned transformer) on a labelled corpus such as MADAR or NADI.
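The counting scheme described above can be sketched in a few lines. This is an illustration of the idea, not the toolkit's implementation: the marker lists are a tiny hypothetical sample, and the function name and result keys are assumptions.

```python
# Hypothetical marker words; the real lists are not reproduced here.
MARKERS = {
    "Egyptian": {"ازاي", "عايز", "دلوقتي"},
    "Levantine": {"شو", "هلق", "كيفك"},
    "Gulf": {"شلونك", "وايد"},
    "Maghrebi": {"بزاف", "واش"},
    "MSA": {"لماذا", "الذي"},
}


def guess_dialect(text: str) -> dict:
    """Count marker hits per dialect and return the highest scorer."""
    tokens = text.split()
    hits = {
        dialect: sum(1 for tok in tokens if tok in words)
        for dialect, words in MARKERS.items()
    }
    total = sum(hits.values())
    if total == 0:
        # No markers at all: fall back to MSA with zero confidence.
        return {"dialect": "MSA", "confidence": 0.0}
    best = max(hits, key=hits.get)  # ties go to the first dialect listed
    # Crude ratio of winning hits to total hits, not a probability.
    return {"dialect": best, "confidence": hits[best] / total}
```

A single stray marker in a short sentence yields a confidence of 1.0 here, which is exactly why the ratio should not be read as a calibrated probability.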
About transliterate
The romanisation is deterministic and documented but intentionally simple:
No vowel inference — short vowels are produced only from explicit harakat.
No context-sensitive rules: the article ال is always al- (no sun-letter assimilation), and hamzat al-wasl is not elided.
Shadda doubles the preceding consonant; sukun emits no vowel.
One-way (Arabic → Latin); not round-trippable.
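The rules listed above can be illustrated with a small sketch. It covers only a handful of consonants, and the function name and mapping table are assumptions made for this example; the toolkit's actual table is not reproduced here.

```python
# Illustrative subset of consonants only (hypothetical table).
CONSONANTS = {
    "\u0628": "b",   # beh
    "\u062A": "t",   # teh
    "\u062F": "d",   # dal
    "\u0631": "r",   # reh
    "\u0633": "s",   # seen
    "\u0634": "sh",  # sheen, written as an ASCII digraph
    "\u0643": "k",   # kaf
    "\u0644": "l",   # lam
    "\u0645": "m",   # meem
    "\u0646": "n",   # noon
}
SHORT_VOWELS = {"\u064E": "a", "\u0650": "i", "\u064F": "u"}  # fatha, kasra, damma
SHADDA, SUKUN = "\u0651", "\u0652"


def sketch_romanise(text: str) -> str:
    """Romanise character by character; vowels come only from explicit harakat."""
    out = []
    last = None  # index in `out` of the most recent consonant
    for ch in text:
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            last = len(out) - 1
        elif ch in SHORT_VOWELS:
            out.append(SHORT_VOWELS[ch])
        elif ch == SHADDA:
            # Double the preceding consonant, whether the shadda comes
            # before or after that consonant's vowel mark.
            if last is not None:
                out.insert(last + 1, out[last])
                last += 1
        elif ch == SUKUN:
            continue  # sukun emits no vowel
        else:
            out.append(ch)  # pass anything unmapped through unchanged
            last = None
    return "".join(out)
```

Fully vocalised دَرَّسَ comes out as `darrasa`, while the bare skeleton درس comes out as `drs`, because no vowels are inferred.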
Install
Requires Python 3.10+.
# Clone, then install the package (editable for local development):
pip install -e .
This pulls in the mcp SDK and registers an mcp-arabic-toolkit console script.
The tests themselves need only pytest (no mcp SDK):
pip install pytest
Run
# Option A: run the module directly (stdio transport)
python server.py
# Option B: run the installed console script
mcp-arabic-toolkit
Register with an MCP client
To use it from Claude Desktop (or any MCP client), add an entry to the client's MCP server config:
{
  "mcpServers": {
    "arabic-toolkit": {
      "command": "python",
      "args": ["/absolute/path/to/mcp-arabic-toolkit/server.py"]
    }
  }
}
Test
python -m pytest tests/ -v
The suite (tests/test_tools.py) imports the pure logic directly and covers every tool with concrete examples (diacritic/tatweel removal, letter unification, transliteration with and without harakat, each dialect, and token counting).
Quick local check
python -c "import arabic_tools; print(arabic_tools.normalise_arabic('الْعَرَبِيَّةُ'))"
# -> العربية
Publishing to the MCP registry
This package ships a server.json manifest compatible with the
official MCP registry.
Exact metadata (server.json)
{
  "$schema": "https://static.modelcontextprotocol.io/schemas/2025-07-09/server.schema.json",
  "name": "io.github.benjiscollector/mcp-arabic-toolkit",
  "description": "MCP server exposing Arabic text utilities: normalisation, tashkeel stripping, transliteration, a heuristic dialect detector, and token counting.",
  "status": "active",
  "repository": {
    "url": "https://github.com/BenjisCollector/mcp-arabic-toolkit",
    "source": "github"
  },
  "version": "0.2.0",
  "packages": [
    {
      "registryType": "pypi",
      "registryBaseUrl": "https://pypi.org",
      "identifier": "mcp-arabic-toolkit",
      "version": "0.2.0",
      "transport": { "type": "stdio" }
    }
  ]
}
The server name uses the io.github.<owner>/<repo> namespace, which the registry verifies against GitHub ownership during publish.
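Before publishing, a quick local sanity check of the manifest can catch obvious slips. The sketch below is a hypothetical helper, not part of this package or of the registry tooling. It checks the name shape described above and, as an assumption of convenience rather than a documented registry rule, that each package version matches the top-level version.

```python
import re

# Shape of the namespace described above: io.github.<owner>/<repo>
NAME_PATTERN = re.compile(r"^io\.github\.[^/\s]+/[^/\s]+$")


def check_manifest(manifest: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    name = manifest.get("name", "")
    if not NAME_PATTERN.match(name):
        problems.append(f"name {name!r} is not of the form io.github.<owner>/<repo>")
    top_version = manifest.get("version")
    for pkg in manifest.get("packages", []):
        # Assumption: package versions are expected to track the server version.
        if pkg.get("version") != top_version:
            problems.append(
                f"package {pkg.get('identifier')!r} has version "
                f"{pkg.get('version')!r}, expected {top_version!r}"
            )
    return problems
```

It takes the parsed manifest as a dict, for example the result of `json.load` on server.json, and returns an empty list for the manifest shown above.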
Steps
Build and publish the PyPI package so the registry has something to point at:
python -m build
twine upload dist/*
Install the registry publisher CLI (mcp-publisher); see the registry publishing guide.
Authenticate with GitHub so the CLI can verify the io.github.* namespace:
mcp-publisher login github
Publish from the directory containing server.json:
mcp-publisher publish
To list this server on the community modelcontextprotocol/servers README as well, see SUBMISSION.md for the exact entry text and PR steps.
License
MIT — see LICENSE.