BenjisCollector

mcp-arabic-toolkit

A small Model Context Protocol (MCP) server exposing practical Arabic text utilities. Built with the official mcp Python SDK (FastMCP).

Demonstrates: MCP server authoring / tool development

Every tool is fully implemented: four perform deterministic string processing, and one (detect_dialect) is a clearly labelled heuristic. The pure logic lives in arabic_tools.py, which has no mcp dependency and can therefore be unit-tested on its own; server.py is a thin MCP wrapper around it.

Tools

| Tool | Description | Example input | Example output |
| --- | --- | --- | --- |
| normalise_arabic | NFC-normalises, removes diacritics (harakat/tashkil) and tatweel, and optionally unifies letter variants (alef/yeh/teh-marbuta). | الْعَرَبِيَّةُ | العربية |
| strip_tashkeel | Removes only the diacritics (and, by default, the tatweel); leaves letters as-is. | كــــتاب | كتاب |
| transliterate | Documented, deterministic Arabic→Latin romanisation (simplified DIN 31635 / ALA-LC, ASCII digraphs). | كَتَبَ | {"transliteration": "kataba", "scheme": "din31635-simplified-ascii"} |
| detect_dialect | Heuristic dialect guess (Egyptian/Levantine/Gulf/Maghrebi/MSA) from marker words. Not a trained classifier; see limits below. | شو بدك هلق؟ | {"dialect": "levantine", "confidence": 1.0, ...} |
| count_tokens | Whitespace-token count plus character and Arabic-character statistics. | مرحبا يا عالم | {"tokens": 3, "characters": 13, ...} |
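As a rough illustration of what the first two tools do, here is a minimal standard-library sketch. It is not the code in arabic_tools.py: the set of code points treated as diacritics, the variant-folding table, and the parameter names remove_tatweel and unify_letters are assumptions made for this example.

```python
import re
import unicodedata

# Assumed set of marks to strip: Quranic signs U+0610-061A, harakat and
# related combining marks U+064B-065F, and superscript alef U+0670.
_DIACRITICS = re.compile("[\u0610-\u061A\u064B-\u065F\u0670]")
_TATWEEL = "\u0640"

# Assumed variant folding; the toolkit's real table may differ.
_VARIANTS = str.maketrans({
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0649": "\u064A",  # alef maksura          -> yeh
    "\u0629": "\u0647",  # teh marbuta           -> heh
})


def strip_tashkeel(text: str, remove_tatweel: bool = True) -> str:
    """Remove diacritics (and, by default, tatweel); letters are untouched."""
    text = _DIACRITICS.sub("", text)
    if remove_tatweel:
        text = text.replace(_TATWEEL, "")
    return text


def normalise_arabic(text: str, unify_letters: bool = False) -> str:
    """NFC-normalise, strip diacritics and tatweel, optionally fold variants."""
    text = unicodedata.normalize("NFC", text)
    text = strip_tashkeel(text, remove_tatweel=True)
    if unify_letters:
        text = text.translate(_VARIANTS)
    return text


print(normalise_arabic("الْعَرَبِيَّةُ"))  # -> العربية
print(strip_tashkeel("كــــتاب"))         # -> كتاب
```

Letter unification is off by default in this sketch so that the teh marbuta in the table's example output is preserved.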

About detect_dialect (read this)

detect_dialect is an honest heuristic, not a machine-learning model. It counts hand-picked marker words/particles per dialect and returns the highest scorer. Known limits:

  • Only five coarse groups (Egyptian, Levantine, Gulf, Maghrebi, MSA).

  • Unreliable on short input, mixed-dialect text, and code-switching.

  • confidence is a crude ratio (winning hits / total hits), not a calibrated probability.

  • Falls back to MSA with confidence: 0.0 when no markers are found.

For production-grade detection, train a supervised classifier (e.g. fastText or a fine-tuned transformer) on a labelled corpus such as MADAR or NADI.
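The marker-counting approach described above can be sketched in a few lines. The marker lists below are small hypothetical samples chosen for illustration, not the lists shipped in arabic_tools.py, and the extra "scores" key in the result is likewise an assumption.

```python
import re

# Hypothetical marker words per dialect group (illustration only).
MARKERS = {
    "egyptian": {"عايز", "دلوقتي", "ازيك"},
    "levantine": {"شو", "بدك", "هلق"},
    "gulf": {"شلونك", "وايد", "ابي"},
    "maghrebi": {"بزاف", "واش", "ديال"},
    "msa": {"الذي", "سوف", "لماذا"},
}


def detect_dialect(text: str) -> dict:
    """Count marker hits per dialect and return the highest scorer."""
    # Runs of Arabic letters; punctuation such as the Arabic question
    # mark is dropped by this tokenisation.
    tokens = re.findall(r"[\u0621-\u064A]+", text)
    scores = {
        dialect: sum(1 for tok in tokens if tok in words)
        for dialect, words in MARKERS.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No markers found: fall back to MSA with zero confidence.
        return {"dialect": "msa", "confidence": 0.0, "scores": scores}
    # On a tie, max() keeps the first dialect in MARKERS order.
    best = max(scores, key=scores.get)
    # Crude ratio (winning hits / total hits), not a probability.
    return {"dialect": best, "confidence": scores[best] / total, "scores": scores}


print(detect_dialect("شو بدك هلق؟")["dialect"])  # -> levantine
```

Because the confidence is only a ratio of hits, a single marker word in a one-word input also yields 1.0, which is one reason the score is unreliable on short text.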

About transliterate

The romanisation is deterministic and documented but intentionally simple:

  • No vowel inference — short vowels are produced only from explicit harakat.

  • No context-sensitive rules — the article ال is always al- (no sun-letter assimilation), and hamzat al-wasl is not elided.

  • Shadda doubles the preceding consonant; sukun emits no vowel.

  • One-way (Arabic → Latin); not round-trippable.
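To make the harakat, shadda, and sukun rules concrete, here is a minimal sketch of a character-by-character romaniser. The mapping table is a simplified assumption for this example rather than the toolkit's actual table, and the sketch does not add the hyphen after the article.

```python
# Assumed simplified consonant table with ASCII digraphs (illustration only).
CONSONANTS = {
    "ء": "'", "ا": "a", "ب": "b", "ت": "t", "ث": "th", "ج": "j", "ح": "h",
    "خ": "kh", "د": "d", "ذ": "dh", "ر": "r", "ز": "z", "س": "s", "ش": "sh",
    "ص": "s", "ض": "d", "ط": "t", "ظ": "z", "ع": "'", "غ": "gh", "ف": "f",
    "ق": "q", "ك": "k", "ل": "l", "م": "m", "ن": "n", "ه": "h", "و": "w",
    "ي": "y", "ة": "a",
}
VOWELS = {
    "\u064E": "a", "\u064F": "u", "\u0650": "i",     # fatha, damma, kasra
    "\u064B": "an", "\u064C": "un", "\u064D": "in",  # tanwin
}
SHADDA = "\u0651"
SUKUN = "\u0652"


def transliterate(text: str) -> dict:
    """Romanise Arabic text; vowels come only from explicit harakat."""
    out = []
    last_consonant = None  # index in `out` of the most recent consonant
    for ch in text:
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            last_consonant = len(out) - 1
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch == SHADDA:
            # Double the preceding consonant. Inserting at its index works
            # whether the vowel mark comes before or after the shadda.
            if last_consonant is not None:
                out.insert(last_consonant, out[last_consonant])
                last_consonant = None
        elif ch == SUKUN:
            continue  # sukun marks the absence of a vowel
        else:
            out.append(ch)  # spaces, digits, and unmapped characters pass through
    return {"transliteration": "".join(out), "scheme": "din31635-simplified-ascii"}


print(transliterate("كَتَبَ")["transliteration"])  # -> kataba
```

The shadda is handled by index rather than by "repeat the last output" because NFC normalisation reorders a shadda after a short vowel mark on the same letter.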

Install

Requires Python 3.10+.

# Clone, then install the package (editable for local development):
pip install -e .

This pulls in the mcp SDK and registers an mcp-arabic-toolkit console script.

The tests themselves need only pytest (no mcp SDK):

pip install pytest

Run

# Option A: run the module directly (stdio transport)
python server.py

# Option B: run the installed console script
mcp-arabic-toolkit

Register with an MCP client

To use it from Claude Desktop (or any MCP client), add an entry to the client's MCP server config:

{
  "mcpServers": {
    "arabic-toolkit": {
      "command": "python",
      "args": ["/absolute/path/to/mcp-arabic-toolkit/server.py"]
    }
  }
}

Test

python -m pytest tests/ -v

The suite (tests/test_tools.py) imports the pure logic directly and covers every tool with concrete examples (diacritic/tatweel removal, letter unification, transliteration with and without harakat, each dialect, and token counting).

Quick local check

python -c "import arabic_tools; print(arabic_tools.normalise_arabic('الْعَرَبِيَّةُ'))"
# -> العربية

Publishing to the MCP registry

This package ships a server.json manifest compatible with the official MCP registry.

Exact metadata (server.json)

{
  "$schema": "https://static.modelcontextprotocol.io/schemas/2025-07-09/server.schema.json",
  "name": "io.github.benjiscollector/mcp-arabic-toolkit",
  "description": "MCP server exposing Arabic text utilities: normalisation, tashkeel stripping, transliteration, a heuristic dialect detector, and token counting.",
  "status": "active",
  "repository": {
    "url": "https://github.com/BenjisCollector/mcp-arabic-toolkit",
    "source": "github"
  },
  "version": "0.2.0",
  "packages": [
    {
      "registryType": "pypi",
      "registryBaseUrl": "https://pypi.org",
      "identifier": "mcp-arabic-toolkit",
      "version": "0.2.0",
      "transport": { "type": "stdio" }
    }
  ]
}

The server name uses the io.github.<owner>/<repo> namespace, which the registry verifies against GitHub ownership during publish.

Steps

  1. Build and publish the PyPI package so the registry has something to point at:

    python -m build
    twine upload dist/*

  2. Install the registry publisher CLI (mcp-publisher); see the registry publishing guide.

  3. Authenticate with GitHub so the CLI can verify the io.github.* namespace:

    mcp-publisher login github

  4. Publish from the directory containing server.json:

    mcp-publisher publish

To list this server on the community modelcontextprotocol/servers README as well, see SUBMISSION.md for the exact entry text and PR steps.

License

MIT — see LICENSE.
