normalise_arabic
Strip Arabic text of diacritics and tatweel for consistent processing; optionally normalise letter variants.
Instructions
Normalise Arabic text by applying Unicode NFC, removing diacritics and tatweel (both on by default), and optionally unifying letter variants.
- Applies Unicode NFC normalisation.
- Optionally removes Arabic diacritics (harakat / tashkil); on by default.
- Optionally removes the tatweel (kashida) elongation character; on by default.
- Optionally collapses alef, yeh, and teh marbuta variants; off by default because it is lossy.
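Collapsing letter variants is lossy because distinct words can differ only in one of these letters. The sketch below illustrates this with a hypothetical mapping (teh marbuta to heh, alef maksura to yeh); it is an assumed convention for illustration, not necessarily the mapping the tool uses.

```python
# Hypothetical variant mapping for illustration only: teh marbuta -> heh and
# alef maksura -> yeh. The tool's actual mapping is not documented here.
_VARIANTS = str.maketrans({
    "\u0629": "\u0647",  # teh marbuta -> heh
    "\u0649": "\u064A",  # alef maksura -> yeh
})


def collapse_variants(word: str) -> str:
    """Map each letter variant to its assumed base form."""
    return word.translate(_VARIANTS)


# Pairs of distinct words that differ only in their final letter.
library = "\u0645\u0643\u062A\u0628\u0629"     # "library" (ends in teh marbuta)
his_office = "\u0645\u0643\u062A\u0628\u0647"  # "his office" (ends in heh)
ala = "\u0639\u0644\u0649"                     # "on" (ends in alef maksura)
ali = "\u0639\u0644\u064A"                     # the name Ali (ends in yeh)

print(library == his_office)                                        # False
print(collapse_variants(library) == collapse_variants(his_office))  # True
print(collapse_variants(ala) == collapse_variants(ali))             # True
```

After collapsing, each pair becomes indistinguishable, which is why the option is off by default.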
Args:
- text: The Arabic (or mixed) text to normalise.
- strip_diacritics: Remove harakat / tashkil marks. Defaults to True.
- strip_tatweel: Remove the tatweel (kashida) character. Defaults to True.
- normalise_letters: Collapse alef/yeh/teh-marbuta variants. Defaults to False.
Returns: The normalised text.
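A minimal sketch of a function with this signature, assuming the steps run in the order listed above (NFC, diacritics, tatweel, letters). The diacritic set (U+064B to U+0652 plus the superscript alef U+0670) and the letter mapping are assumptions; the real tool may strip a wider range (for example Quranic annotation marks) or map variants differently.

```python
import re
import unicodedata

# Assumed diacritic set: tanween, short vowels, shadda and sukun (U+064B-U+0652)
# plus the superscript (dagger) alef U+0670.
_DIACRITICS_RE = re.compile("[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"

# Assumed variant mapping, a common convention not confirmed by the tool docs.
_LETTER_MAP = str.maketrans({
    "\u0622": "\u0627",  # alef with madda above -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0671": "\u0627",  # alef wasla -> bare alef
    "\u0649": "\u064A",  # alef maksura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def normalise_arabic(
    text: str,
    strip_diacritics: bool = True,
    strip_tatweel: bool = True,
    normalise_letters: bool = False,
) -> str:
    """Sketch of the documented behaviour; returns the normalised text."""
    # NFC first, so decomposed sequences such as alef + U+0654 become U+0623.
    text = unicodedata.normalize("NFC", text)
    if strip_diacritics:
        text = _DIACRITICS_RE.sub("", text)
    if strip_tatweel:
        text = text.replace(_TATWEEL, "")
    if normalise_letters:
        text = text.translate(_LETTER_MAP)
    return text
```

Running NFC before the letter mapping means a decomposed alef plus combining hamza is first composed into a single precomposed letter, so the mapping only has to handle precomposed code points.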
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
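| text | Yes | The Arabic (or mixed) text to normalise. | |
| strip_diacritics | No | Remove harakat / tashkil marks. | True |
| strip_tatweel | No | Remove the tatweel (kashida) character. | True |
| normalise_letters | No | Collapse alef/yeh/teh-marbuta variants (lossy). | False |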
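If a caller needs to enforce the table above programmatically, one option is to express it as a JSON Schema and resolve defaults before calling the tool. This is a sketch under assumptions: the table lists no types, so "string" and "boolean" are inferred from the descriptions, and resolve_arguments is a hypothetical helper.

```python
from typing import Any

# Hypothetical JSON Schema rebuilt from the Input Schema table. The "type"
# values are assumptions; the table itself does not list types.
INPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "text": {"type": "string"},
        "strip_diacritics": {"type": "boolean", "default": True},
        "strip_tatweel": {"type": "boolean", "default": True},
        "normalise_letters": {"type": "boolean", "default": False},
    },
    "required": ["text"],
}

# Map the JSON Schema type names used above to Python types.
_PY_TYPES = {"string": str, "boolean": bool}


def resolve_arguments(args: dict[str, Any]) -> dict[str, Any]:
    """Validate tool arguments and fill in defaults for omitted optional ones."""
    properties = INPUT_SCHEMA["properties"]
    missing = [name for name in INPUT_SCHEMA["required"] if name not in args]
    if missing:
        raise ValueError(f"missing required argument(s): {missing}")
    unknown = sorted(set(args) - set(properties))
    if unknown:
        raise ValueError(f"unknown argument(s): {unknown}")
    for name, value in args.items():
        expected = _PY_TYPES[properties[name]["type"]]
        if not isinstance(value, expected):
            raise TypeError(f"{name} must be {properties[name]['type']}")
    # Required properties are present after the check above; optional ones
    # fall back to their documented default.
    return {name: args.get(name, spec.get("default")) for name, spec in properties.items()}
```

For example, passing only text yields the three flags at their documented defaults (True, True, False).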
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | The normalised text. | |
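The output carries a single required field. A minimal sketch of that shape as a Python TypedDict, with a simple check a client might apply; the class and helper names are hypothetical, and treating result as a string is inferred from its description.

```python
from typing import TypedDict


class NormaliseArabicOutput(TypedDict):
    """Hypothetical typed view of the Output Schema table."""

    result: str  # the normalised text (required)


def wrap_result(normalised: str) -> NormaliseArabicOutput:
    """Package a normalised string in the documented output shape."""
    return {"result": normalised}


def is_valid_output(payload: object) -> bool:
    """Return True if payload is a dict with a string "result" field."""
    return isinstance(payload, dict) and isinstance(payload.get("result"), str)
```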