camel_tools.utils.normalize

This submodule contains functions for normalizing Arabic text in different encodings. See Encoding Schemes for more information on encodings.

Functions

camel_tools.utils.normalize.normalize_unicode(s, compatibility=True)

Normalize Unicode strings into their canonically composed form (i.e. characters that can be written as a combination of Unicode characters are converted to their single-character form).

Note: This is essentially a call to unicodedata.normalize() with form ‘NFC’ if compatibility is False or ‘NFKC’ if it’s True.

Parameters:
  • s (str) – The string to be normalized.
  • compatibility (bool, optional) – Apply compatibility decomposition before canonical composition (i.e. use ‘NFKC’ instead of ‘NFC’). Defaults to True.
Returns: The normalized string.
Return type: str
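Because the note above describes this function as essentially a call to unicodedata.normalize(), its effect can be reproduced with the standard library alone. The helper name `normalize_unicode_sketch` is hypothetical; it only mirrors the documented NFC/NFKC choice and does not import camel_tools.

```python
import unicodedata


def normalize_unicode_sketch(s: str, compatibility: bool = True) -> str:
    """Hypothetical stand-in: NFKC when compatibility is True, else NFC."""
    form = 'NFKC' if compatibility else 'NFC'
    return unicodedata.normalize(form, s)


# Alef (U+0627) + combining Hamza Above (U+0654) composes under NFC
# into the single code point Alef with Hamza Above (U+0623).
decomposed = '\u0627\u0654'
print(len(decomposed), len(normalize_unicode_sketch(decomposed, compatibility=False)))  # 2 1

# The presentation-form ligature Lam-Alef (U+FEFB) has only a compatibility
# decomposition, so NFC leaves it alone while NFKC splits it into Lam + Alef.
ligature = '\uFEFB'
print(normalize_unicode_sketch(ligature, compatibility=False) == ligature)  # True
print(normalize_unicode_sketch(ligature) == '\u0644\u0627')                 # True
```

This illustrates why compatibility defaults to True for Arabic text: NFKC also unfolds presentation forms that NFC keeps.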

camel_tools.utils.normalize.normalize_alef_maksura_ar(s)

Normalize all occurrences of Alef Maksura characters to a Yeh character in an Arabic string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

camel_tools.utils.normalize.normalize_alef_maksura_bw(s)

Normalize all occurrences of Alef Maksura characters to a Yeh character in a Buckwalter-encoded string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

camel_tools.utils.normalize.normalize_alef_maksura_safebw(s)

Normalize all occurrences of Alef Maksura characters to a Yeh character in a Safe Buckwalter-encoded string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

camel_tools.utils.normalize.normalize_alef_maksura_xmlbw(s)

Normalize all occurrences of Alef Maksura characters to a Yeh character in an XML Buckwalter-encoded string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

camel_tools.utils.normalize.normalize_alef_maksura_hsb(s)

Normalize all occurrences of Alef Maksura characters to a Yeh character in a Habash-Soudi-Buckwalter-encoded string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

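The five Alef Maksura functions above differ only in how each encoding spells the two letters. The sketch below is not the camel_tools implementation: the helper `normalize_alef_maksura_sketch` is hypothetical, and the per-encoding character pairs (Arabic ى/ي, `Y`/`y` for the three Buckwalter variants, `ý`/`y` for HSB) are assumptions based on the standard transliteration tables.

```python
# Assumed (Alef Maksura, Yeh) pairs per encoding; not read from camel_tools.
_ALEF_MAKSURA_TO_YEH = {
    'ar': ('\u0649', '\u064A'),  # ى -> ي
    'bw': ('Y', 'y'),
    'safebw': ('Y', 'y'),
    'xmlbw': ('Y', 'y'),
    'hsb': ('\u00FD', 'y'),      # ý -> y
}


def normalize_alef_maksura_sketch(s: str, encoding: str) -> str:
    """Replace every Alef Maksura with Yeh for the given encoding (hypothetical helper)."""
    alef_maksura, yeh = _ALEF_MAKSURA_TO_YEH[encoding]
    # Each scheme spells Alef Maksura as a single dedicated character,
    # so a plain substring replacement is sufficient.
    return s.replace(alef_maksura, yeh)


print(normalize_alef_maksura_sketch('\u0639\u0644\u0649', 'ar'))  # على -> علي
print(normalize_alef_maksura_sketch('ElY', 'bw'))                 # Ely
```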
camel_tools.utils.normalize.normalize_teh_marbuta_ar(s)

Normalize all occurrences of Teh Marbuta characters to a Heh character in an Arabic string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

camel_tools.utils.normalize.normalize_teh_marbuta_bw(s)

Normalize all occurrences of Teh Marbuta characters to a Heh character in a Buckwalter-encoded string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

camel_tools.utils.normalize.normalize_teh_marbuta_safebw(s)

Normalize all occurrences of Teh Marbuta characters to a Heh character in a Safe Buckwalter-encoded string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

camel_tools.utils.normalize.normalize_teh_marbuta_xmlbw(s)

Normalize all occurrences of Teh Marbuta characters to a Heh character in an XML Buckwalter-encoded string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

camel_tools.utils.normalize.normalize_teh_marbuta_hsb(s)

Normalize all occurrences of Teh Marbuta characters to a Heh character in a Habash-Soudi-Buckwalter-encoded string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

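The Teh Marbuta functions follow the same pattern with a different pair of letters. As a minimal sketch, assuming the pairs Arabic ة/ه, `p`/`h` for Buckwalter, Safe Buckwalter and XML Buckwalter, and `ħ`/`h` for HSB, a hypothetical `normalize_teh_marbuta_sketch` could look like this (it is not the library's code):

```python
# Assumed (Teh Marbuta, Heh) pairs per encoding; not read from camel_tools.
_TEH_MARBUTA_TO_HEH = {
    'ar': ('\u0629', '\u0647'),  # ة -> ه
    'bw': ('p', 'h'),
    'safebw': ('p', 'h'),
    'xmlbw': ('p', 'h'),
    'hsb': ('\u0127', 'h'),      # ħ -> h
}


def normalize_teh_marbuta_sketch(s: str, encoding: str) -> str:
    """Replace every Teh Marbuta with Heh for the given encoding (hypothetical helper)."""
    teh_marbuta, heh = _TEH_MARBUTA_TO_HEH[encoding]
    return s.replace(teh_marbuta, heh)


print(normalize_teh_marbuta_sketch('\u0645\u062F\u0631\u0633\u0629', 'ar'))  # مدرسة -> مدرسه
print(normalize_teh_marbuta_sketch('mdrsp', 'bw'))                          # mdrsh
```

This conflation is lossy (ة and ه become indistinguishable), which is typically acceptable when matching noisy text that writes them interchangeably.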
camel_tools.utils.normalize.normalize_alef_ar(s)

Normalize various Alef variants to a plain Alef character in an Arabic string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

camel_tools.utils.normalize.normalize_alef_bw(s)

Normalize various Alef variants to a plain Alef character in a Buckwalter-encoded string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

camel_tools.utils.normalize.normalize_alef_safebw(s)

Normalize various Alef variants to a plain Alef character in a Safe Buckwalter-encoded string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

camel_tools.utils.normalize.normalize_alef_xmlbw(s)

Normalize various Alef variants to a plain Alef character in an XML Buckwalter-encoded string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

camel_tools.utils.normalize.normalize_alef_hsb(s)

Normalize various Alef variants to a plain Alef character in a Habash-Soudi-Buckwalter-encoded string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str
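Unlike the previous two families, Alef normalization maps several source characters to one target, so a regular-expression character class is a natural fit. The sketch below assumes these variant sets: Alef with Madda, Hamza Above, Hamza Below and Alef Wasla (آ أ إ ٱ) in Arabic script; `| > < {` in Buckwalter; `M O I L` in Safe Buckwalter; `| O I {` in XML Buckwalter; and `Ā Â Ǎ` in HSB (Alef Wasla left out because its HSB symbol is not assumed here). The exact sets camel_tools uses may differ, and `normalize_alef_sketch` is a hypothetical name.

```python
import re

# Assumed Alef-variant character classes per encoding (illustrative only,
# not taken from camel_tools). Each entry: (regex character class, plain Alef).
_ALEF_VARIANTS = {
    'ar': ('[\u0622\u0623\u0625\u0671]', '\u0627'),  # آ أ إ ٱ -> ا
    'bw': ('[|><{]', 'A'),                            # | > < { -> A
    'safebw': ('[MOIL]', 'A'),                        # M O I L -> A
    'xmlbw': ('[|OI{]', 'A'),                         # | O I { -> A
    'hsb': ('[\u0100\u00C2\u01CD]', 'A'),             # Ā Â Ǎ -> A
}

# Compile each character class once so repeated calls stay cheap.
_ALEF_RES = {
    enc: (re.compile(cls), alef) for enc, (cls, alef) in _ALEF_VARIANTS.items()
}


def normalize_alef_sketch(s: str, encoding: str) -> str:
    """Replace every listed Alef variant with a plain Alef (hypothetical helper)."""
    pattern, alef = _ALEF_RES[encoding]
    return pattern.sub(alef, s)


print(normalize_alef_sketch('\u0623\u062D\u0645\u062F', 'ar'))  # أحمد -> احمد
print(normalize_alef_sketch('>Hmd', 'bw'))                      # AHmd
print(normalize_alef_sketch('OHmd', 'safebw'))                  # AHmd
```

A `str.translate` table built with `str.maketrans` would work equally well here; the regex form simply keeps each encoding's variant set readable on one line.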