camel_tools.utils.normalize
This submodule contains functions for normalizing Arabic text in different encodings. See Encoding Schemes for more information on encodings.
Functions
-
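camel_tools.utils.normalize.normalize_unicode(s, compatibility=True)
Normalize a Unicode string into its canonically composed form or, if compatibility is True, its compatibility composed form (i.e. characters that can be written as a combination of Unicode characters are converted to their single-character form).
Note: This is essentially a call to unicodedata.normalize() with form ‘NFC’ if compatibility is False or ‘NFKC’ if it is True.
Parameters:
s (str) – The string to be normalized.
compatibility (bool) – Whether to use compatibility composition (‘NFKC’) instead of canonical composition (‘NFC’). Defaults to True.
Returns: The normalized string.
Return type: str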
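The difference between the two forms is easiest to see on concrete characters. The following is a minimal sketch that uses only the standard library, following the note above; normalize_unicode_sketch is a hypothetical stand-in written for illustration, not the library's own code.

```python
import unicodedata


def normalize_unicode_sketch(s: str, compatibility: bool = True) -> str:
    """Hypothetical stand-in that mirrors the behaviour described in the note."""
    form = "NFKC" if compatibility else "NFC"
    return unicodedata.normalize(form, s)


# Alef (U+0627) followed by a combining Madda Above (U+0653).
decomposed = "\u0627\u0653"
# Canonical composition merges the pair into Alef with Madda Above (U+0622).
composed = normalize_unicode_sketch(decomposed, compatibility=False)

# Lam-Alef ligature, isolated presentation form (U+FEFB).
ligature = "\uFEFB"
# NFC leaves presentation forms untouched ...
nfc = normalize_unicode_sketch(ligature, compatibility=False)
# ... while NFKC expands the ligature into Lam (U+0644) + Alef (U+0627).
nfkc = normalize_unicode_sketch(ligature, compatibility=True)
```

With the default compatibility=True, presentation forms such as ligatures are therefore rewritten to their base letters, which is usually what is wanted before further Arabic text processing.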
-
camel_tools.utils.normalize.normalize_alef_maksura_ar(s)
Normalize all occurrences of Alef Maksura characters to a Yeh character in an Arabic string.
Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str
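As a rough illustration of what this normalization does, the sketch below replaces Alef Maksura (U+0649) with Yeh (U+064A) using plain string replacement. The helper name alef_maksura_to_yeh_sketch is hypothetical, and the library's actual implementation may differ.

```python
ALEF_MAKSURA = "\u0649"  # ى
YEH = "\u064A"           # ي


def alef_maksura_to_yeh_sketch(s: str) -> str:
    """Hypothetical equivalent: map every Alef Maksura to Yeh."""
    return s.replace(ALEF_MAKSURA, YEH)


# The word "على" (Ain, Lam, Alef Maksura).
word = "\u0639\u0644\u0649"
normalized = alef_maksura_to_yeh_sketch(word)  # Ain, Lam, Yeh
```

With CAMeL Tools installed, the corresponding call would be normalize_alef_maksura_ar(word).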
-
camel_tools.utils.normalize.normalize_alef_maksura_bw(s)
Normalize all occurrences of Alef Maksura characters to a Yeh character in a Buckwalter encoded string.
Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str
-
camel_tools.utils.normalize.normalize_alef_maksura_safebw(s)
Normalize all occurrences of Alef Maksura characters to a Yeh character in a Safe Buckwalter encoded string.
Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str
-
camel_tools.utils.normalize.normalize_alef_maksura_xmlbw(s)
Normalize all occurrences of Alef Maksura characters to a Yeh character in an XML Buckwalter encoded string.
Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str
-
camel_tools.utils.normalize.normalize_alef_maksura_hsb(s)
Normalize all occurrences of Alef Maksura characters to a Yeh character in a Habash-Soudi-Buckwalter encoded string.
Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str
-
camel_tools.utils.normalize.normalize_teh_marbuta_ar(s)
Normalize all occurrences of Teh Marbuta characters to a Heh character in an Arabic string.
Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str
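The effect on Arabic script can be sketched as a single character substitution from Teh Marbuta (U+0629) to Heh (U+0647). The helper teh_marbuta_to_heh_sketch below is hypothetical and only approximates the documented behaviour.

```python
TEH_MARBUTA = "\u0629"  # ة
HEH = "\u0647"          # ه


def teh_marbuta_to_heh_sketch(s: str) -> str:
    """Hypothetical equivalent: map every Teh Marbuta to Heh."""
    return s.replace(TEH_MARBUTA, HEH)


# The word "مدرسة" (Meem, Dal, Reh, Seen, Teh Marbuta).
word = "\u0645\u062F\u0631\u0633\u0629"
normalized = teh_marbuta_to_heh_sketch(word)  # final letter becomes Heh
```

With CAMeL Tools installed, the corresponding call would be normalize_teh_marbuta_ar(word).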
-
camel_tools.utils.normalize.normalize_teh_marbuta_bw(s)
Normalize all occurrences of Teh Marbuta characters to a Heh character in a Buckwalter encoded string.
Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str
-
camel_tools.utils.normalize.normalize_teh_marbuta_safebw(s)
Normalize all occurrences of Teh Marbuta characters to a Heh character in a Safe Buckwalter encoded string.
Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str
-
camel_tools.utils.normalize.normalize_teh_marbuta_xmlbw(s)
Normalize all occurrences of Teh Marbuta characters to a Heh character in an XML Buckwalter encoded string.
Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str
-
camel_tools.utils.normalize.normalize_teh_marbuta_hsb(s)
Normalize all occurrences of Teh Marbuta characters to a Heh character in a Habash-Soudi-Buckwalter encoded string.
Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str
-
camel_tools.utils.normalize.normalize_alef_ar(s)
Normalize various Alef variations to a plain Alef character in an Arabic string.
Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str
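The documentation does not list which Alef variations are covered. The sketch below assumes the four common ones (Alef with Madda Above, Alef with Hamza Above, Alef with Hamza Below, and Alef Wasla); both that set and the helper name alef_variants_to_alef_sketch are assumptions made for illustration.

```python
import re

# Assumed set of Alef variants: U+0622 (آ), U+0623 (أ), U+0625 (إ), U+0671 (ٱ).
ALEF_VARIANTS_RE = re.compile("[\u0622\u0623\u0625\u0671]")
ALEF = "\u0627"  # ا


def alef_variants_to_alef_sketch(s: str) -> str:
    """Hypothetical equivalent: collapse the assumed Alef variants to plain Alef."""
    return ALEF_VARIANTS_RE.sub(ALEF, s)


# "أحمد" starts with Alef with Hamza Above.
name = "\u0623\u062D\u0645\u062F"
normalized_name = alef_variants_to_alef_sketch(name)

# "إلى" starts with Alef with Hamza Below; the final Alef Maksura is left alone.
prep = "\u0625\u0644\u0649"
normalized_prep = alef_variants_to_alef_sketch(prep)
```

Note that Alef Maksura is not treated as an Alef variant here; it is handled by the separate normalize_alef_maksura_* functions.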
-
camel_tools.utils.normalize.normalize_alef_bw(s)
Normalize various Alef variations to a plain Alef character in a Buckwalter encoded string.
Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str
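In the Buckwalter scheme the same operation works on ASCII symbols. The sketch below assumes the standard Buckwalter table, where `|`, `>`, `<` and `{` encode Alef with Madda Above, Alef with Hamza Above, Alef with Hamza Below and Alef Wasla, and `A` encodes plain Alef. The helper alef_variants_to_alef_bw_sketch is hypothetical, and the set of variants the library covers is an assumption.

```python
import re

# Assumed Buckwalter symbols for Alef variants: | (آ), > (أ), < (إ), { (ٱ).
# Inside a character class these characters are all literal.
BW_ALEF_VARIANTS_RE = re.compile(r"[|><{]")
BW_ALEF = "A"


def alef_variants_to_alef_bw_sketch(s: str) -> str:
    """Hypothetical equivalent for Buckwalter encoded input."""
    return BW_ALEF_VARIANTS_RE.sub(BW_ALEF, s)


# ">Hmd" is the Buckwalter encoding of "أحمد".
normalized_name = alef_variants_to_alef_bw_sketch(">Hmd")
# "<lY" is the Buckwalter encoding of "إلى"; Y (Alef Maksura) is untouched.
normalized_prep = alef_variants_to_alef_bw_sketch("<lY")
```

Because Buckwalter reuses ASCII punctuation, this kind of replacement should only be applied to text known to be Buckwalter encoded; in mixed text it would also rewrite ordinary `<`, `>`, `|` and `{` characters.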
-
camel_tools.utils.normalize.normalize_alef_safebw(s)
Normalize various Alef variations to a plain Alef character in a Safe Buckwalter encoded string.
Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str