camel_tools.utils.normalize
This submodule contains functions for normalizing Arabic text in different encodings. See Encoding Schemes for more information on encodings.
Functions
- camel_tools.utils.normalize.normalize_unicode(s, compatibility=True)
  Normalize a Unicode string into its composed form (i.e. characters that can be written as a combination of Unicode characters are converted to their single-character form).
  Note: This is essentially a call to unicodedata.normalize() with form 'NFC' if compatibility is False or 'NFKC' if it is True.
  Parameters: s (str) – The string to be normalized. compatibility (bool) – Whether to use compatibility normalization ('NFKC') instead of canonical normalization ('NFC'). Defaults to True.
  Returns: The normalized string.
  Return type: str
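Since the note above describes this function as a thin wrapper around unicodedata.normalize(), the difference between the two forms can be sketched with the standard library alone. The helper name normalize_unicode_sketch is hypothetical and only mirrors the behavior described here; it is not the library's source.

```python
import unicodedata


def normalize_unicode_sketch(s, compatibility=True):
    """Hypothetical stand-in that mirrors the behavior described above."""
    form = 'NFKC' if compatibility else 'NFC'
    return unicodedata.normalize(form, s)


# U+FEFB (Lam-Alef ligature, isolated form) is a compatibility character:
# NFKC expands it to Lam (U+0644) + Alef (U+0627); NFC leaves it unchanged.
ligature = '\uFEFB'
nfkc = normalize_unicode_sketch(ligature)
nfc = normalize_unicode_sketch(ligature, compatibility=False)

# Both forms compose Alef (U+0627) + combining Hamza Above (U+0654)
# into the single character Alef with Hamza Above (U+0623).
composed = normalize_unicode_sketch('\u0627\u0654', compatibility=False)
```

The default of compatibility=True is the more aggressive choice: presentation forms and ligatures are folded into their base letters, which is usually what text processing wants but does change the code points of the input.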
- camel_tools.utils.normalize.normalize_alef_maksura_ar(s)
  Normalize all occurrences of Alef Maksura characters to a Yeh character in an Arabic string.
  Parameters: s (str) – The string to be normalized.
  Returns: The normalized string.
  Return type: str
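The effect of this normalization on Arabic-script text can be illustrated with a plain character replacement. This is a minimal sketch assuming Alef Maksura is U+0649 and Yeh is U+064A; the helper alef_maksura_to_yeh is hypothetical and is not the library's implementation.

```python
ALEF_MAKSURA = '\u0649'  # ى
YEH = '\u064A'           # ي


def alef_maksura_to_yeh(s):
    """Illustrative equivalent of the normalization described above."""
    return s.replace(ALEF_MAKSURA, YEH)


word = '\u0639\u0644\u0649'          # على
result = alef_maksura_to_yeh(word)   # علي
```

With camel_tools installed, the corresponding call would be normalize_alef_maksura_ar(word).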
- camel_tools.utils.normalize.normalize_alef_maksura_bw(s)
  Normalize all occurrences of Alef Maksura characters to a Yeh character in a Buckwalter encoded string.
  Parameters: s (str) – The string to be normalized.
  Returns: The normalized string.
  Return type: str
- camel_tools.utils.normalize.normalize_alef_maksura_safebw(s)
  Normalize all occurrences of Alef Maksura characters to a Yeh character in a Safe Buckwalter encoded string.
  Parameters: s (str) – The string to be normalized.
  Returns: The normalized string.
  Return type: str
- camel_tools.utils.normalize.normalize_alef_maksura_xmlbw(s)
  Normalize all occurrences of Alef Maksura characters to a Yeh character in an XML Buckwalter encoded string.
  Parameters: s (str) – The string to be normalized.
  Returns: The normalized string.
  Return type: str
- camel_tools.utils.normalize.normalize_alef_maksura_hsb(s)
  Normalize all occurrences of Alef Maksura characters to a Yeh character in a Habash-Soudi-Buckwalter encoded string.
  Parameters: s (str) – The string to be normalized.
  Returns: The normalized string.
  Return type: str
- camel_tools.utils.normalize.normalize_teh_marbuta_ar(s)
  Normalize all occurrences of Teh Marbuta characters to a Heh character in an Arabic string.
  Parameters: s (str) – The string to be normalized.
  Returns: The normalized string.
  Return type: str
- camel_tools.utils.normalize.normalize_teh_marbuta_bw(s)
  Normalize all occurrences of Teh Marbuta characters to a Heh character in a Buckwalter encoded string.
  Parameters: s (str) – The string to be normalized.
  Returns: The normalized string.
  Return type: str
- camel_tools.utils.normalize.normalize_teh_marbuta_safebw(s)
  Normalize all occurrences of Teh Marbuta characters to a Heh character in a Safe Buckwalter encoded string.
  Parameters: s (str) – The string to be normalized.
  Returns: The normalized string.
  Return type: str
- camel_tools.utils.normalize.normalize_teh_marbuta_xmlbw(s)
  Normalize all occurrences of Teh Marbuta characters to a Heh character in an XML Buckwalter encoded string.
  Parameters: s (str) – The string to be normalized.
  Returns: The normalized string.
  Return type: str
- camel_tools.utils.normalize.normalize_teh_marbuta_hsb(s)
  Normalize all occurrences of Teh Marbuta characters to a Heh character in a Habash-Soudi-Buckwalter encoded string.
  Parameters: s (str) – The string to be normalized.
  Returns: The normalized string.
  Return type: str
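The five Teh Marbuta functions above differ only in which symbols stand for Teh Marbuta and Heh in each encoding. The table-driven sketch below makes that explicit. The symbol pairs are assumptions based on the commonly published transliteration tables ('p' and 'h' in the Buckwalter family, 'ħ' and 'h' in Habash-Soudi-Buckwalter), and teh_marbuta_to_heh is a hypothetical helper, not part of camel_tools.

```python
# Assumed (Teh Marbuta, Heh) symbol pairs per encoding; illustrative only.
TEH_MARBUTA_TO_HEH = {
    'ar': ('\u0629', '\u0647'),   # ة -> ه
    'bw': ('p', 'h'),
    'safebw': ('p', 'h'),
    'xmlbw': ('p', 'h'),
    'hsb': ('\u0127', 'h'),       # ħ -> h
}


def teh_marbuta_to_heh(s, encoding='ar'):
    """Hypothetical helper: replace Teh Marbuta with Heh for one encoding."""
    source, target = TEH_MARBUTA_TO_HEH[encoding]
    return s.replace(source, target)


arabic = teh_marbuta_to_heh('\u0645\u062F\u0631\u0633\u0629')   # مدرسة -> مدرسه
buckwalter = teh_marbuta_to_heh('mdrsp', encoding='bw')
hsb = teh_marbuta_to_heh('mdrs\u0127', encoding='hsb')
```

Keeping one function per encoding, as the module does, avoids passing an encoding name at every call site; the lookup table here is only a compact way to show the mapping side by side.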
- camel_tools.utils.normalize.normalize_alef_ar(s)
  Normalize the various Alef variants to a plain Alef character in an Arabic string.
  Parameters: s (str) – The string to be normalized.
  Returns: The normalized string.
  Return type: str
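The description above does not list which Alef variants are covered. The sketch below assumes the usual set: Alef with Madda (U+0622), Alef with Hamza Above (U+0623), Alef with Hamza Below (U+0625), and Alef Wasla (U+0671), all mapped to plain Alef (U+0627). The helper alef_variants_to_plain is hypothetical and may not match the library's exact character set.

```python
import re

# Assumed set of Alef variants: آ أ إ ٱ
_ALEF_VARIANTS_RE = re.compile('[\u0622\u0623\u0625\u0671]')
PLAIN_ALEF = '\u0627'  # ا


def alef_variants_to_plain(s):
    """Illustrative equivalent of the normalization described above."""
    return _ALEF_VARIANTS_RE.sub(PLAIN_ALEF, s)


word = '\u0623\u062D\u0645\u062F'        # أحمد
result = alef_variants_to_plain(word)    # احمد
```

With camel_tools installed, the corresponding call would be normalize_alef_ar(word).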
- camel_tools.utils.normalize.normalize_alef_bw(s)
  Normalize the various Alef variants to a plain Alef character in a Buckwalter encoded string.
  Parameters: s (str) – The string to be normalized.
  Returns: The normalized string.
  Return type: str
- camel_tools.utils.normalize.normalize_alef_safebw(s)
  Normalize the various Alef variants to a plain Alef character in a Safe Buckwalter encoded string.
  Parameters: s (str) – The string to be normalized.
  Returns: The normalized string.
  Return type: str
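For the Buckwalter and Safe Buckwalter variants, the same Alef normalization operates on ASCII symbols rather than Arabic code points. The sketch below assumes the commonly published mappings: in Buckwalter the Alef variants are written '>', '<', '|' and '{', in Safe Buckwalter they are 'O', 'I', 'M' and 'L', and plain Alef is 'A' in both. The helper names are hypothetical, and the exact symbol sets used by the library may differ.

```python
import re

# Assumed Alef-variant symbols in each encoding; plain Alef is 'A' in both.
_ALEF_BW_RE = re.compile(r'[><|{]')
_ALEF_SAFEBW_RE = re.compile(r'[OIML]')


def alef_to_plain_bw(s):
    """Illustrative Buckwalter Alef normalization."""
    return _ALEF_BW_RE.sub('A', s)


def alef_to_plain_safebw(s):
    """Illustrative Safe Buckwalter Alef normalization."""
    return _ALEF_SAFEBW_RE.sub('A', s)


bw_result = alef_to_plain_bw('>Hmd')          # Buckwalter for أحمد
safebw_result = alef_to_plain_safebw('OHmd')  # Safe Buckwalter for أحمد
```

Both calls yield the same normalized transliteration, which is why a string's encoding must be known before choosing the function: applying the Safe Buckwalter pattern to plain Buckwalter text would leave '>' untouched.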