camel_tools.utils.normalize
This submodule contains functions for normalizing Arabic text in different encodings. See Encoding Schemes for more information on encodings.
Functions
- camel_tools.utils.normalize.normalize_unicode(s, compatibility=True)
Normalize Unicode strings into their canonically composed form (i.e. characters that can be written as a combination of Unicode characters are converted to their single-character form).
Note: This is essentially a call to unicodedata.normalize() with form 'NFC' if compatibility is False or 'NFKC' if it's True.
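The difference between the two forms can be illustrated with the standard library alone. The following is a minimal sketch that follows the note above; `normalize_unicode_sketch` is a hypothetical stand-in written for this example, not the library's own implementation.

```python
import unicodedata


def normalize_unicode_sketch(s, compatibility=True):
    """Hypothetical stand-in: NFKC when compatibility is True, else NFC."""
    form = 'NFKC' if compatibility else 'NFC'
    return unicodedata.normalize(form, s)


# Alef (U+0627) followed by combining Madda Above (U+0653) composes into
# the single character Alef with Madda Above (U+0622) under both forms.
decomposed = '\u0627\u0653'
composed = normalize_unicode_sketch(decomposed, compatibility=False)

# The isolated Lam-Alef ligature (U+FEFB) is a compatibility character:
# NFC leaves it alone, while NFKC unfolds it into Lam (U+0644) + Alef (U+0627).
ligature = '\uFEFB'
kept = normalize_unicode_sketch(ligature, compatibility=False)
unfolded = normalize_unicode_sketch(ligature, compatibility=True)

print([hex(ord(c)) for c in composed])
print([hex(ord(c)) for c in kept])
print([hex(ord(c)) for c in unfolded])
```

Since the default is compatibility=True, presentation forms such as ligatures are unfolded unless the caller opts out.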
- camel_tools.utils.normalize.normalize_alef_maksura_ar(s)
Normalize all occurrences of Alef Maksura characters to a Yeh character in an Arabic string.
- camel_tools.utils.normalize.normalize_alef_maksura_bw(s)
Normalize all occurrences of Alef Maksura characters to a Yeh character in a Buckwalter encoded string.
- camel_tools.utils.normalize.normalize_alef_maksura_safebw(s)
Normalize all occurrences of Alef Maksura characters to a Yeh character in a Safe Buckwalter encoded string.
- camel_tools.utils.normalize.normalize_alef_maksura_xmlbw(s)
Normalize all occurrences of Alef Maksura characters to a Yeh character in an XML Buckwalter encoded string.
- camel_tools.utils.normalize.normalize_alef_maksura_hsb(s)
Normalize all occurrences of Alef Maksura characters to a Yeh character in a Habash-Soudi-Buckwalter encoded string.
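The Alef Maksura functions all perform the same substitution and differ only in the symbols of the encoding they operate on. The sketch below reproduces the idea for Arabic script and for standard Buckwalter using plain string replacement. The function names are hypothetical, and the mappings are assumptions taken from the Unicode code chart (U+0649 to U+064A) and the standard Buckwalter table ('Y' to 'y'), not from the library's source.

```python
ALEF_MAKSURA_AR = '\u0649'  # ى
YEH_AR = '\u064A'           # ي


def alef_maksura_to_yeh_ar(s):
    """Hypothetical helper: replace Alef Maksura with Yeh in Arabic script."""
    return s.replace(ALEF_MAKSURA_AR, YEH_AR)


def alef_maksura_to_yeh_bw(s):
    """Hypothetical helper: in standard Buckwalter, 'Y' is Alef Maksura
    and 'y' is Yeh."""
    return s.replace('Y', 'y')


word_ar = '\u0639\u0644\u0649'  # على
word_bw = 'ElY'                 # the same word in Buckwalter

print(alef_maksura_to_yeh_ar(word_ar))
print(alef_maksura_to_yeh_bw(word_bw))
```

Each function should be applied only to text in its own encoding: the Buckwalter substitution would corrupt any Latin-script text containing a capital Y.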
- camel_tools.utils.normalize.normalize_teh_marbuta_ar(s)
Normalize all occurrences of Teh Marbuta characters to a Heh character in an Arabic string.
- camel_tools.utils.normalize.normalize_teh_marbuta_bw(s)
Normalize all occurrences of Teh Marbuta characters to a Heh character in a Buckwalter encoded string.
- camel_tools.utils.normalize.normalize_teh_marbuta_safebw(s)
Normalize all occurrences of Teh Marbuta characters to a Heh character in a Safe Buckwalter encoded string.
- camel_tools.utils.normalize.normalize_teh_marbuta_xmlbw(s)
Normalize all occurrences of Teh Marbuta characters to a Heh character in an XML Buckwalter encoded string.
- camel_tools.utils.normalize.normalize_teh_marbuta_hsb(s)
Normalize all occurrences of Teh Marbuta characters to a Heh character in a Habash-Soudi-Buckwalter encoded string.
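The Teh Marbuta functions follow the same pattern. As a rough illustration, the sketch below performs the substitution for Arabic script and standard Buckwalter. The helper names are hypothetical, and the mappings (U+0629 to U+0647, and 'p' to 'h' in Buckwalter) are assumptions based on the Unicode code chart and the standard Buckwalter table.

```python
TEH_MARBUTA_AR = '\u0629'  # ة
HEH_AR = '\u0647'          # ه


def teh_marbuta_to_heh_ar(s):
    """Hypothetical helper: replace Teh Marbuta with Heh in Arabic script."""
    return s.replace(TEH_MARBUTA_AR, HEH_AR)


def teh_marbuta_to_heh_bw(s):
    """Hypothetical helper: in standard Buckwalter, 'p' is Teh Marbuta
    and 'h' is Heh."""
    return s.replace('p', 'h')


school_ar = '\u0645\u062F\u0631\u0633\u0629'  # مدرسة
school_bw = 'mdrsp'                           # the same word in Buckwalter

print(teh_marbuta_to_heh_ar(school_ar))
print(teh_marbuta_to_heh_bw(school_bw))
```

The substitution is lossy: once applied, a word-final Heh that was originally a Teh Marbuta can no longer be told apart from an original Heh.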
- camel_tools.utils.normalize.normalize_alef_ar(s)
Normalize various Alef variations to a plain Alef character in an Arabic string.
- camel_tools.utils.normalize.normalize_alef_bw(s)
Normalize various Alef variations to a plain Alef character in a Buckwalter encoded string.
- camel_tools.utils.normalize.normalize_alef_safebw(s)
Normalize various Alef variations to a plain Alef character in a Safe Buckwalter encoded string.
- camel_tools.utils.normalize.normalize_alef_xmlbw(s)
Normalize various Alef variations to a plain Alef character in an XML Buckwalter encoded string.
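Alef normalization collapses several characters into one, so a character class is a natural fit. The sketch below is an approximation under stated assumptions: the helper names are hypothetical, and the set of variants (Alef with Madda, Alef with Hamza above, Alef with Hamza below, and Alef Wasla, together with their standard Buckwalter symbols '|', '>', '<' and '{') is assumed for illustration and may not match the exact set the library handles.

```python
import re

# Assumed variant set: آ (U+0622), أ (U+0623), إ (U+0625), ٱ (U+0671).
ALEF_VARIANTS_AR = re.compile('[\u0622\u0623\u0625\u0671]')
# Assumed standard Buckwalter symbols for the same four variants.
ALEF_VARIANTS_BW = re.compile(r'[|><{]')


def alef_to_plain_ar(s):
    """Hypothetical helper: map Alef variants to plain Alef (U+0627)."""
    return ALEF_VARIANTS_AR.sub('\u0627', s)


def alef_to_plain_bw(s):
    """Hypothetical helper: map Buckwalter Alef variants to 'A'."""
    return ALEF_VARIANTS_BW.sub('A', s)


name_ar = '\u0623\u062D\u0645\u062F'  # أحمد
name_bw = '>Hmd'                      # the same word in Buckwalter

print(alef_to_plain_ar(name_ar))
print(alef_to_plain_bw(name_bw))
```

Safe Buckwalter and XML Buckwalter use different symbols for some of these variants, which is why each encoding has its own function.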