camel_tools.utils.dediac
This submodule contains functions for dediacritizing Arabic text in different encodings. See Encoding Schemes for more information on encodings.
Functions
camel_tools.utils.dediac.dediac_ar(s)
    Dediacritize Unicode Arabic string.
    Parameters: s (str) – String to dediacritize.
    Returns: Dediacritized string.
    Return type: str
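To illustrate what dediacritizing a Unicode Arabic string involves, here is a minimal standard-library sketch. It is not the implementation of dediac_ar: the helper name strip_ar_diacritics is hypothetical, and the range U+064B to U+0652 (the common short-vowel, tanwin, shadda and sukun marks) is an assumption. The character set the library actually removes may differ.

```python
import re

# Assumption: treat the common Arabic marks U+064B..U+0652 as diacritics.
# This is an illustrative choice, not the character set used by camel_tools.
_AR_DIAC_RE = re.compile('[\u064b-\u0652]')


def strip_ar_diacritics(s):
    """Hypothetical helper: drop every character in the assumed range."""
    return _AR_DIAC_RE.sub('', s)


# HEH + DAMMA + WAW + FATHA (the word "huwa" from the example sentence)
word = '\u0647\u064f\u0648\u064e'
print(strip_ar_diacritics(word))
```

For real use, prefer dediac_ar itself, since it is the function this module documents.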
camel_tools.utils.dediac.dediac_bw(s)
    Dediacritize Buckwalter encoded string.
    Parameters: s (str) – String to dediacritize.
    Returns: Dediacritized string.
    Return type: str
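The same idea applies to Buckwalter text, where diacritics are ordinary ASCII symbols. The sketch below is a rough illustration only, not the code behind dediac_bw: the helper name strip_bw_diacritics is hypothetical, and the symbol set F, N, K, a, u, i, ~, o is an assumption that may be narrower than what the library strips.

```python
# Assumption: Buckwalter symbols for tanwin (F, N, K), short vowels (a, u, i),
# shadda (~) and sukun (o). The set used by camel_tools may include more.
_BW_DIAC = 'FNKaui~o'
_BW_TABLE = str.maketrans('', '', _BW_DIAC)


def strip_bw_diacritics(s):
    """Hypothetical helper: delete the assumed Buckwalter diacritic symbols."""
    return s.translate(_BW_TABLE)


print(strip_bw_diacritics('vAbitu Ald~A}irapi huwa'))
```

Because Buckwalter reuses Latin letters, a helper like this should only be applied to text that is known to be Buckwalter encoded; running it on ordinary English would delete letters such as "a" and "i".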
camel_tools.utils.dediac.dediac_safebw(s)
    Dediacritize Safe Buckwalter encoded string.
    Parameters: s (str) – String to dediacritize.
    Returns: Dediacritized string.
    Return type: str
Examples
from camel_tools.utils.dediac import dediac_ar, dediac_bw
# Strings to dediacritize
sentence_ar = 'ثابِتُ الدّائِرَةِ هُوَ نِسبَةُ مُحِيطِها لِقُطرِها وَيُعرَفُ بِالثّابِتِ ط'
sentence_bw = 'vAbitu Ald~A}irapi huwa nisbapu muHiyTihA liquTrihA wayuErafu biAlv~Abiti T'
# Dediacritize
sentence_ar_dediac = dediac_ar(sentence_ar)
sentence_bw_dediac = dediac_bw(sentence_bw)
# Print results
print('Diacritized and dediacritized Arabic sentences:\n\t{}\n\t{}'.format(sentence_ar, sentence_ar_dediac))
print('Diacritized and dediacritized Buckwalter sentences:\n\t{}\n\t{}'.format(sentence_bw, sentence_bw_dediac))
This will output:
Diacritized and dediacritized Arabic sentences:
	ثابِتُ الدّائِرَةِ هُوَ نِسبَةُ مُحِيطِها لِقُطرِها وَيُعرَفُ بِالثّابِتِ ط
	ثابت الدائرة هو نسبة محيطها لقطرها ويعرف بالثابت ط
Diacritized and dediacritized Buckwalter sentences:
	vAbitu Ald~A}irapi huwa nisbapu muHiyTihA liquTrihA wayuErafu biAlv~Abiti T
	vAbt AldA}rp hw nsbp mHyThA lqTrhA wyErf bAlvAbt T