camel_tools.utils.dediac

This submodule contains functions for dediacritizing Arabic text in different encodings. See Encoding Schemes for more information on encodings.

Functions

camel_tools.utils.dediac.dediac_ar(s)

Dediacritize Unicode Arabic string.

Parameters: s (str) – String to dediacritize.
Returns: Dediacritized string.
Return type: str

camel_tools.utils.dediac.dediac_bw(s)

Dediacritize Buckwalter encoded string.

Parameters: s (str) – String to dediacritize.
Returns: Dediacritized string.
Return type: str

camel_tools.utils.dediac.dediac_safebw(s)

Dediacritize Safe Buckwalter encoded string.

Parameters: s (str) – String to dediacritize.
Returns: Dediacritized string.
Return type: str

camel_tools.utils.dediac.dediac_xmlbw(s)

Dediacritize XML Buckwalter encoded string.

Parameters: s (str) – String to dediacritize.
Returns: Dediacritized string.
Return type: str

camel_tools.utils.dediac.dediac_hsb(s)

Dediacritize Habash-Soudi-Buckwalter encoded string.

Parameters: s (str) – String to dediacritize.
Returns: Dediacritized string.
Return type: str
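
To make concrete what dediacritizing a Unicode Arabic string involves, here is a minimal sketch that uses only the Python standard library. It is not the camel_tools implementation: the helper name strip_arabic_diacritics is hypothetical, and the diacritic set (U+064B through U+0652, plus the dagger alif U+0670) is an assumption, so the exact characters that dediac_ar removes may differ.

```python
import re

# Assumed diacritic set: fathatan through sukun (U+064B-U+0652)
# plus the dagger alif (U+0670). dediac_ar may use a different set.
_ARABIC_DIACRITICS = re.compile('[\u064B-\u0652\u0670]')


def strip_arabic_diacritics(text: str) -> str:
    """Remove the assumed diacritic code points and leave all else untouched."""
    return _ARABIC_DIACRITICS.sub('', text)


# 'kataba' written with a fatha (U+064E) after each consonant
diacritized = '\u0643\u064E\u062A\u064E\u0628\u064E'
bare = strip_arabic_diacritics(diacritized)

print(len(diacritized), len(bare))
```

Because only the listed combining marks are deleted, base letters, spaces, and any non-Arabic characters pass through unchanged.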

Examples

from camel_tools.utils.dediac import dediac_ar, dediac_bw

# Strings to dediacritize
sentence_ar = 'ثابِتُ الدّائِرَةِ هُوَ نِسبَةُ مُحِيطِها لِقُطرِها وَيُعرَفُ بِالثّابِتِ ط'
sentence_bw = 'vAbitu Ald~A}irapi huwa nisbapu muHiyTihA liquTrihA wayuErafu biAlv~Abiti T'

# Dediacritize
sentence_ar_dediac = dediac_ar(sentence_ar)
sentence_bw_dediac = dediac_bw(sentence_bw)

# Print results
print('Diacritized and dediacritized Arabic sentences:\n\t{}\n\t{}'.format(sentence_ar, sentence_ar_dediac))
print('Diacritized and dediacritized Buckwalter sentences:\n\t{}\n\t{}'.format(sentence_bw, sentence_bw_dediac))

This will output:

Diacritized and dediacritized Arabic sentences:
        ثابِتُ الدّائِرَةِ هُوَ نِسبَةُ مُحِيطِها لِقُطرِها وَيُعرَفُ بِالثّابِتِ ط
        ثابت الدائرة هو نسبة محيطها لقطرها ويعرف بالثابت ط
Diacritized and dediacritized Buckwalter sentences:
        vAbitu Ald~A}irapi huwa nisbapu muHiyTihA liquTrihA wayuErafu biAlv~Abiti T
        vAbt AldA}rp hw nsbp mHyThA lqTrhA wyErf bAlvAbt T
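
The Buckwalter result above can be approximated without camel_tools by deleting the ASCII symbols that Buckwalter uses for diacritics. The sketch below is illustrative only: the helper name strip_bw_diacritics is hypothetical, and the symbol set (a, u, i, o, ~, F, N, K and the backtick) is an assumption about which symbols count as diacritics, not a description of how dediac_bw is implemented.

```python
# Assumed Buckwalter diacritic symbols: short vowels (a, u, i), sukun (o),
# shadda (~), tanween (F, N, K), and dagger alif (`).
_BW_DIACRITICS = 'auio~FNK`'
_DELETE_TABLE = str.maketrans('', '', _BW_DIACRITICS)


def strip_bw_diacritics(text: str) -> str:
    """Delete every assumed diacritic symbol from a Buckwalter-encoded string."""
    return text.translate(_DELETE_TABLE)


sentence_bw = ('vAbitu Ald~A}irapi huwa nisbapu muHiyTihA liquTrihA '
               'wayuErafu biAlv~Abiti T')
print(strip_bw_diacritics(sentence_bw))
# vAbt AldA}rp hw nsbp mHyThA lqTrhA wyErf bAlvAbt T
```

Since these symbols are ordinary ASCII characters, the approach is only meaningful for text that is known to be Buckwalter encoded; applying it to plain Latin text would silently delete letters.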