camel_transliterate

About

The camel_transliterate tool transliterates text from one form to another using one of the built-in transliteration schemes. It also allows tokens to be prefixed with a marker to indicate that they should not be transliterated.

Usage

Below is the usage information that can be generated by running camel_transliterate --help.

Usage:
    camel_transliterate (-s SCHEME | --scheme=SCHEME)
                        [-m MARKER | --marker=MARKER]
                        [-I | --ignore-markers]
                        [-S | --strip-markers]
                        [-o OUTPUT | --output=OUTPUT] [FILE]
    camel_transliterate (-l | --list)
    camel_transliterate (-v | --version)
    camel_transliterate (-h | --help)

Options:
  -s SCHEME --scheme=SCHEME
        Scheme used for transliteration.
  -o OUTPUT --output=OUTPUT
        Output file. If not specified, output will be printed to stdout.
  -m MARKER --marker=MARKER
        Marker used to prefix tokens not to be transliterated.
        [default: @@IGNORE@@]
  -I --ignore-markers
        Transliterate marked words as well.
  -S --strip-markers
        Remove markers in output.
  -l --list
        Show a list of available transliteration schemes.
  -h --help
        Show this screen.
  -v --version
        Show version.
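To make the option semantics above concrete, here is a hypothetical re-creation of the documented interface using Python's argparse. It is not the tool's own parser; it only mirrors the documented flags, the @@IGNORE@@ marker default, and the reading that exactly one of -s, -l, or -v selects the mode (argparse supplies -h itself). Treating an omitted FILE as standard input is an assumption, since the help text does not say so.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical argparse mirror of the documented camel_transliterate options.
    # The real tool's parser may differ in edge cases.
    parser = argparse.ArgumentParser(prog="camel_transliterate")

    # Exactly one mode: transliterate with a scheme, list schemes, or show version.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-s", "--scheme", help="Scheme used for transliteration.")
    mode.add_argument("-l", "--list", action="store_true",
                      help="Show a list of available transliteration schemes.")
    mode.add_argument("-v", "--version", action="store_true", help="Show version.")

    # Marker handling; the default matches the documented [default: @@IGNORE@@].
    parser.add_argument("-m", "--marker", default="@@IGNORE@@",
                        help="Marker used to prefix tokens not to be transliterated.")
    parser.add_argument("-I", "--ignore-markers", action="store_true",
                        help="Transliterate marked words as well.")
    parser.add_argument("-S", "--strip-markers", action="store_true",
                        help="Remove markers in output.")

    # I/O: None means stdout for OUTPUT and (assumed) stdin for FILE.
    parser.add_argument("-o", "--output", default=None,
                        help="Output file. If not specified, output goes to stdout.")
    parser.add_argument("file", nargs="?", default=None,
                        help="Input file (assumed stdin if omitted).")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-s", "ar2bw", "-S", "input.txt"])
    print(args.scheme, args.marker, args.strip_markers, args.file)
```

With the sample arguments the demo prints `ar2bw @@IGNORE@@ True input.txt`: the marker falls back to its default, and -S only toggles a flag.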

Below is a list of currently available transliteration schemes.

ar2bw            Arabic to Buckwalter
ar2safebw        Arabic to Safe Buckwalter
ar2xmlbw         Arabic to XML Buckwalter
ar2hsb           Arabic to Habash-Soudi-Buckwalter
bw2ar            Buckwalter to Arabic
bw2safebw        Buckwalter to Safe Buckwalter
bw2xmlbw         Buckwalter to XML Buckwalter
bw2hsb           Buckwalter to Habash-Soudi-Buckwalter
safebw2ar        Safe Buckwalter to Arabic
safebw2bw        Safe Buckwalter to Buckwalter
safebw2xmlbw     Safe Buckwalter to XML Buckwalter
safebw2hsb       Safe Buckwalter to Habash-Soudi-Buckwalter
xmlbw2ar         XML Buckwalter to Arabic
xmlbw2bw         XML Buckwalter to Buckwalter
xmlbw2safebw     XML Buckwalter to Safe Buckwalter
xmlbw2hsb        XML Buckwalter to Habash-Soudi-Buckwalter
hsb2ar           Habash-Soudi-Buckwalter to Arabic
hsb2bw           Habash-Soudi-Buckwalter to Buckwalter
hsb2safebw       Habash-Soudi-Buckwalter to Safe Buckwalter
hsb2xmlbw        Habash-Soudi-Buckwalter to XML Buckwalter
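The scheme names follow a SOURCE2TARGET pattern over five encodings (ar, bw, safebw, xmlbw, hsb), and the list covers every ordered pair of distinct encodings, giving 5 × 4 = 20 schemes. The Python sketch below reconstructs those names; the helper names are hypothetical and not part of CAMeL Tools, and `camel_transliterate -l` remains the authoritative list.

```python
from itertools import permutations

# The five encodings that appear in the scheme names listed above.
ENCODINGS = ["ar", "bw", "safebw", "xmlbw", "hsb"]


def all_schemes() -> set:
    # Every ordered (source, target) pair of distinct encodings, named SOURCE2TARGET.
    return {f"{src}2{dst}" for src, dst in permutations(ENCODINGS, 2)}


def split_scheme(name: str) -> tuple:
    # "hsb2xmlbw" -> ("hsb", "xmlbw"); works because no encoding name contains "2".
    src, dst = name.split("2")
    return src, dst


if __name__ == "__main__":
    print(len(all_schemes()), split_scheme("safebw2hsb"))
```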

Notes on markers

A marker is a string with no whitespace characters at its beginning, middle, or end (in other words, it’s a single token without padding spaces). As a rule of thumb, pick a marker that is unlikely to appear in your text. We use @@IGNORE@@ as the default value, while some Arabic NLP tools use @@LAT@@ to denote Latin/foreign text.
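The interaction between the marker, -I (--ignore-markers), and -S (--strip-markers) can be sketched in Python as below. This is an illustrative model, not the tool's source: the function name `transliterate_line` and the four-letter toy Arabic-to-Buckwalter table are hypothetical, and keeping the marker itself untransliterated in every mode is an assumption of this sketch.

```python
import re

# Toy Arabic-to-Buckwalter table covering only four letters (illustration only).
TOY_AR2BW = str.maketrans({"س": "s", "ل": "l", "ا": "A", "م": "m"})


def toy_mapper(text: str) -> str:
    # Characters missing from the toy table pass through unchanged.
    return text.translate(TOY_AR2BW)


def transliterate_line(line, mapper=toy_mapper, marker="@@IGNORE@@",
                       ignore_markers=False, strip_markers=False):
    out = []
    # Split on whitespace but keep the whitespace runs so spacing is preserved.
    for piece in re.split(r"(\s+)", line):
        if not piece or piece.isspace():
            out.append(piece)
        elif piece.startswith(marker):
            body = piece[len(marker):]
            # -I: transliterate marked tokens too; by default leave them verbatim.
            if ignore_markers:
                body = mapper(body)
            # -S: drop the marker from the output; by default keep it.
            out.append(body if strip_markers else marker + body)
        else:
            out.append(mapper(piece))
    return "".join(out)


if __name__ == "__main__":
    line = "سلام @@IGNORE@@سلام"
    print(transliterate_line(line))
    print(transliterate_line(line, ignore_markers=True, strip_markers=True))
```

In this model the default run yields `slAm @@IGNORE@@سلام`, while combining -I and -S yields `slAm slAm`, which is equivalent to transliterating the whole line with the markers removed.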

Notes on schemes

The transliteration schemes ar2bw, ar2safebw, ar2xmlbw, ar2hsb, bw2ar, bw2safebw, bw2xmlbw, bw2hsb, safebw2ar, safebw2bw, safebw2xmlbw, safebw2hsb, xmlbw2ar, xmlbw2bw, xmlbw2safebw, xmlbw2hsb, hsb2ar, hsb2bw, hsb2safebw, and hsb2xmlbw use the conversion table listed in Encoding Schemes. Input characters not listed in the conversion table are output unchanged, without any transliteration.
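The pass-through rule above amounts to a table lookup with the input character as the fallback. The Python sketch below assumes a character-by-character mapping, which suffices for one-to-one tables such as Buckwalter; `PARTIAL_AR2BW` is a hypothetical excerpt of seven letters, not the full table from Encoding Schemes. Inverting it gives a matching partial bw2ar direction.

```python
# Hypothetical excerpt of an Arabic-to-Buckwalter table (seven letters only).
PARTIAL_AR2BW = {
    "ا": "A", "ب": "b", "ت": "t", "س": "s",
    "ك": "k", "ل": "l", "م": "m",
}

# Buckwalter is one-to-one, so inverting the excerpt gives a partial bw2ar table.
PARTIAL_BW2AR = {bw: ar for ar, bw in PARTIAL_AR2BW.items()}


def transliterate(text: str, table: dict) -> str:
    # Map each character through the table; characters without an entry
    # (digits, punctuation, Latin letters, unlisted Arabic letters) are emitted as-is.
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("كتاب 2024!", PARTIAL_AR2BW))
    print(transliterate("ktAb", PARTIAL_BW2AR))
```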