camel_dediac
============

About
-----

The ``camel_dediac`` tool allows you to dediacritize Arabic text in multiple
encoding schemes.

Usage
-----

Below is the usage information that can be generated by running
``camel_dediac --help``.

.. code-block:: none

   Usage:
       camel_dediac [-s | --scheme=]
                    [-m | --marker=]
                    [-I | --ignore-markers]
                    [-S | --strip-markers]
                    [-o OUTPUT | --output=OUTPUT]
                    [FILE]
       camel_dediac (-l | --list)
       camel_dediac (-v | --version)
       camel_dediac (-h | --help)

   Options:
     -s --scheme=
           The encoding scheme of the input text. [default: ar]
     -o OUTPUT --output=OUTPUT
           Output file. If not specified, output will be printed to stdout.
     -m --marker=
           Marker used to prefix tokens not to be de-diacritized.
           [default: @@IGNORE@@]
     -I --ignore-markers
           De-diacritize words prefixed with a marker.
     -S --strip-markers
           Remove prefix markers in output if --ignore-markers is set.
     -l --list
           Show a list of available input encoding schemes.
     -h --help
           Show this screen.
     -v --version
           Show version.

Below is a list of currently available encoding schemes.

.. code-block:: none

   ar      Arabic script
   bw      Buckwalter encoding
   safebw  Safe Buckwalter encoding
   xmlbw   XML Buckwalter encoding
   hsb     Habash-Soudi-Buckwalter encoding

See :doc:`/reference/encoding_schemes` for more information on encodings.

Notes on markers
----------------

A marker is a string that contains no whitespace characters at the beginning,
middle, or end (in other words, a single token without padding spaces).
As a rule of thumb, pick a marker that is unlikely to appear in your text.
We use ``@@IGNORE@@`` as the default value, while some Arabic NLP tools use
``@@LAT@@`` to denote Latin/foreign text.
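
The marker options described above can be illustrated with a minimal,
standard-library-only Python sketch. It is not the implementation used by
``camel_dediac``: the function names ``dediac_token`` and ``dediac_line`` are
hypothetical, the diacritic set is an assumed common subset of Arabic-script
diacritics, and the handling of ``ignore_markers`` and ``strip_markers`` is
only one reading of the option descriptions. It covers the ``ar`` scheme only.

.. code-block:: python

   import re

   # Assumption: treat U+064B-U+0652 (fathatan through sukun) and U+0670
   # (dagger alif) as diacritics. The tool's actual character set may differ.
   _DIAC_RE = re.compile("[\u064B-\u0652\u0670]")


   def dediac_token(token):
       """Remove the assumed diacritic characters from a single token."""
       return _DIAC_RE.sub("", token)


   def dediac_line(line, marker="@@IGNORE@@",
                   ignore_markers=False, strip_markers=False):
       """Dediacritize whitespace-separated tokens, honoring prefix markers.

       Note: runs of whitespace are collapsed to single spaces in this sketch.
       """
       out = []
       for token in line.split():
           if not token.startswith(marker):
               # Unmarked tokens are always dediacritized.
               out.append(dediac_token(token))
           elif not ignore_markers:
               # Marked tokens are left untouched, marker included.
               out.append(token)
           else:
               # Markers are ignored: dediacritize the word after the marker,
               # and drop the marker only if strip_markers is also set.
               body = dediac_token(token[len(marker):])
               out.append(body if strip_markers else marker + body)
       return " ".join(out)


   word = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "kataba", fully diacritized
   line = word + " @@IGNORE@@" + word

   print(dediac_line(line))
   print(dediac_line(line, ignore_markers=True))
   print(dediac_line(line, ignore_markers=True, strip_markers=True))

In this sketch the first call leaves the marked token unchanged, the second
dediacritizes it while keeping the marker, and the third also removes the
marker from the output.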