camel_tools.utils.transliterate

Contains the Transliterator class for transliterating text using a CharMapper.

Classes

class camel_tools.utils.transliterate.Transliterator(mapper, marker='@@IGNORE@@')

A class for transliterating text using a CharMapper. In addition, it lets individual tokens be marked so that they are not transliterated. Tokens are assumed to be whitespace-separated.

Parameters:
  • mapper (CharMapper) – The CharMapper instance to be used for transliteration.
  • marker (str, optional) – A string prefixed to every token that should not be transliterated. Must be non-empty and must not contain any whitespace characters. Defaults to ‘@@IGNORE@@’.
Raises:
  • TypeError – If mapper is not a CharMapper instance or marker is not a string.
  • ValueError – If marker contains whitespace or is an empty string.
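
The marker constraints listed above can be sketched in plain Python. This is only an illustration of the documented rules, not the library's code: check_marker is a hypothetical helper name, and the mapper type check is left out because it would need a CharMapper instance.

```python
def check_marker(marker):
    """Hypothetical helper mirroring the documented marker rules."""
    # Documented: TypeError if marker is not a string.
    if not isinstance(marker, str):
        raise TypeError("marker must be a string")
    # Documented: ValueError if marker is empty or contains whitespace.
    if marker == "" or any(ch.isspace() for ch in marker):
        raise ValueError("marker must be non-empty and contain no whitespace")
    return marker


default_marker = check_marker("@@IGNORE@@")  # accepted unchanged
```

A marker with no whitespace matters because tokens are whitespace-separated: a space inside the marker would split a marked token in two.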
transliterate(s, strip_markers=False, ignore_markers=False)

Transliterate a given string.

Parameters:
  • s (str) – The string to transliterate.
  • strip_markers (bool, optional) – If True, markers are stripped from the output; otherwise they are kept. Defaults to False.
  • ignore_markers (bool, optional) – If True, all text is transliterated, including marked tokens, but the markers themselves are not transliterated. To transliterate the markers as well, use CharMapper directly instead. Defaults to False.
Returns:

The transliteration of s, with marked tokens left untransliterated unless ignore_markers is True.

Return type:

str
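
How strip_markers and ignore_markers interact can be sketched without the library. The code below is an assumption-laden stand-in, not the Transliterator implementation: transliterate_sketch is a hypothetical function, str.upper stands in for a CharMapper, and whitespace is passed through unmapped, which a real CharMapper may or may not do.

```python
import re


def transliterate_sketch(s, map_text, marker="@@IGNORE@@",
                         strip_markers=False, ignore_markers=False):
    """Hypothetical stand-in for the documented marker handling."""
    out = []
    # Split on whitespace runs, keeping them so the original spacing survives.
    for piece in re.split(r"(\s+)", s):
        if not piece or piece.isspace():
            out.append(piece)  # whitespace or an empty edge piece passes through
        elif piece.startswith(marker):
            body = piece[len(marker):]
            if ignore_markers:
                body = map_text(body)  # the token is mapped, the marker is not
            out.append(body if strip_markers else marker + body)
        else:
            out.append(map_text(piece))  # unmarked tokens are always mapped
    return "".join(out)


text = "hello @@IGNORE@@world again"
default_out = transliterate_sketch(text, str.upper)
stripped_out = transliterate_sketch(text, str.upper, strip_markers=True)
ignored_out = transliterate_sketch(text, str.upper, ignore_markers=True)
```

Under these assumptions, default_out is "HELLO @@IGNORE@@world AGAIN", stripped_out is "HELLO world AGAIN", and ignored_out is "HELLO @@IGNORE@@WORLD AGAIN". With the real class, the mapper would be a CharMapper instance passed to the Transliterator constructor.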