camel_tools.disambig.bert

Classes

Examples

Below is an example of how to load and use the default pre-trained CAMeLBERT based model to disambiguate words in a sentence.

from camel_tools.disambig.bert import BERTUnfactoredDisambiguator

unfactored = BERTUnfactoredDisambiguator.pretrained()

# We expect a sentence to be whitespace/punctuation tokenized beforehand.
# We provide a simple whitespace and punctuation tokenizer as part of camel_tools.
# See camel_tools.tokenizers.word.simple_word_tokenize.
sentence = ['سوف', 'نقرأ', 'الكتب']

disambig = unfactored.disambiguate(sentence)

# Let's, for example, use the top disambiguations to generate a diacritized
# version of the above sentence.
# Note that, in practice, you'll need to make sure that each word has a
# non-zero list of analyses.
diacritized = [d.analyses[0].analysis['diac'] for d in disambig]
print(' '.join(diacritized))