camel_tools.ner

This module contains the CAMeL Tools Named Entity Recognition component.

Classes

class camel_tools.ner.NERecognizer(model_path, use_gpu=True)

CAMeL Tools NER component.

Parameters:

model_path (str) – The path to the fine-tuned model.
use_gpu (bool, optional) – Whether to use a GPU. Defaults to True.

static labels()

Get the list of NER labels returned by predictions.

Returns: List of NER labels.
Return type: list of str

predict(sentences, batch_size=32)

Predict the named entity labels of a list of sentences.

Parameters:

sentences (list of list of str) – The input sentences.
batch_size (int, optional) – The number of sentences to process per batch. Defaults to 32.

Returns:

The predicted named entity labels for the given sentences.

Return type:

list of list of str
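
The following is a minimal sketch of batch prediction, assuming only the predict(sentences, batch_size) signature documented above. The helper label_sentences is hypothetical and not part of camel_tools. It splits each raw string on whitespace, so punctuation must already be separated by spaces, and pairs every token with its predicted label.

```python
def label_sentences(recognizer, raw_sentences, batch_size=32):
    """Pair each whitespace-separated token with its predicted label.

    `recognizer` is assumed to expose predict(sentences, batch_size) as
    documented above. This helper is illustrative, not part of camel_tools.
    """
    # predict() expects a list of token lists, not raw strings.
    tokenized = [sentence.split() for sentence in raw_sentences]
    predictions = recognizer.predict(tokenized, batch_size=batch_size)
    # One list of (token, label) pairs per input sentence.
    return [list(zip(tokens, labels))
            for tokens, labels in zip(tokenized, predictions)]
```

With a loaded model, this could be called as label_sentences(NERecognizer.pretrained(), raw_sentences).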

predict_sentence(sentence)

Predict the named entity labels of a single sentence.

Parameters: sentence (list of str) – The input sentence.
Returns: The predicted named entity labels for the given sentence.
Return type: list of str

static pretrained(model_name=None, use_gpu=True)

Load a pre-trained model provided with camel_tools.

Parameters:

model_name (str, optional) – Name of the pre-trained model to load. One model is available: ‘arabert’. If None, the default model (‘arabert’) is loaded. Defaults to None.
use_gpu (bool, optional) – Whether to use a GPU. Defaults to True.

Returns:

Instance with loaded pre-trained model.

Return type:

NERecognizer

Examples

Below is an example of how to load and use the default pre-trained model.

from camel_tools.ner import NERecognizer

ner = NERecognizer.pretrained()

# Predict the labels of a single sentence.
# The sentence must already be tokenized on whitespace and punctuation.
# Here the final period is separated by a space, so split() is enough.
sentence = 'إمارة أبوظبي هي إحدى إمارات دولة الإمارات العربية المتحدة السبع .'.split()
labels = ner.predict_sentence(sentence)

# Print the list of token-label pairs
print(list(zip(sentence, labels)))
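
The token-label pairs can also be merged into whole entities. The sketch below assumes the labels follow a BIO-style scheme (for example 'B-LOC', 'I-LOC', 'O'); this page does not state the scheme, so check the output of NERecognizer.labels() first. The function group_entities is hypothetical and not part of camel_tools.

```python
def group_entities(tokens, labels):
    """Merge consecutive B-/I- tagged tokens into (text, type) pairs.

    Assumes BIO-style labels such as 'B-LOC', 'I-LOC' and 'O'.
    Illustrative helper, not part of camel_tools.
    """
    entities = []
    current_tokens, current_type = [], None

    def flush():
        # Store the entity collected so far, if there is one.
        if current_tokens:
            entities.append((' '.join(current_tokens), current_type))

    for token, label in zip(tokens, labels):
        prefix, _, entity_type = label.partition('-')
        if prefix == 'B' or (prefix == 'I' and entity_type != current_type):
            # A B- tag (or an I- tag with no matching open entity)
            # starts a new entity.
            flush()
            current_tokens, current_type = [token], entity_type
        elif prefix == 'I':
            # Continue the entity that is currently open.
            current_tokens.append(token)
        else:
            # Any other label (such as 'O') closes the open entity.
            flush()
            current_tokens, current_type = [], None
    flush()
    return entities
```

Applied to the example above, group_entities(sentence, labels) would return one (text, type) pair per recognized entity.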