camel_tools.disambig.common

This sub-module contains common functions and classes used for disambiguation.

Classes

class camel_tools.disambig.common.ScoredAnalysis

A named tuple containing an analysis and its score.

score

The score of a given analysis.

Type: float

analysis

The analysis dictionary. See CAMeL Morphology Features for more information on features and their values.

Type: dict

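
The sketch below recreates the two documented fields with `collections.namedtuple` so that it runs without camel_tools installed. This stand-in is an assumption, not the library's own definition (the real class may carry additional fields), and the feature values in the analysis dictionary are illustrative only.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the two documented fields of
# camel_tools.disambig.common.ScoredAnalysis; the real class may define more.
ScoredAnalysis = namedtuple('ScoredAnalysis', ['score', 'analysis'])

# An analysis is a plain dict mapping feature names to values.
# These feature values are illustrative, not real analyzer output.
scored = ScoredAnalysis(score=0.92, analysis={'lex': 'كتاب', 'pos': 'noun'})

# Fields can be read by name (or by position, as with any named tuple).
print(scored.score)            # 0.92
print(scored.analysis['pos'])  # noun
```
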
class camel_tools.disambig.common.DisambiguatedWord

A named tuple containing a word and a sorted list (from high to low score) of scored analyses.

word

The word being disambiguated.

Type: str

analyses

List of scored analyses sorted from highest to lowest disambiguation score.

Type: list of ScoredAnalysis

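
To make the ordering requirement concrete, here is a minimal sketch that builds a DisambiguatedWord whose analyses are sorted from highest to lowest score. Both named tuples are stand-ins mirroring only the documented fields, and `make_disambiguated_word` is a hypothetical helper, not part of camel_tools.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the documented fields.
ScoredAnalysis = namedtuple('ScoredAnalysis', ['score', 'analysis'])
DisambiguatedWord = namedtuple('DisambiguatedWord', ['word', 'analyses'])


def make_disambiguated_word(word, scored_analyses):
    """Hypothetical helper: wrap a word and its scored analyses,
    sorting the analyses from highest to lowest score."""
    ranked = sorted(scored_analyses, key=lambda sa: sa.score, reverse=True)
    return DisambiguatedWord(word, ranked)


# Candidate analyses in arbitrary order (toy scores and features).
dw = make_disambiguated_word('كتب', [
    ScoredAnalysis(0.2, {'pos': 'noun'}),
    ScoredAnalysis(0.7, {'pos': 'verb'}),
    ScoredAnalysis(0.1, {'pos': 'adj'}),
])

print([sa.score for sa in dw.analyses])  # [0.7, 0.2, 0.1]
print(dw.analyses[0].analysis['pos'])    # verb
```

Because the list is sorted, the best analysis is always `dw.analyses[0]`.
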
class camel_tools.disambig.common.Disambiguator

Abstract base class that all disambiguators should implement.

all_feats()

Return a set of all features produced by this disambiguator.

Returns: The set of all features produced by this disambiguator.
Return type: frozenset of str

disambiguate(sentence, top=1)

Disambiguate words in a sentence.

Parameters:
  • sentence (list of str) – List of words representing a sentence to be disambiguated.
  • top (int, optional) – The number of top analyses to return. If set to zero or less, then all analyses are returned. Defaults to 1.

Returns: List of disambiguated words in sentence.
Return type: list of DisambiguatedWord
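
The sketch below illustrates the `top` semantics described above: a positive value keeps that many of the best analyses per word, while zero or less keeps all of them. It uses stand-in named tuples and a hypothetical lookup table (`CANDIDATES`) in place of a real model; it shows one way a concrete disambiguator could honour the contract, not how any camel_tools disambiguator is implemented.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the documented fields.
ScoredAnalysis = namedtuple('ScoredAnalysis', ['score', 'analysis'])
DisambiguatedWord = namedtuple('DisambiguatedWord', ['word', 'analyses'])

# Hypothetical lookup table: word -> candidate analyses with toy scores.
CANDIDATES = {
    'ذهب': [ScoredAnalysis(0.2, {'pos': 'noun'}),
            ScoredAnalysis(0.8, {'pos': 'verb'})],
    'الولد': [ScoredAnalysis(1.0, {'pos': 'noun'})],
}


def apply_top(ranked, top=1):
    """Keep the `top` best analyses; top <= 0 keeps all of them."""
    return ranked if top <= 0 else ranked[:top]


def disambiguate(sentence, top=1):
    """Return one DisambiguatedWord per input word, in sentence order."""
    result = []
    for word in sentence:
        # Unknown words get an empty candidate list in this toy version.
        ranked = sorted(CANDIDATES.get(word, []),
                        key=lambda sa: sa.score, reverse=True)
        result.append(DisambiguatedWord(word, apply_top(ranked, top)))
    return result


sentence = ['ذهب', 'الولد']
best = disambiguate(sentence)          # top=1: one analysis per word
every = disambiguate(sentence, top=0)  # top <= 0: all analyses
print([len(dw.analyses) for dw in best])   # [1, 1]
print([len(dw.analyses) for dw in every])  # [2, 1]
```
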

disambiguate_word(sentence, word_ndx, top=1)

Disambiguate a word at a given index in a sentence.

Parameters:
  • sentence (list of str) – List of words representing a sentence.
  • word_ndx (int) – The index of the word in sentence to disambiguate.
  • top (int, optional) – The number of top analyses to return. If set to zero or less, then all analyses are returned. Defaults to 1.

Returns: The disambiguated word at index word_ndx in sentence.
Return type: DisambiguatedWord
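
Because disambiguate_word receives the whole sentence rather than a single word, an implementation is free to use the surrounding context when scoring. The toy sketch below makes that visible: the same surface word is resolved differently depending on its index. The scoring rule is a hypothetical placeholder, not a real model, and the named tuples are stand-ins for the documented classes.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the documented fields.
ScoredAnalysis = namedtuple('ScoredAnalysis', ['score', 'analysis'])
DisambiguatedWord = namedtuple('DisambiguatedWord', ['word', 'analyses'])


def disambiguate_word(sentence, word_ndx, top=1):
    """Toy context-aware scorer (hypothetical rule, not a real model):
    a sentence-initial word favours a verb reading, any other position
    favours a noun reading."""
    word = sentence[word_ndx]
    if word_ndx == 0:
        verb_score, noun_score = 0.7, 0.3
    else:
        verb_score, noun_score = 0.3, 0.7
    ranked = sorted(
        [ScoredAnalysis(verb_score, {'pos': 'verb'}),
         ScoredAnalysis(noun_score, {'pos': 'noun'})],
        key=lambda sa: sa.score,
        reverse=True,
    )
    # top <= 0 keeps every analysis; otherwise keep the best `top`.
    return DisambiguatedWord(word, ranked if top <= 0 else ranked[:top])


sentence = ['ذهب', 'ذهب']
# Same surface word, different index, different best analysis.
print(disambiguate_word(sentence, 0).analyses[0].analysis['pos'])  # verb
print(disambiguate_word(sentence, 1).analyses[0].analysis['pos'])  # noun
print(len(disambiguate_word(sentence, 1, top=0).analyses))         # 2
```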

tok_feats()

Return a set of tokenization features produced by this disambiguator.

Returns: The set of tokenization features produced by this disambiguator.
Return type: frozenset of str
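
As a rough picture of what implementing this interface involves, the sketch below declares the four documented methods as abstract methods on a stand-in base class and fills them in with a trivial subclass. The base class only mirrors the documented method names and parameters and is not the library's source. `EchoDisambiguator`, its single echoed analysis, and its feature sets are hypothetical and exist only to show the shape of a concrete subclass.

```python
from abc import ABC, abstractmethod
from collections import namedtuple

# Hypothetical stand-ins mirroring the documented fields.
ScoredAnalysis = namedtuple('ScoredAnalysis', ['score', 'analysis'])
DisambiguatedWord = namedtuple('DisambiguatedWord', ['word', 'analyses'])


class Disambiguator(ABC):
    """Stand-in mirroring the documented abstract interface."""

    @abstractmethod
    def disambiguate_word(self, sentence, word_ndx, top=1):
        raise NotImplementedError

    @abstractmethod
    def disambiguate(self, sentence, top=1):
        raise NotImplementedError

    @abstractmethod
    def all_feats(self):
        raise NotImplementedError

    @abstractmethod
    def tok_feats(self):
        raise NotImplementedError


class EchoDisambiguator(Disambiguator):
    """Hypothetical toy: every word gets one analysis that echoes it back."""

    _FEATS = frozenset(['lex', 'pos', 'd3tok'])
    _TOK_FEATS = frozenset(['d3tok'])

    def disambiguate_word(self, sentence, word_ndx, top=1):
        word = sentence[word_ndx]
        analysis = {'lex': word, 'pos': 'noun', 'd3tok': word}
        # Only one candidate exists, so every `top` value gives the same list.
        return DisambiguatedWord(word, [ScoredAnalysis(1.0, analysis)])

    def disambiguate(self, sentence, top=1):
        # Disambiguate each word in turn, passing the full sentence as context.
        return [self.disambiguate_word(sentence, i, top)
                for i in range(len(sentence))]

    def all_feats(self):
        return self._FEATS

    def tok_feats(self):
        return self._TOK_FEATS


disambiguator = EchoDisambiguator()
words = disambiguator.disambiguate(['ذهب', 'الولد'])
print([dw.word for dw in words])             # ['ذهب', 'الولد']
print('d3tok' in disambiguator.tok_feats())  # True
```

Declaring the methods with `abstractmethod` means the base class itself cannot be instantiated; Python raises `TypeError` until a subclass overrides all four.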