camel_tools.morphology.analyzer
The morphological analyzer component of CAMeL Tools.
Globals
Classes
- class camel_tools.morphology.analyzer.AnalyzedWord(word, analyses)
A named tuple containing a word and its analyses.
- analyses
List of analyses for word. See CAMeL Morphology Features for more information on features and their values.
- class camel_tools.morphology.analyzer.Analyzer(db, backoff='NONE', norm_map=None, strict_digit=False, cache_size=0)
Morphological analyzer component.
- Parameters:
db (MorphologyDB) – Database to use for analysis. Must be opened in analysis or reinflection mode.
backoff (str, optional) – Backoff mode. Can be one of the following: ‘NONE’, ‘NOAN_ALL’, ‘NOAN_PROP’, ‘ADD_ALL’, or ‘ADD_PROP’. Defaults to ‘NONE’.
norm_map (CharMapper, optional) – Character map for normalizing input words. If set to None, then DEFAULT_NORMALIZE_MAP is used. Defaults to None.
strict_digit (bool, optional) – If set to True, then only words consisting entirely of digits are considered numbers; otherwise, all words containing a digit are considered numbers. Defaults to False.
cache_size (int, optional) – If greater than zero, then the analyzer will cache the analyses for the cache_size most frequent words; otherwise, no analyses will be cached. Defaults to 0.
- Raises:
AnalyzerError – If db is not an instance of MorphologyDB, if db does not support analysis, or if backoff is not a valid backoff mode.
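The strict_digit rule described above can be illustrated with a small standalone sketch. The function name looks_like_number is hypothetical and is not part of CAMeL Tools; it only mirrors the documented wording and makes no claim about how the analyzer detects digits internally.

```python
def looks_like_number(word, strict_digit=False):
    """Hypothetical illustration of the documented strict_digit rule.

    This is not the library's implementation, only a restatement of the
    parameter description in code.
    """
    if strict_digit:
        # Strict mode: every character must be a digit.
        return word.isdigit()
    # Lenient mode: a single digit anywhere in the word is enough.
    return any(ch.isdigit() for ch in word)
```

Under this sketch, a mixed token such as 'abc123' counts as a number only when strict_digit is False.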
- all_feats()
Return a set of all features provided by the database used in this analyzer instance.
- analyze(word)
Analyze a given word.
- Parameters:
word (str) – Word to analyze.
- Returns:
The list of analyses for word. See CAMeL Morphology Features for more information on features and their values.
- Return type:
list of dict
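As a rough sketch of how the returned analyses might be inspected, the helper below collects the distinct values of one feature across a list of analyses. It assumes each analysis behaves like a dict mapping feature names to values; feature_values is a hypothetical helper, not a CAMeL Tools function.

```python
def feature_values(analyses, feature):
    """Return the distinct values of `feature` across analyses, in first-seen order.

    Assumes each analysis is a dict-like mapping of feature names to values.
    Analyses that lack the feature are skipped.
    """
    seen = []
    for analysis in analyses:
        value = analysis.get(feature)
        if value is not None and value not in seen:
            seen.append(value)
    return seen
```

With an analyzer instance, this could be called as feature_values(analyzer.analyze('شارع'), 'pos'), assuming 'pos' is among the features the database provides (all_feats() can be used to check).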
- analyze_words(words)
Analyze a list of words.
- Parameters:
words (list of str) – List of words to analyze.
- Returns:
The list of analyses for each word in words.
- Return type:
list of AnalyzedWord
Examples
from camel_tools.morphology.database import MorphologyDB
from camel_tools.morphology.analyzer import Analyzer
db = MorphologyDB.builtin_db()
# Create analyzer with no backoff
analyzer = Analyzer(db)
# Create analyzer with NOAN_PROP backoff
analyzer = Analyzer(db, 'NOAN_PROP')
# or
analyzer = Analyzer(db, backoff='NOAN_PROP')
# To analyze a word, we can use the analyze() method
analyses = analyzer.analyze('شارع')
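To analyze several words at once, analyze_words() can be used in a similar way. The sketch below is a minimal, hedged example: count_analyses and demo are hypothetical helper names, and the code relies only on the word and analyses fields of AnalyzedWord documented above. Running demo() assumes camel_tools and its built-in database are installed.

```python
def count_analyses(analyzer, words):
    """Map each word to the number of analyses the analyzer returns for it.

    `analyzer` is any object with an analyze_words(words) method whose items
    expose `word` and `analyses`, as AnalyzedWord does.
    """
    counts = {}
    for analyzed in analyzer.analyze_words(words):
        counts[analyzed.word] = len(analyzed.analyses)
    return counts


def demo():
    # Imports are kept local so the helper above can be used without
    # camel_tools being importable at module load time.
    from camel_tools.morphology.database import MorphologyDB
    from camel_tools.morphology.analyzer import Analyzer

    analyzer = Analyzer(MorphologyDB.builtin_db())
    return count_analyses(analyzer, ['شارع', 'كتاب'])
```

A word that maps to zero here has no analyses under the chosen backoff mode, which may be a reason to try one of the other backoff settings.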