camel_tools.morphology.analyzer
The morphological analyzer component of CAMeL Tools.
Globals
camel_tools.morphology.analyzer.DEFAULT_NORMALIZE_MAP
The default character map used for normalization by Analyzer. Removes the tatweel/kashida character and does the following conversions:
- ‘إ’ to ‘ا’
- ‘أ’ to ‘ا’
- ‘آ’ to ‘ا’
- ‘ٱ’ to ‘ا’
- ‘ى’ to ‘ي’
- ‘ة’ to ‘ه’
Type: CharMapper
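The conversions listed above can be illustrated with a plain-Python sketch. It does not use CharMapper. It relies on `str.translate` as a stand-in and assumes the tatweel character is U+0640. The names `NORMALIZE_TABLE` and `normalize` are hypothetical and not part of CAMeL Tools.

```python
# Hypothetical stand-in for DEFAULT_NORMALIZE_MAP built on str.translate.
# This is NOT the CharMapper implementation, only the mappings listed above.
NORMALIZE_TABLE = str.maketrans({
    '\u0640': None,      # tatweel/kashida is removed
    '\u0625': '\u0627',  # إ to ا
    '\u0623': '\u0627',  # أ to ا
    '\u0622': '\u0627',  # آ to ا
    '\u0671': '\u0627',  # ٱ to ا
    '\u0649': '\u064a',  # ى to ي
    '\u0629': '\u0647',  # ة to ه
})


def normalize(word):
    """Apply the listed conversions to a single word."""
    return word.translate(NORMALIZE_TABLE)
```

Under these assumptions, `normalize('مدرسة')` yields the same word ending in ‘ه’ instead of ‘ة’.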
Classes
class camel_tools.morphology.analyzer.AnalyzedWord
A named tuple containing a word and its analyses.
analyses
List of analyses for word. See CAMeL Morphology Features for more information on features and their values.
Type: list of dict
class camel_tools.morphology.analyzer.Analyzer(db, backoff='NONE', norm_map=None, strict_digit=False, cache_size=0)
Morphological analyzer component.
Parameters:
- db (MorphologyDB) – Database to use for analysis. Must be opened in analysis or reinflection mode.
- backoff (str, optional) – Backoff mode. Can be one of the following: ‘NONE’, ‘NOAN_ALL’, ‘NOAN_PROP’, ‘ADD_ALL’, or ‘ADD_PROP’. Defaults to ‘NONE’.
- norm_map (CharMapper, optional) – Character map for normalizing input words. If set to None, then DEFAULT_NORMALIZE_MAP is used. Defaults to None.
- strict_digit (bool, optional) – If set to True, only words composed entirely of digits are considered numbers; otherwise, any word containing a digit is considered a number. Defaults to False.
- cache_size (int, optional) – If greater than zero, the analyzer caches the analyses for the cache_size most frequent words; otherwise, no analyses are cached. Defaults to 0.
Raises: AnalyzerError – If db is not an instance of MorphologyDB, if db does not support analysis, or if backoff is not a valid backoff mode.
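To make the strict_digit distinction concrete, here is a minimal sketch of the two rules as described above. The function name `looks_like_number` is hypothetical, and `str.isdigit` is an assumed digit test. The analyzer's actual check may differ, for example in which digit characters it accepts.

```python
def looks_like_number(word, strict_digit=False):
    """Hypothetical helper mirroring the strict_digit description."""
    if strict_digit:
        # Only words made up entirely of digits count as numbers.
        return word != '' and all(ch.isdigit() for ch in word)
    # Any word containing at least one digit counts as a number.
    return any(ch.isdigit() for ch in word)
```

With this sketch, a mixed token such as '12a' counts as a number only when strict_digit is False.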
all_feats()
Return a set of all features provided by the database used in this analyzer instance.
Returns: The set of all features provided by the database used in this analyzer instance.
Return type: frozenset of str
analyze(word)
Analyze a given word.
Parameters: word (str) – Word to analyze.
Returns: The list of analyses for word. See CAMeL Morphology Features for more information on features and their values.
Return type: list of dict
analyze_words(words)
Analyze a list of words.
Parameters: words (list of str) – List of words to analyze.
Returns: The list of analyses for each word in words.
Return type: list of AnalyzedWord
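Because analyze_words() returns one AnalyzedWord per input word, its result can be summarized with ordinary Python. The sketch below uses a stand-in named tuple, `AnalyzedWordStub`, so that it runs without a database. The field name `word` is an assumption based on the description of AnalyzedWord, and `ambiguity_counts` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for AnalyzedWord; the 'word' field name is an assumption.
AnalyzedWordStub = namedtuple('AnalyzedWordStub', ['word', 'analyses'])


def ambiguity_counts(analyzed_words):
    """Map each word to the number of analyses it received."""
    return {aw.word: len(aw.analyses) for aw in analyzed_words}


# Made-up data standing in for the output of analyze_words().
stub = [
    AnalyzedWordStub('w1', [{}, {}]),
    AnalyzedWordStub('w2', []),
]
print(ambiguity_counts(stub))
```

A word mapped to zero has no analyses, which can happen when the analyzer is created with the ‘NONE’ backoff mode.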
Examples
from camel_tools.morphology.database import MorphologyDB
from camel_tools.morphology.analyzer import Analyzer
db = MorphologyDB.builtin_db()
# Create analyzer with no backoff
analyzer = Analyzer(db)
# Create analyzer with NOAN_PROP backoff
analyzer = Analyzer(db, 'NOAN_PROP')
# or
analyzer = Analyzer(db, backoff='NOAN_PROP')
# To analyze a word, we can use the analyze() method
analyses = analyzer.analyze('شارع')
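Since analyze() returns a list of dicts, the result can be inspected with ordinary Python. The sketch below is a hypothetical helper, `feature_values`, that collects the distinct values of one feature across analyses. The feature names 'lex' and 'pos' and the sample data are assumptions for illustration only; see CAMeL Morphology Features for the keys your database provides.

```python
def feature_values(analyses, feature):
    """Return the distinct values of `feature` across analyses, in order.

    Hypothetical helper; `analyses` is a list of dicts such as the one
    returned by Analyzer.analyze(). Analyses lacking the feature are skipped.
    """
    seen = []
    for analysis in analyses:
        value = analysis.get(feature)
        if value is not None and value not in seen:
            seen.append(value)
    return seen


# Made-up sample data standing in for real analyzer output.
sample = [
    {'lex': 'a', 'pos': 'noun'},
    {'lex': 'b', 'pos': 'verb'},
    {'lex': 'c', 'pos': 'noun'},
]
print(feature_values(sample, 'pos'))
```

With a real analyzer, the same helper could be called as `feature_values(analyzer.analyze('شارع'), 'pos')`, assuming the database exposes a 'pos' feature.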