camel_tools.morphology.database

The MorphologyDB class parses a morphology database file and builds the indexes used by the analyzer, generator, and reinflector components. You normally do not need to work with MorphologyDB instances directly; you only pass them as arguments when creating new instances of the analyzer, generator, and reinflector components.

Classes

class camel_tools.morphology.database.MorphologyDB(fpath, flags='a')

Class providing indexes from a given morphology database file.

Parameters:
  • fpath (str) – File path to database.
  • flags (str) – Flag string (similar to the mode string used when opening files) indicating which components the database will be used for. ‘a’ indicates analysis, ‘g’ indicates generation, and ‘r’ indicates reinflection. ‘r’ is equivalent to ‘ag’ since the reinflector uses both analyzer and generator components internally. Defaults to ‘a’.
Raises:

InvalidDatabaseFlagError – When an invalid flag value is given.
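
The flag rules described above can be illustrated with a small standalone sketch. It is not the library's implementation: the function name expand_flags and the component labels are hypothetical, and ValueError stands in for InvalidDatabaseFlagError so that the snippet runs without CAMeL Tools installed.

```python
def expand_flags(flags='a'):
    """Map a flag string to the components it enables (illustrative only)."""
    enabled = set()
    for flag in flags:
        if flag == 'a':
            enabled.add('analysis')
        elif flag == 'g':
            enabled.add('generation')
        elif flag == 'r':
            # 'r' is documented as equivalent to 'ag'.
            enabled.update({'analysis', 'generation'})
        else:
            # MorphologyDB raises InvalidDatabaseFlagError in this case;
            # ValueError is a stand-in for this sketch.
            raise ValueError('invalid database flag: {!r}'.format(flag))
    return frozenset(enabled)


# 'r' and 'ag' enable the same components.
print(expand_flags('r') == expand_flags('ag'))
```

Under these assumptions, 'a' enables only analysis, while 'r' and 'ag' both enable analysis and generation.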

all_feats()

Return a set of all features provided by this database instance.

Returns: The set of all features provided by this database instance.
Return type: frozenset of str

static builtin_db(db_name=None, flags='a')

Create a MorphologyDB instance from one of the builtin databases provided.

Parameters:
  • db_name (str, optional) – Name of builtin database. You can use list_builtin_dbs() to get a list of builtin databases or see Databases. Defaults to ‘calima-msa-r13’.
  • flags (str, optional) – Flag string to be passed to MorphologyDB constructor. Defaults to ‘a’.
Returns:

Instance of builtin database with given flags.

Return type:

MorphologyDB

static list_builtin_dbs()

Returns a list of builtin databases provided with CAMeL Tools.

Returns: List of builtin databases.
Return type: list of DatasetEntry

tok_feats()

Return a set of tokenization features provided by this database instance.

Returns: The set of tokenization features provided by this database instance.
Return type: frozenset of str

Databases

Below is a list of databases that ship with CAMeL Tools:

  • calima-msa-r13 Database for analyzing Modern Standard Arabic. [1]
  • calima-egy-r13 Database for analyzing Egyptian Arabic. [2]
  • calima-glf-01 Database for analyzing Gulf Arabic. [3]

Examples

from camel_tools.morphology.database import MorphologyDB

# Initialize the default database ('calima-msa-r13')
db = MorphologyDB.builtin_db()

# In the above call, the database is loaded for analysis only by default.
# This is equivalent to writing:
db = MorphologyDB.builtin_db(flags='a')

# We can load it for generation as so:
db = MorphologyDB.builtin_db(flags='g')

# Or for reinflection as so:
db = MorphologyDB.builtin_db(flags='r')

# Since reinflection uses the database in both analysis and generation modes
# internally, the above is equivalent to writing:
db = MorphologyDB.builtin_db(flags='ag')


# In the examples above, we loaded the default database, 'calima-msa-r13'.
# We can load other builtin databases by providing the desired database's
# name. Here we'll load the builtin Egyptian database, 'calima-egy-r13':
db = MorphologyDB.builtin_db('calima-egy-r13')

# Or with flags:
db = MorphologyDB.builtin_db('calima-egy-r13', flags='r')


# We can also initialize external databases:
db = MorphologyDB('/path/to/database')

# or with flags:
db = MorphologyDB('/path/to/database', flags='g')
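
Since database instances are meant to be passed to a component rather than used on their own, the following sketch shows one way to hand a database to the analyzer. The helper name analyze_word is hypothetical, and the snippet assumes CAMeL Tools and its builtin databases are installed; the imports are placed inside the function only so that defining it does not require the package.

```python
def analyze_word(word, db_name=None):
    """Return the analyses of `word` using a builtin database (illustrative helper)."""
    # Imported lazily so the helper can be defined even where camel_tools
    # is not installed; calling it still requires the package and its data.
    from camel_tools.morphology.database import MorphologyDB
    from camel_tools.morphology.analyzer import Analyzer

    # flags='a' is sufficient because only analysis is needed here.
    # db_name=None selects the default builtin database.
    db = MorphologyDB.builtin_db(db_name, flags='a')
    analyzer = Analyzer(db)
    return analyzer.analyze(word)
```

Calling analyze_word with an Arabic word returns a list of analysis dictionaries. A generator or reinflector would be created the same way, from a database loaded with the 'g' or 'r' flag respectively.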

Footnotes

[1] calima-msa-r13 is a modified version of the almor-msa-r13.db database that ships with MADAMIRA. The calima-msa-r13 database is distributed under the GNU General Public License version 2.
[2] calima-egy-r13 is a modified version of the almor-cra07.db database that ships with MADAMIRA. The calima-egy-r13 database is distributed under the GNU General Public License version 2.
[3] The calima-glf-01 database is distributed under the Creative Commons Attribution 4.0 International License.