Morphology Databases Post-installation
======================================

This page provides post-installation instructions for specific morphological
database packages that require additional installation steps.

Post-installation Steps
-----------------------

.. _calima-msa-s31-db-post-install:

calima-msa-s31
^^^^^^^^^^^^^^

1. Install the database by running ``camel_data -i morphology-db-msa-s31``.
2. Purchase a copy of SAMA 3.1 from the `Linguistic Data Consortium `_.
3. Download the `SAMA 3.1 archive `_ (it should be named ``LDC2010L01.tgz``).
4. Run ``camel_data -p morphology-db-msa-s31 /path/to/LDC2010L01.tgz``.

Usage
-----

The example below shows how to use *calima-msa-s31* once the post-installation
steps above are complete. Here, *calima-msa-s31* is used to diacritize a
sentence.

.. code-block:: python

    from camel_tools.morphology.analyzer import Analyzer
    from camel_tools.morphology.database import MorphologyDB
    from camel_tools.disambig.bert import BERTUnfactoredDisambiguator

    # Load the calima-msa-s31 database
    db = MorphologyDB.builtin_db('calima-msa-s31')

    # Create an analyzer instance using the calima-msa-s31 database
    analyzer = Analyzer(db, 'ADD_PROP', cache_size=100000)

    # Load the pretrained MSA BERT disambiguator
    disambig = BERTUnfactoredDisambiguator.pretrained(model_name='msa',
                                                      pretrained_cache=False)

    # Replace the default analyzer with the calima-msa-s31 analyzer
    disambig.set_analyzer(analyzer)

    # Disambiguate sentence
    sentence = 'سوف نقرأ الكتب'.split()
    sentence_disambig = disambig.disambiguate(sentence)

    # Extract diacritized words
    sentence_diacritized = [d.analyses[0].analysis['diac']
                            for d in sentence_disambig]
    print(' '.join(sentence_diacritized))
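
The extraction step in the example indexes ``analyses[0]`` for every word,
which assumes that each token received at least one analysis. As a minimal
sketch, the hypothetical helper ``top_diac`` below falls back to the original
token when ``analyses`` is empty or has no ``'diac'`` entry. It relies only on
the ``analyses`` and ``analysis`` fields used above, plus a ``word`` field that
is assumed to hold the input token. The ``Word`` and ``Scored`` tuples are
stand-ins, not CAMeL Tools classes, so the snippet runs without the library
installed. Whether an empty list can actually occur with the ``ADD_PROP``
backoff is not stated here, so treat the guard as purely defensive.

.. code-block:: python

    from collections import namedtuple

    # Stand-ins (assumed shape) for the objects returned by disambiguate():
    # each word exposes .word and .analyses; each analysis exposes .analysis.
    Word = namedtuple('Word', ['word', 'analyses'])
    Scored = namedtuple('Scored', ['score', 'analysis'])


    def top_diac(disambiguated):
        """Return the top-ranked 'diac' form per word, or the raw word."""
        result = []
        for d in disambiguated:
            if d.analyses:
                # Use the first (top-ranked) analysis; fall back to the raw
                # word if it has no 'diac' feature.
                result.append(d.analyses[0].analysis.get('diac', d.word))
            else:
                # No analysis available: keep the undiacritized token.
                result.append(d.word)
        return result


    sample = [
        Word('abc', [Scored(1.0, {'diac': 'aBc'})]),
        Word('xyz', []),
    ]
    print(' '.join(top_diac(sample)))

With the real pipeline, ``top_diac(sentence_disambig)`` could stand in for the
list comprehension in the example above.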