Getting Started

Installation

You will need a 64-bit installation of Python 3.6 or later.
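
If you are unsure whether your interpreter meets this requirement, a quick check along the following lines may help. This is only a sketch: the helper name meets_requirements is made up for illustration and is not part of CAMeL Tools.

```python
import struct
import sys


def meets_requirements(version_info=sys.version_info,
                       pointer_bits=struct.calcsize("P") * 8):
    """Return True for a 64-bit interpreter at version 3.6 or later.

    Hypothetical helper for illustration; not part of CAMeL Tools.
    """
    # Compare only (major, minor) so that 3.6.0 and 3.6.15 both pass.
    new_enough = tuple(version_info[:2]) >= (3, 6)
    # A 64-bit build uses 8-byte pointers, i.e. 64 bits.
    return new_enough and pointer_bits == 64


if __name__ == "__main__":
    bits = struct.calcsize("P") * 8
    print("Python", sys.version.split()[0], "-", bits, "bit")
    print("OK" if meets_requirements() else "Unsupported interpreter")
```

Run it with the same interpreter that pip installs into, since a machine can have several Python installations side by side.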

Linux/macOS

Install using pip

pip install camel-tools

# or run the following if you already have camel_tools installed
pip install camel-tools --upgrade --force-reinstall

Install from source

# Clone the repo
git clone https://github.com/CAMeL-Lab/camel_tools.git
cd camel_tools

# Install from source
pip install .

# or run the following if you already have camel_tools installed
pip install --upgrade --force-reinstall .

Installing data

First, download either the Full data zip or the Light data zip (see Datasets for a comparison).

Unzip the file, then move the unzipped directory into your home directory and rename it to .camel_tools, so that it ends up at ~/.camel_tools. If installed correctly, the directory ~/.camel_tools/data should exist.

Alternatively, to install the data in a different location, set the CAMELTOOLS_DATA environment variable to the desired path.

Add the following to your .bashrc, .zshrc, .profile, or equivalent shell startup file:

export CAMELTOOLS_DATA=/path/to/camel_tools_data

Again, data should be a subdirectory of the path set in CAMELTOOLS_DATA.
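
To confirm that the data ended up where it is expected, the rules above can be sketched in a few lines of Python. The helper names resolve_data_root and data_installed are hypothetical, and the lookup order (CAMELTOOLS_DATA first, then ~/.camel_tools) simply mirrors the description in this section. It is not taken from the CAMeL Tools source.

```python
import os
from pathlib import Path


def resolve_data_root(environ=None, home=None):
    """Return the directory that is expected to contain 'data'.

    Hypothetical helper: follows the rules described in this section,
    not necessarily the exact logic used inside CAMeL Tools.
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)
    override = environ.get("CAMELTOOLS_DATA")
    # An explicit CAMELTOOLS_DATA value wins over the default location.
    return Path(override).expanduser() if override else home / ".camel_tools"


def data_installed(root):
    """Return True if 'data' exists as a directory directly under root."""
    return (Path(root) / "data").is_dir()


if __name__ == "__main__":
    root = resolve_data_root()
    status = "found" if data_installed(root) else "missing"
    print(f"{root / 'data'}: {status}")
```

If the script reports the directory as missing, the usual cause is an extra level of nesting left over from unzipping, or a CAMELTOOLS_DATA value that points at the data directory itself rather than its parent.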

Windows

Note: CAMeL Tools has been tested on Windows 10. The Dialect Identification component is not available on Windows at this time.

Install using pip

pip install camel-tools -f https://download.pytorch.org/whl/torch_stable.html

# or run the following if you already have camel_tools installed
pip install --upgrade --force-reinstall -f https://download.pytorch.org/whl/torch_stable.html camel-tools

Install from source

# Clone the repo
git clone https://github.com/CAMeL-Lab/camel_tools.git
cd camel_tools

# Install from source
pip install -f https://download.pytorch.org/whl/torch_stable.html .

# or run the following if you already have camel_tools installed
pip install --upgrade --force-reinstall -f https://download.pytorch.org/whl/torch_stable.html .

Installing data

First, download either the Full data zip or the Light data zip (see Datasets for a comparison).

Unzip the file, then move the unzipped directory to C:\Users\your_user_name\AppData\Roaming and rename it to camel_tools. If installed correctly, the directory C:\Users\your_user_name\AppData\Roaming\camel_tools\data should exist.

Alternatively, to install the data in a different location, set the CAMELTOOLS_DATA environment variable to the desired path. The steps below show how to do so on Windows 10:

  • Press the Windows button and type env.
  • Click on Edit the system environment variables (Control panel).
  • Click on the Environment Variables… button.
  • Click on the New… button under the User variables panel.
  • Type CAMELTOOLS_DATA in the Variable name input box and the desired data path in Variable value. Alternatively, you can browse for the data directory by clicking on the Browse Directory… button.
  • Click OK on all the opened windows.

Again, data should be a subdirectory of the path set in CAMELTOOLS_DATA.
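
As a rough way to see which directory these rules point to, the following sketch builds the expected path from the environment. It assumes that the APPDATA environment variable points to C:\Users\your_user_name\AppData\Roaming, which is the Windows default. The function names are hypothetical and not part of CAMeL Tools.

```python
import ntpath
import os


def windows_data_root(environ=None):
    """Return the directory that should contain 'data' on Windows.

    Hypothetical helper; assumes APPDATA points to AppData\\Roaming.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("CAMELTOOLS_DATA")
    if override:
        # A user-defined CAMELTOOLS_DATA takes priority over the default.
        return override
    appdata = environ.get("APPDATA")
    if appdata is None:
        raise KeyError("APPDATA is not set; is this a Windows session?")
    return ntpath.join(appdata, "camel_tools")


def expected_data_dir(environ=None):
    """Return the full path at which the 'data' directory is expected."""
    return ntpath.join(windows_data_root(environ), "data")


if __name__ == "__main__":
    path = expected_data_dir()
    print(path, "exists" if os.path.isdir(path) else "is missing")
```

Environment variables set through the dialog above are only visible to programs started afterwards, so open a new terminal before running the check.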

Datasets

We provide two data distributions for use with CAMeL Tools: Full and Light.

The Full archive provides data for all components in CAMeL Tools. The Light archive contains only the data for the morphological analyzer, the MLE Disambiguator, and any other components that depend solely on them.

Below is a table comparing the feature set included in each release.

                            Full      Light
Size                        1.8 GB    19 MB
Morphology                  Yes       Yes
Disambiguation              Yes       Yes
Taggers                     Yes       Yes
Tokenization                Yes       Yes
Dialect Identification      Yes       No
Sentiment Analysis          Yes       No
Named Entity Recognition    Yes       No
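
The comparison can also be expressed as a small lookup that suggests the smaller archive whenever it covers everything you need. This sketch assumes that the Light archive covers Morphology, Disambiguation, Taggers, and Tokenization, as the description above suggests. The names LIGHT, FULL, and smallest_distribution are illustrative only.

```python
# Components assumed to be covered by each archive (see the lead-in).
LIGHT = {"Morphology", "Disambiguation", "Taggers", "Tokenization"}
FULL = LIGHT | {
    "Dialect Identification",
    "Sentiment Analysis",
    "Named Entity Recognition",
}


def smallest_distribution(needed):
    """Return "Light" if it covers every needed component, else "Full"."""
    needed = set(needed)
    unknown = needed - FULL
    if unknown:
        raise ValueError(f"Unknown components: {sorted(unknown)}")
    # Light is enough only when nothing outside its component set is needed.
    return "Light" if needed <= LIGHT else "Full"


if __name__ == "__main__":
    print(smallest_distribution(["Morphology", "Tokenization"]))
    print(smallest_distribution(["Morphology", "Sentiment Analysis"]))
```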

Next Steps

See Command-line Tools for information on using the command-line tools or Python API Reference for information on using the Python API.