Installing Treex

From the official website:

Treex (formerly TectoMT) is a highly modular NLP software system implemented in Perl programming language under Linux.

It is primarily aimed at Machine Translation, making use of the ideas and technology created during the Prague Dependency Treebank project. At the same time, it is also hoped to significantly facilitate and accelerate development of software solutions of many other NLP tasks, especially due to re-usability of the numerous integrated processing modules (called blocks), which are equipped with uniform object-oriented interfaces.

While tectogrammatical approaches to machine translation have fallen out of favor in recent years, the analytical tools provided by the Treex suite are still helpful for a variety of NLP tasks. My interest is in the ability to parse and regenerate English-language texts. (Well, and to fiddle around with TGen.)

I found that setting up Treex was nontrivial, in part because I was uncertain how to approach a massive codebase written in Perl and get everything installed in the right place. These are the instructions that I ended up following, based on the original Dockerfile provided by the project. They assume that you are using a Linux-based system with a bash-like shell.

I hope that by the end of this tutorial you can:

  1. Install the current version of Treex.

Step 0. Install dependencies.

  • perl because Treex is written in Perl5
  • graphviz for the AI::DecisionTree Perl package
  • default-jre (Ubuntu) or java (Fedora) because the MSTperl parser is in Java
  • libboost-all-dev / boost-devel because ???
  • cmake because ???
  • git and subversion / svn for downloading code and data from UFAL

Run the following command (as root, or prefixed with sudo) to install the necessary packages on Ubuntu:

apt-get install perl graphviz default-jre libboost-all-dev cmake git subversion

Or on Fedora:

dnf install perl graphviz java boost-devel cmake git svn
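Before moving on, it can help to confirm that the tools actually ended up on your PATH. The following is only a minimal sketch and not part of the original Dockerfile: missing_commands is a hypothetical helper name, and dot is used as the stand-in for graphviz because that is the binary graphviz installs.

```shell
# Print the name of every command that cannot be found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# No output means everything was found.
missing_commands perl dot java cmake git svn
```

If this prints any names, re-run the install command for your distribution before continuing.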

Step 1. Set up your local Perl environment.

We’ll set up the local Perl environment using the built-in program cpan. Running cpan for the first time will trigger an interactive dialogue to configure your Perl environment.

My responses at the prompts were:

  1. yes use the auto configuration
  2. local::lib to set up my Perl libraries in a directory in my home folder (~/perl5)
  3. yes to add the relevant environment variables to my ~/.bashrc file
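The next step uses cpanm rather than cpan. If cpanm is not already available on your system, one way to get it is through cpan itself, since cpanm ships in the App::cpanminus distribution on CPAN. A minimal sketch, assuming the cpan configuration above has finished and a new shell has been opened so the ~/.bashrc changes take effect (ensure_cpanm is a hypothetical helper name):

```shell
# Install cpanm via cpan, but only when it is not already on PATH.
ensure_cpanm() {
    if command -v cpanm >/dev/null 2>&1; then
        echo "cpanm already installed"
    else
        cpan App::cpanminus
    fi
}
```

Call ensure_cpanm once before starting Step 2.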

Step 2. Install modules necessary for treex.

cpanm YAML::Tiny \
	XML::LibXML \
	Moose \
	MooseX \
	MooseX::NonMoose \
	MooseX::Getopt \
	MooseX::Role::Parameterized  \
	MooseX::Role::AttributeOverride \
	MooseX::SemiAffordanceAccessor \
	Readonly \
	File::HomeDir \
	File::ShareDir \
	File::Slurp \
	File::chdir \
	YAML \
	LWP::Simple \
	String::Util \
	PerlIO::gzip \
	Class::Std

Install the PerlIO::Util package separately, because PerlIO::Util has a bug in its tests. The -n flag tells cpanm to skip the tests.

cpanm -n PerlIO::Util
cpanm PerlIO::via::gzip

And some packages from UFAL used by treex:

cpanm Text::Iconv Ufal::NameTag Ufal::MorphoDiTa Lingua::Interset URI::Find Cache::LRU

Install the rest of the “optional” Treex dependencies. (Note that the quotes around ‘optional’ come from their original Dockerfile; I don’t know what each package is needed for.)

cpanm \
	autodie \
	threads \
	threads::shared \
	forks \
	namespace::autoclean \
	Module::Reload \
	IO::Interactive \
	App::whichpm \
	Treex::PML \
	Cache::Memcached \
	List::Pairwise \
	Algorithm::NaiveBayes \
	AI::DecisionTree \
	Algorithm \
	Algorithm::DecisionTree \
	AnyEvent \
	AnyEvent::Fork \
	Bash::Completion::Utils \
	Carp \
	Carp::Always \
	Carp::Assert \
	Clone \
	Compress::Zlib \
	DBI \
	DateTime \
	EV \
	Email::Find \
	Encode::Arabic \
	Frontier::Client \
	Graph \
	Graph::ChuLiuEdmonds \
	Graph::MaxFlow \
	HTML::FormatText \
	JSON \
	Lingua::EN::Tagger \
	Modern::Perl \
	MooseX::ClassAttribute \
	MooseX::FollowPBP \
	MooseX::Types::Moose \
	PML \
	POE \
	String::Diff \
	Test::Files \
	Test::Output \
	Text::Brew \
	Text::JaroWinkler \
	Text::Table \
	Text::Unidecode \
	Tk \
	Tree::Trie \
	URL::Encode \
	XML::Simple

When the Treex developers built their Docker image, one of the tests for AI::MaxEntropy failed, so we install it separately with the tests skipped, as they did.

cpanm -n  AI::MaxEntropy
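With this many modules, a failed install can easily scroll past unnoticed. As a rough check (a sketch only, with missing_perl_modules being a made-up helper name and the module list a small sample of the ones installed above), you can ask perl to load a few of them:

```shell
# Print the name of every module that perl fails to load.
missing_perl_modules() {
    for mod in "$@"; do
        perl -M"$mod" -e 1 >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}

# No output means all of the listed modules load.
missing_perl_modules Moose XML::LibXML Treex::PML AI::MaxEntropy
```

Any module it prints can be reinstalled on its own with cpanm to see the actual error.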

Step 3. Set up your directories for treex and the rest of the setup.

If you have a favorite projects directory, create a directory called tectomt inside that directory. Otherwise, just create tectomt in your home directory as follows:

mkdir ~/tectomt

We’ll call this directory TMT_ROOT throughout the rest of this text, so let’s export it to an environment variable now.

export TMT_ROOT=~/tectomt

(Note that you need to replace ~/tectomt with the path to the tectomt directory you just created, if it is different.)

Now we can go to that directory and clone the treex repository:

cd $TMT_ROOT
git clone https://github.com/ufal/treex.git

This creates a few more directories for us that we want to have associated with various environment variables, so we need to run the following:

export TREEX_ROOT="${TMT_ROOT}/treex"
export PATH="${TREEX_ROOT}/bin:$PATH"
export PERL5LIB="${TREEX_ROOT}/lib:$PERL5LIB"
export PERLLIB=$PERL5LIB
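These exports only last for the current shell session. If you want them in every new shell, one option is to append them to your startup file. The sketch below assumes ~/.bashrc is that file and that TMT_ROOT is ~/tectomt; persist_treex_env is a hypothetical helper name, and running it twice will append the lines twice.

```shell
# Append the Treex environment setup to a startup file (default: ~/.bashrc).
persist_treex_env() {
    rcfile="${1:-$HOME/.bashrc}"
    # The quoted 'EOF' keeps the variables unexpanded, so they are
    # evaluated each time the startup file is read.
    cat >> "$rcfile" <<'EOF'
export TMT_ROOT="$HOME/tectomt"
export TREEX_ROOT="${TMT_ROOT}/treex"
export PATH="${TREEX_ROOT}/bin:$PATH"
export PERL5LIB="${TREEX_ROOT}/lib:$PERL5LIB"
export PERLLIB="$PERL5LIB"
EOF
}
```

Run persist_treex_env with no argument to write to ~/.bashrc, or pass another file path to write there instead.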

In addition to the perl5 directory created in our home directory earlier, we need a hidden directory .treex for the different treex tools.

mkdir -p ~/.treex/share/installed_tools
ln -s ~/.treex/share $TMT_ROOT/share

And we also want a temporary directory to use when installing additional tools:

mkdir $TMT_ROOT/tmp
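If you need to repeat this step, plain mkdir and ln -s will complain that the directories and the link already exist. A re-runnable variant might look like the following sketch; setup_treex_dirs is a hypothetical helper name, and the second argument is only there so the share location can be overridden.

```shell
# Create the share, installed_tools and tmp directories and the share symlink.
# Safe to run more than once.
setup_treex_dirs() {
    tmt_root="$1"
    share_dir="${2:-$HOME/.treex/share}"
    mkdir -p "$share_dir/installed_tools" "$tmt_root/tmp"
    # -f replaces an existing link; -n stops ln from following an existing
    # symlink and creating the new link inside the directory it points to.
    ln -sfn "$share_dir" "$tmt_root/share"
}
```

With the variables from this step it would be called as setup_treex_dirs "$TMT_ROOT".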

Step 4. Install additional tools.

Install the Morce tagger.

cd $TMT_ROOT
svn --username public --password public export https://svn.ms.mff.cuni.cz/svn/tectomt_devel/trunk/libs/packaged tmp/packaged
cd tmp/packaged/Morce-English
perl Build.PL
./Build
./Build install

Now we need to download some models for the tagger:

mkdir -p ~/.treex/share/data/models/morce/en
cd $TMT_ROOT/share/data/models/morce/en
wget http://ufallab.ms.mff.cuni.cz/tectomt/share/data/models/morce/en/morce.alph \
	http://ufallab.ms.mff.cuni.cz/tectomt/share/data/models/morce/en/morce.dct \
	http://ufallab.ms.mff.cuni.cz/tectomt/share/data/models/morce/en/morce.ft \
	http://ufallab.ms.mff.cuni.cz/tectomt/share/data/models/morce/en/morce.ftrs \
	http://ufallab.ms.mff.cuni.cz/tectomt/share/data/models/morce/en/tags_for_form-from_wsj.dat
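A dropped connection can leave one of these files missing or empty. The sketch below lists any of the five model files that are absent or zero-length in a given directory; missing_morce_models is a hypothetical helper name, and the file names are taken from the wget command above.

```shell
# Print every Morce model file that is missing or empty in the given directory.
missing_morce_models() {
    model_dir="$1"
    for f in morce.alph morce.dct morce.ft morce.ftrs tags_for_form-from_wsj.dat; do
        # -s is true only for files that exist and are non-empty.
        [ -s "$model_dir/$f" ] || printf '%s\n' "$f"
    done
}

# No output means all five files are present.
missing_morce_models "$HOME/.treex/share/data/models/morce/en"
```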

Install NADA.

cd $TMT_ROOT
rm -rf tmp/tool_installation
svn --username public --password public export https://svn.ms.mff.cuni.cz/svn/tectomt_devel/trunk/install/tool_installation tmp/tool_installation
cd tmp/tool_installation/NADA
perl Makefile.PL
make
make install

Step 5. Test your newly installed instance of treex!

This is a simple ‘Hello World’ test.

echo 'Hello world' | treex -Len Read::Sentences Write::Sentences
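If you want to script this check, the following sketch wraps it in a small function. It rests on an assumption: that this pipeline prints the input sentence back unchanged on standard output, so "Hello world" in should give "Hello world" out. smoke_test is a hypothetical helper name, and it takes the command to test as its arguments.

```shell
# Pipe "Hello world" through the given command and compare the output.
smoke_test() {
    expected="Hello world"
    actual=$(echo "$expected" | "$@" 2>/dev/null)
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "FAILED: got '$actual'"
        return 1
    fi
}

# Usage (uncomment once treex is on your PATH):
# smoke_test treex -Len Read::Sentences Write::Sentences
```

Standard error is discarded so that log messages do not interfere with the comparison. Remove the 2>/dev/null if you need to see them while debugging.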
