Introduction to the ZPar build system
Note: This documentation is written for ZPar 0.7 and above.
Overview
ZPar is built with make, using a set of predefined rules and compilation actions. You can customize ZPar or contribute third-party code by adapting the Makefiles.
How to compile ZPar using default implementations
Suppose that ZPar has been downloaded to the directory zpar.
To make a POS tagging system for English, type make english.postagger.
This will create a directory zpar/dist/english.postagger, in which there are two files: train and tagger.
The file train is used to train a tagging model, and the file tagger is used to tag new texts using a trained tagging model.
Parsers and POS taggers for other languages can be built in similar ways. Please refer to specific manuals for detailed instructions.
How to change the implementation for an existing task
ZPar provides various implementations for parsers and POS taggers for all the supported languages.
To choose a specific implementation from the existing ones provided by ZPar, simply edit zpar/Makefile and change the corresponding *_IMPL macro, where * represents <LANGUAGE>_<TASK>.
For example, to use the `Tweb' implementation of the POS tagger for generic language, make the following change to zpar/Makefile:
GENERIC_TAGGER_IMPL = tweb
Then build the generic POS tagger as usual by typing make generic.postagger.
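The edit above can also be scripted. The following is a minimal sketch, not part of ZPar: the helper name set_impl is hypothetical, and it assumes the macro is defined on a single line of the form NAME = value, as in the example above.

```shell
# set_impl MAKEFILE MACRO VALUE
# Hypothetical helper (not shipped with ZPar): rewrites the line
# "MACRO = ..." in MAKEFILE so that it reads "MACRO = VALUE".
# Assumes the macro sits on one line starting in the first column,
# and that VALUE contains no '|' or '&' characters.
set_impl() {
    makefile=$1
    macro=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temporary file, then copy it back
    # so the permissions of the original Makefile are preserved.
    sed "s|^${macro}[[:space:]]*=.*|${macro} = ${value}|" "$makefile" > "$tmp" \
        && cat "$tmp" > "$makefile"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example use, followed by the usual build:
#   set_impl zpar/Makefile GENERIC_TAGGER_IMPL tweb
#   (cd zpar && make generic.postagger)
```

Editing the Makefile by hand gives the same result; the helper is only convenient when switching implementations repeatedly.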
To examine which implementations are supported for a specific task, please consult the sub-folders under:
zpar/src/<LANGUAGE>/<TASK>/implementations/
For example, the supported Chinese word segmentors can be listed with:
bash$ ls zpar/src/chinese/segmentor/implementations/
The name of a sub-folder is also the name of the implementation.
<LANGUAGE> can be:
- chinese: for Chinese only;
- english: for English only;
- spanish: for Spanish only;
- common: for all of Chinese, English, Spanish and other languages.
<TASK> can be:
- tagger: for POS tagging;
- conparser: for constituency parsing;
- depparser: for dependency parsing;
- deplabeler: for dependency labeling;
- segmentor: for word segmentation, Chinese only.
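Since each sub-folder name is an implementation name, the available implementations for a given <LANGUAGE> and <TASK> can be listed with a small shell function. This is a sketch only; list_impls is a hypothetical name, not a ZPar command.

```shell
# list_impls DIR
# Hypothetical helper: prints the name of every sub-folder of DIR,
# one per line. Regular files in DIR are ignored.
list_impls() {
    dir=$1
    for d in "$dir"/*/; do
        # If DIR has no sub-folders the pattern stays unexpanded,
        # so check that each entry really is a directory.
        if [ -d "$d" ]; then
            basename "$d"
        fi
    done
}

# Example use:
#   list_impls zpar/src/chinese/segmentor/implementations
```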
After the implementation of a specific task is changed, the corresponding component of ZPar is updated the next time ZPar is compiled.
How to contribute code by writing a new implementation for a task
You may add a new implementation for a specific task by creating a sub-folder as described in How to change the implementation for an existing task.
ZPar requires all implementations to be compatible with the ZPar APIs and the source/object file naming conventions.
For example, to write a new generic language POS tagger, one must provide the following class in the source file tagger.h in zpar/src/generic/tagger/implementations/<NEW_TAGGER>:
namespace TARGET_LANGUAGE {
   class CTagger {
   public:
      CTagger(const std::string &sFeatureDBPath, bool bTrain=false);
      void loadTagDictionary(const std::string &sTagDictPath);
      void loadKnowledge(const std::string &sKnowledge);
      bool train(const CTwoStringVector *correct);
      void finishTraining();
      void tag(CStringVector *sentence, CTwoStringVector *retval, int nBest=1, double *out_scores=NULL);
   };
}
A tagger.cpp file should contain the implementations of the methods, and be saved into the same directory.
It should be compiled into generic.postagger.o.
In addition, the new POS tagger implementation should provide two files, tagger_weight.h and tagger_weight.cpp, which should be compiled into weight.o.
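Before switching the *_IMPL macro to a new tagger, it can help to confirm that the four source files named above are present. The function below is a hedged sketch; check_tagger_impl is a hypothetical helper, and it checks only that the files exist, not that they follow the ZPar APIs.

```shell
# check_tagger_impl IMPL_DIR
# Hypothetical helper: reports which of the four source files named
# in this section are missing from IMPL_DIR.
# Returns 0 if all are present, 1 otherwise.
check_tagger_impl() {
    impl_dir=$1
    missing=0
    for f in tagger.h tagger.cpp tagger_weight.h tagger_weight.cpp; do
        if [ ! -f "$impl_dir/$f" ]; then
            echo "missing: $impl_dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use, where my_tagger stands for the name of your implementation:
#   check_tagger_impl zpar/src/generic/tagger/implementations/my_tagger
```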
For other tasks and languages, you may look into an existing implementation
and Makefile template for details.
If the custom implementation contains all the
source and object files according to the specifications above,
there is nothing further to do.
Simply change the corresponding *_IMPL
macro in zpar/Makefile
into the name of the custom implementation and build as usual.
The build system will automatically detect the implementation,
compile it and link it into ZPar.
However, if the custom implementation cannot be accomplished
in those files alone,
and additional objects are necessary,
you have to provide your own rules and actions
to compile and link the additional objects.
In such a situation, make a copy of the corresponding Makefile template
for the task and language
from the zpar/Makefile.d/ folder
to your custom implementation folder:
cp Makefile.d/Makefile.<LANGUAGE>.<TASK> \
src/<LANGUAGE>/<TASK>/implementations/<IMPL_NAME>/Makefile
For example, to write a new generic language POS tagger,
cp Makefile.d/Makefile.ge.postagger \
src/common/postagger/implementations/<NEW_TAGGER>/Makefile
Then change the rules in the copied Makefile to compile the custom implementation.
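The copy step can be wrapped so that an already customized Makefile is never overwritten by accident. This is a sketch under stated assumptions: install_template is a hypothetical helper, and both paths are passed in explicitly because the template file names do not always match the folder names (Makefile.ge.postagger in the example above).

```shell
# install_template TEMPLATE IMPL_DIR
# Hypothetical helper: copies TEMPLATE to IMPL_DIR/Makefile,
# creating IMPL_DIR if needed. Refuses to overwrite an existing
# Makefile so that earlier customizations are not lost.
install_template() {
    template=$1
    impl_dir=$2
    if [ ! -f "$template" ]; then
        echo "template not found: $template" >&2
        return 1
    fi
    if [ -e "$impl_dir/Makefile" ]; then
        echo "refusing to overwrite: $impl_dir/Makefile" >&2
        return 1
    fi
    mkdir -p "$impl_dir" && cp "$template" "$impl_dir/Makefile"
}

# Example use from inside the zpar directory, where my_tagger stands
# for the name of your implementation:
#   install_template Makefile.d/Makefile.ge.postagger \
#       src/common/postagger/implementations/my_tagger
```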
Note that the object file naming must strictly follow
the conventions described above.
You should link all of your own required objects into a single object file with the proper name.
For example, to write a new generic language POS tagger, one should make sure that all the necessary object files are linked into generic.postagger.o.
For details, zpar/Makefile.d/Makefile.ge.postagger.tweb can be taken as an example.