Overview

ZPar is built with make. Its Makefiles provide a set of predefined rules and compilation actions, so you can customize ZPar or contribute third-party code by adapting them.

How to compile ZPar using default implementations

Suppose that ZPar has been downloaded to the directory zpar. To make a POS tagging system for English, type make english.postagger. This will create a directory zpar/dist/english.postagger, in which there are two files: train and tagger. The file train is used to train a tagging model, and the file tagger is used to tag new texts using a trained tagging model.

Parsers and POS taggers for other languages can be built in similar ways. Please refer to specific manuals for detailed instructions.

How to change the implementation for an existing task

ZPar provides several implementations of the parsers and POS taggers for all the supported languages. To choose one of the existing implementations, edit zpar/Makefile and change the corresponding *_IMPL macro, where * stands for <LANGUAGE>_<TASK>. For example, to use the tweb implementation of the generic language POS tagger, make the following change in zpar/Makefile:

GENERIC_TAGGER_IMPL = tweb

Then build the generic POS tagger as usual by typing make generic.postagger.
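The edit can also be scripted. The following is a minimal sketch rather than part of ZPar: set_impl is a hypothetical helper name, and it assumes the macro is already assigned with a plain = on a line of its own, as in the example above.

```shell
# set_impl <makefile> <macro> <value>
# Hypothetical helper (not shipped with ZPar): rewrites an existing
# "<macro> = ..." line in the given Makefile to use a new value.
# Assumes the value is a plain implementation name (no "|" or "&").
set_impl() {
    makefile=$1
    macro=$2
    value=$3
    # Refuse to guess if the macro is not assigned anywhere in the file.
    if ! grep -q "^[[:space:]]*${macro}[[:space:]]*=" "$makefile"; then
        echo "set_impl: $macro not found in $makefile" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # Rewrite every assignment line of the macro; other lines pass through.
    sed "s|^[[:space:]]*${macro}[[:space:]]*=.*|${macro} = ${value}|" \
        "$makefile" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$makefile"
    rm -f "$tmp"
}

# Example: set_impl zpar/Makefile GENERIC_TAGGER_IMPL tweb
```

In make, a variable given on the command line normally overrides an assignment inside the Makefile, so make generic.postagger GENERIC_TAGGER_IMPL=tweb may have the same effect without editing the file. Whether ZPar's Makefile behaves this way is an assumption.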

To examine which implementations are supported for a specific task, please consult the sub-folders under:

zpar/src/<LANGUAGE>/<TASK>/implementations/

For example, the supported Chinese word segmentors are:

bash$ ls zpar/src/chinese/segmentor/implementations/
  acl07  action  agenda  agendachart  agendaplus  viterbi

The name of a sub-folder is also the name of the implementation. The <LANGUAGE> can be:

The <TASK> can be:

After the implementation of a task is changed, the corresponding component of ZPar is updated the next time ZPar is compiled.
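The lookup of available implementations can be wrapped in a small shell function. This is a sketch that assumes the directory layout given above; list_impls is a hypothetical name, not a ZPar command.

```shell
# list_impls <zpar-root> <language> <task>
# Hypothetical helper: prints one implementation name per line, i.e. the
# names of the sub-folders of <zpar-root>/src/<language>/<task>/implementations.
list_impls() {
    dir="$1/src/$2/$3/implementations"
    if [ ! -d "$dir" ]; then
        echo "list_impls: no such directory: $dir" >&2
        return 1
    fi
    for sub in "$dir"/*/; do
        # Skip the unexpanded pattern when there are no sub-folders.
        [ -d "$sub" ] || continue
        basename "$sub"
    done
}

# Example: list_impls zpar chinese segmentor
```

Plain files in the implementations folder are ignored, because only sub-folders name implementations.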

How to contribute code by writing a new implementation for a task

You may add a new implementation for a specific task by creating a new sub-folder under the implementations directory described in How to change the implementation for an existing task.

ZPar requires all implementations to be compatible with the ZPar APIs and with the source and object file naming conventions. For example, to write a new generic language POS tagger, one must provide the following class in the source file tagger.h in zpar/src/generic/tagger/implementations/<NEW_TAGGER>:

namespace TARGET_LANGUAGE {
  class CTagger {
    public:
    CTagger(const std::string &sFeatureDBPath, bool bTrain=false);
    void loadTagDictionary(const std::string &sTagDictPath);
    void loadKnowledge(const std::string &sKnowledge);
    bool train(const CTwoStringVector *correct);
    void finishTraining();
    void tag(CStringVector *sentence, CTwoStringVector *retval, int nBest=1, double *out_scores=NULL);
  };
}

A tagger.cpp file should contain the implementations of the methods, and be saved into the same directory. It should be compiled into generic.postagger.o. In addition, the new POS tagger implementation should provide two files, tagger_weight.h and tagger_weight.cpp, which should be compiled into weight.o. For other tasks and languages, you may look into an existing implementation and Makefile template for details.
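Before building, it can help to confirm that the four files named above are present. The helper below is a hypothetical sketch (check_tagger_sources is not part of ZPar); it only checks that the files exist, not that they define the required class.

```shell
# check_tagger_sources <implementation-dir>
# Hypothetical helper: reports every missing source file of a new POS tagger
# implementation and returns non-zero if any of them is absent.
check_tagger_sources() {
    status=0
    for f in tagger.h tagger.cpp tagger_weight.h tagger_weight.cpp; do
        if [ ! -f "$1/$f" ]; then
            echo "missing: $1/$f" >&2
            status=1
        fi
    done
    return $status
}

# Example: check_tagger_sources <path to the new implementation folder>
```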

If the custom implementation contains all the source and object files according to the specifications above, no further build configuration is needed. Simply change the corresponding *_IMPL macro in zpar/Makefile to the name of the custom implementation and build as usual. The build system will automatically detect the implementation, compile it and link it into ZPar.

However, if the custom implementation cannot be contained in those files and needs additional objects, you must provide your own rules and actions to compile and link them. In that case, copy the corresponding Makefile template for the task and language from the zpar/Makefile.d/ folder to your custom implementation folder:

cp Makefile.d/Makefile.<LANGUAGE>.<TASK> \
  src/<LANGUAGE>/<TASK>/implementations/<IMPL_NAME>/Makefile

For example, to write a new generic language POS tagger,

cp Makefile.d/Makefile.ge.postagger \
  src/common/postagger/implementations/<NEW_TAGGER>/Makefile

Then change the rules in the copied Makefile to compile the custom implementation. Note that the object file naming must strictly follow the conventions described above: all of your own required objects should be linked into a single object with the proper name. For example, for a new generic language POS tagger, make sure that all the necessary object files are linked into generic.postagger.o. For details, see zpar/Makefile.d/Makefile.ge.postagger.tweb as an example.
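The copy step can be guarded so that an existing custom Makefile is never overwritten by accident. This is a sketch only: copy_template is a hypothetical helper name, and the implementation folder is passed in explicitly rather than derived from the language and task.

```shell
# copy_template <zpar-root> <template-name> <implementation-dir>
# Hypothetical helper: copies <zpar-root>/Makefile.d/<template-name> to
# <implementation-dir>/Makefile, refusing to overwrite an existing Makefile.
copy_template() {
    src="$1/Makefile.d/$2"
    dest_dir=$3
    if [ ! -f "$src" ]; then
        echo "copy_template: template not found: $src" >&2
        return 1
    fi
    if [ ! -d "$dest_dir" ]; then
        echo "copy_template: no such implementation folder: $dest_dir" >&2
        return 1
    fi
    if [ -e "$dest_dir/Makefile" ]; then
        echo "copy_template: $dest_dir/Makefile already exists" >&2
        return 1
    fi
    cp "$src" "$dest_dir/Makefile"
}

# Example: copy_template zpar Makefile.ge.postagger <implementation folder>
```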