Language- and Treebank-Independent Parsers
Introduction
The Chinese and English parsers are designed specifically for those two languages and by default use the Penn Chinese Treebank and Penn Treebank labels, respectively. You can specify alternative label sets by modifying zpar/src/chinese/tags.h for POS tags, zpar/src/chinese/dep.h for dependency labels, and zpar/src/chinese/cfg.h for constituent labels. These label sets are hard-coded; the English versions are in zpar/src/english.
Alternatively, you can compile a generic version of ZPar, which takes whatever tags appear in the training data and compiles them into tag sets automatically. The generic tag sets are slower than the hard-coded ones. The files are in zpar/src/generic.
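The idea of building a tag set from the training data can be illustrated with a minimal sketch. This is not ZPar's code (ZPar is written in C++); the class DynamicTagSet, the function collect_tags, and the (word, tag) sentence layout are hypothetical names chosen for illustration. A hard-coded tag set is fixed at compile time, whereas a set like this one has to be filled by a pass over the data and consulted through a lookup table.

```python
from typing import Dict, Iterable, List, Tuple


class DynamicTagSet:
    """Hypothetical tag set that assigns integer codes to tags as they are seen."""

    def __init__(self) -> None:
        self._code_of: Dict[str, int] = {}  # tag string -> integer code
        self._tag_of: List[str] = []        # integer code -> tag string

    def add(self, tag: str) -> int:
        """Return the code for `tag`, assigning the next free code if it is new."""
        code = self._code_of.get(tag)
        if code is None:
            code = len(self._tag_of)
            self._code_of[tag] = code
            self._tag_of.append(tag)
        return code

    def code(self, tag: str) -> int:
        """Look up the code of a known tag (raises KeyError for unseen tags)."""
        return self._code_of[tag]

    def tag(self, code: int) -> str:
        """Look up the tag string for a code."""
        return self._tag_of[code]

    def __len__(self) -> int:
        return len(self._tag_of)


def collect_tags(sentences: Iterable[List[Tuple[str, str]]]) -> DynamicTagSet:
    """Build a tag set from training sentences given as (word, tag) pairs."""
    tag_set = DynamicTagSet()
    for sentence in sentences:
        for _word, tag in sentence:
            tag_set.add(tag)
    return tag_set


# Toy training data; the tags are whatever the data happens to contain.
training = [
    [("ZPar", "NNP"), ("parses", "VBZ"), ("text", "NN")],
    [("Parsers", "NNS"), ("parse", "VBP"), ("text", "NN")],
]
tags = collect_tags(training)
```

Codes are assigned in order of first appearance, so the same training data always yields the same mapping.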
To compile individual models with these tags, use generic in place of chinese or english; for example, make generic.conparser. The implementations are found in src/common/GENERIC_CONPARSER_IMPL. The generic ZPar can be compiled with make zpar.ge.
The generic parsers can be used with different languages and treebank formats; for example, the generic depparser can process CoNLL data in 13 languages.
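As a rough illustration of why a data-driven label set suits CoNLL data, the sketch below gathers the POS tags and dependency labels that occur in a CoNLL-X style file, where each token is a line of tab-separated columns and sentences are separated by blank lines. It is not part of ZPar: the function collect_labels is a hypothetical helper, and reading the fifth column (POSTAG) and eighth column (DEPREL) is an assumption about which fields matter, not a statement of what the generic depparser reads.

```python
from typing import Iterable, Set, Tuple

# Zero-based column indices in the CoNLL-X layout:
# ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL
POSTAG_COLUMN = 4
DEPREL_COLUMN = 7


def collect_labels(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return the sets of POS tags and dependency labels found in `lines`."""
    pos_tags: Set[str] = set()
    dep_labels: Set[str] = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # a blank line only marks a sentence boundary
        fields = line.split("\t")
        if len(fields) <= DEPREL_COLUMN:
            raise ValueError(f"expected at least 8 columns, got {len(fields)}")
        pos_tags.add(fields[POSTAG_COLUMN])
        dep_labels.add(fields[DEPREL_COLUMN])
    return pos_tags, dep_labels


# Two toy sentences in CoNLL-X layout (unused fields are "_").
SAMPLE = (
    "1\tZPar\t_\tNNP\tNNP\t_\t2\tSBJ\t_\t_\n"
    "2\tparses\t_\tVBZ\tVBZ\t_\t0\tROOT\t_\t_\n"
    "\n"
    "1\tDone\t_\tJJ\tJJ\t_\t0\tROOT\t_\t_\n"
)
pos_tags, dep_labels = collect_labels(SAMPLE.splitlines())
```

Because the labels are read from the data, the same routine works unchanged for a treebank in any language, which is the property the generic parsers rely on.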
The generic ZPar
Since ZPar 0.7, the generic ZPar system is the default ZPar. Type make zpar to compile.
Usage of the generic system can be found in the Quick Start Manual.
The generic POS-tagger
The structure of the generic POS-tagger is consistent with the code of the English POS-tagger; the differences are described in the Introduction.
The generic dependency parser
The structure of the generic dependency parser is consistent with the code of the English dependency parser; the differences are described in the Introduction.
The generic constituent parser
The structure of the generic constituent parser is consistent with the code of the English constituent parser; the differences are described in the Introduction.