Introduction

The Chinese and English parsers are designed specifically for those two languages, and by default use the Penn Chinese Treebank and Penn Treebank label sets, respectively. To use alternative label sets, modify zpar/src/chinese/tags.h for POS tags, zpar/src/chinese/dep.h for dependency labels, and zpar/src/chinese/cfg.h for constituent labels. These label sets are hard-coded; the corresponding English files are in zpar/src/english.

Alternatively, you can compile a generic version of ZPar, which takes whatever tags occur in the training data and compiles them into tag sets automatically. The generic tag sets are slower than the hard-coded ones. The files are in zpar/src/generic.
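The difference between a hard-coded and an automatically built tag set can be pictured with a minimal C++ sketch. This is an illustration only, not ZPar's actual code: the class name GenericTagSet and its methods are hypothetical. It assigns an integer code to each tag string the first time it is seen, which is the extra lookup work that a fixed, compile-time tag list avoids.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical illustration (not a ZPar class): a tag set that is built
// from the data instead of being fixed at compile time.
class GenericTagSet {
public:
    // Return the integer code for a tag; a tag seen for the first time
    // is assigned the next free code.
    std::size_t encode(const std::string& tag) {
        auto it = codes_.find(tag);
        if (it != codes_.end()) {
            return it->second;
        }
        std::size_t code = tags_.size();
        codes_.emplace(tag, code);
        tags_.push_back(tag);
        return code;
    }

    // Map a code back to its tag string; throws std::out_of_range
    // if the code was never assigned.
    const std::string& decode(std::size_t code) const {
        return tags_.at(code);
    }

    // Number of distinct tags seen so far.
    std::size_t size() const { return tags_.size(); }

private:
    std::unordered_map<std::string, std::size_t> codes_;  // tag -> code
    std::vector<std::string> tags_;                        // code -> tag
};
```

Under this sketch, encoding "NN", then "VB", then "NN" again yields codes 0, 1 and 0. A hard-coded tag set would instead resolve each tag to a constant known at compile time.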

To compile an individual model with generic tags, use generic in place of chinese or english in the make target, for example make generic.conparser. The implementations are in src/common/GENERIC_CONPARSER_IMPL. The generic ZPar can be compiled with make zpar.ge.

The generic parsers can be used with different languages and treebank formats. For example, the generic depparser can be used to process CoNLL data in 13 languages.
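To give a sense of what "any tags in the training data" means for CoNLL data, here is a small C++ sketch that collects the POS tags and dependency labels found in CoNLL-X style input. It assumes the standard ten-column, tab-separated CoNLL-X layout (POSTAG in column 5, DEPREL in column 8). The names LabelInventory, split_tabs and collect_labels are hypothetical, and whether ZPar reads this layout directly or requires a conversion step is not stated here.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the labels found in a treebank file.
struct LabelInventory {
    std::set<std::string> pos_tags;
    std::set<std::string> dep_labels;
};

// Split one line on tab characters.
std::vector<std::string> split_tabs(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, '\t')) {
        fields.push_back(field);
    }
    return fields;
}

// Scan CoNLL-X style input and record every POS tag and dependency label.
// Assumed columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL.
LabelInventory collect_labels(std::istream& in) {
    LabelInventory inventory;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) {
            continue;  // a blank line separates sentences
        }
        std::vector<std::string> fields = split_tabs(line);
        if (fields.size() < 8) {
            continue;  // skip lines without enough columns
        }
        inventory.pos_tags.insert(fields[4]);    // POSTAG (column 5)
        inventory.dep_labels.insert(fields[7]);  // DEPREL (column 8)
    }
    return inventory;
}
```

Because the inventory is derived from the input, the same code handles any language or annotation scheme without a hand-written header of labels.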

The generic ZPar

Since ZPar 0.7, the generic ZPar system is the default ZPar. Type make zpar to compile. Usage of the generic system can be found in the Quick Start Manual.

The generic POS-tagger

The structure of the generic POS-tagger is consistent with the code of the English POS-tagger, apart from the differences described in the Introduction.

The generic dependency parser

The structure of the generic dependency parser is consistent with the code of the English dependency parser, apart from the differences described in the Introduction.

The generic constituent parser

The structure of the generic constituent parser is consistent with the code of the English constituent parser, apart from the differences described in the Introduction.