Language- and Treebank-Independent Parsers

Introduction

The Chinese and English parsers are designed specifically for their respective languages, and by default use the Penn Chinese Treebank and Penn Treebank labels, respectively. You can specify alternative label sets for Chinese by modifying zpar/src/chinese/tags.h for POS tags, zpar/src/chinese/dep.h for dependency labels, and zpar/src/chinese/cfg.h for constituent labels. These label sets are hard-coded; the English versions are placed in zpar/src/english.

Alternatively, you can compile a generic version of ZPar, which takes whatever tags appear in the training data and compiles them into tag sets automatically. The generic tag sets are slower than the hard-coded tag sets. The files are placed in zpar/src/generic.

To compile individual models with these tags, use generic in place of chinese or english in the make target, for example make generic.conparser. The implementations are found in src/common/GENERIC_CONPARSER_IMPL. The generic ZPar can be compiled with make zpar.ge.

The generic parsers can be used with different languages and treebank formats. For example, the generic depparser can be used to process CoNLL data in 13 languages.
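
Because the generic build takes its tag sets from the training data, it can help to check which labels a training file actually contains before training. The following is a minimal sketch in Python and is not part of ZPar. It assumes the data is in the tab-separated CoNLL-X layout, with POSTAG in the fifth column and DEPREL in the eighth; the function name collect_labels and the sample sentences are hypothetical.

```python
from collections import Counter

# Assumed CoNLL-X column positions (0-based): 4 = POSTAG, 7 = DEPREL.
POSTAG_COL = 4
DEPREL_COL = 7


def collect_labels(lines):
    """Count the POS tags and dependency labels found in CoNLL-X lines."""
    pos_tags = Counter()
    dep_labels = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # a blank line marks a sentence boundary
        cols = line.split("\t")
        if len(cols) <= DEPREL_COL:
            continue  # skip rows that do not have enough columns
        pos_tags[cols[POSTAG_COL]] += 1
        dep_labels[cols[DEPREL_COL]] += 1
    return pos_tags, dep_labels


# Hypothetical two-sentence sample, for illustration only.
SAMPLE = [
    "1\tThe\tthe\tDT\tDT\t_\t2\tdet\t_\t_",
    "2\tparser\tparser\tNN\tNN\t_\t3\tnsubj\t_\t_",
    "3\truns\trun\tVB\tVBZ\t_\t0\troot\t_\t_",
    "",
    "1\tIt\tit\tPRP\tPRP\t_\t2\tnsubj\t_\t_",
    "2\tworks\twork\tVB\tVBZ\t_\t0\troot\t_\t_",
]

pos_tags, dep_labels = collect_labels(SAMPLE)
print(sorted(pos_tags))    # the POS tag inventory of the sample
print(sorted(dep_labels))  # the dependency label inventory of the sample
```

To inspect a real file, pass an open file object instead of SAMPLE, for example collect_labels(open(path, encoding="utf-8")), where path is the location of your own training data.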
