User Manual of ZPar
Overview
ZPar is a statistical natural language parser that performs syntactic analysis tasks including word segmentation, part-of-speech tagging and parsing. It supports multiple languages and multiple grammar formalisms. ZPar has been most heavily developed for Chinese (on the Penn Chinese Treebank and the Peking University Multiview Treebank) and English (on the Penn Treebank), and it provides generic support for other languages and treebanks; for example, a Romanian model has been trained for ZPar 0.2. ZPar currently supports context-free grammars (CFG), dependency grammars and combinatory categorial grammars (CCG).
System Requirements
The ZPar software requires the following basic system configuration:
- Windows, Linux or Mac
- GCC (for Linux and Mac) or MinGW (for Windows)
- At least 256MB of RAM (more may be needed depending on the modules used)
- At least 500MB of hard disk space (more may be needed depending on the modules used)
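The compiler and disk space requirements above can be checked before building. The following is a minimal sketch and is not part of ZPar: the helper name check_requirements is hypothetical, it assumes the compiler is reachable as gcc on the PATH (MinGW normally provides the same executable name on Windows), and it does not check RAM because the Python standard library has no portable call for that.

```python
import shutil

# 500MB, taken from the requirements list; larger models may need more.
MIN_DISK_BYTES = 500 * 1024 * 1024


def check_requirements(path=".", compiler="gcc"):
    """Return a list of human-readable problems; an empty list means the checks passed.

    Hypothetical helper for illustration only; it is not shipped with ZPar.
    """
    problems = []
    # shutil.which returns None when the executable is not found on the PATH.
    if shutil.which(compiler) is None:
        problems.append(f"compiler not found on PATH: {compiler}")
    # shutil.disk_usage reports total, used and free bytes for the filesystem
    # that contains `path`.
    free = shutil.disk_usage(path).free
    if free < MIN_DISK_BYTES:
        problems.append(
            f"only {free // (1024 * 1024)}MB free, need at least 500MB"
        )
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```

Running the script in the directory where ZPar will be unpacked prints one warning per failed check and nothing when both checks pass.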
Download and Installation
Binaries and sources of the latest release can be downloaded from GitHub.
ZPar provides a separate system for each supported language and treebank:
- zpar: generic (language-independent)
- zpar.en: English Penn Treebank
- zpar.zh: Chinese Penn Treebank
- zpar.mvt: Chinese multiview treebank
Source code and binaries are provided for Windows, Linux and Mac.
Standalone sub-modules can be built for individual tasks:
- segmentor: word segmentation
- postagger: POS-tagging
- conparser: phrase-structure parsing
- depparser: dependency parsing
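The correspondence between system or module names and what each one covers can be written down as a lookup table. The sketch below copies the names and descriptions from this section; the helper names describe and build_command are hypothetical, and the assumption that each name is passed to make as a build target is not stated here, so it should be checked against the build system manual before use.

```python
# Names and descriptions are copied from the lists in this section.
FULL_SYSTEMS = {
    "zpar": "generic language",
    "zpar.en": "English Penn Treebank",
    "zpar.zh": "Chinese Penn Treebank",
    "zpar.mvt": "Chinese multiview treebank",
}

SUB_MODULES = {
    "segmentor": "word segmentation",
    "postagger": "POS-tagging",
    "conparser": "phrase-structure parsing",
    "depparser": "dependency parsing",
}


def describe(target):
    """Return the description for a known name, or raise KeyError."""
    if target in FULL_SYSTEMS:
        return FULL_SYSTEMS[target]
    if target in SUB_MODULES:
        return SUB_MODULES[target]
    raise KeyError(f"unknown ZPar target: {target}")


def build_command(target):
    """Return an argument list suitable for subprocess.run.

    ASSUMPTION: the name is used directly as a make target. This is an
    illustration, not a documented ZPar interface.
    """
    describe(target)  # validates the name; raises KeyError if unknown
    return ["make", target]


if __name__ == "__main__":
    for name in list(FULL_SYSTEMS) + list(SUB_MODULES):
        print(f"{name}: {describe(name)}")
```

Keeping the two tables separate mirrors the distinction made above between full systems for a language or treebank and standalone modules for a single task.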
Quick Start
ZPar can be used off the shelf by following the quick start. Sub-modules such as the word segmentor, POS-tagger and parsers can also be used on their own by following the detailed instructions for compiling, training and running each module.
List of Manuals
- Quick Start
- Introduction to the ZPar source structure
- Introduction to the ZPar build system
- ZPar for the multiview Chinese Treebank
- Chinese word segmentation
- Chinese joint segmentation and POS tagging
- English POS tagging
- ZPar support for the TWeb tagger
- Chinese and English dependency parsing
- Chinese and English phrase-structure parsing
- Language- and Treebank-independent parsers
- CCG parsing
License
The source code is released under the GPL (v3); a separate commercial license, issued by Oxford University, is available for non-open-source use. The models available for download were trained on different text resources, which may require further licenses.
Contributors to the Documentation
Reference
- Yue Zhang and Stephen Clark. 2011. Syntactic Processing Using the Generalized Perceptron and Beam Search. Computational Linguistics, 37(1):105-151.