Overview

ZPar for Chinese (Penn Chinese Treebank), English (Penn Treebank) and ZPar generic (language-independent) are compiled separately into three independent programs: zpar.zh, zpar.en and zpar, respectively [1]. Each program needs to be executed with a set of corresponding statistical models. Some example sets of models are released together with ZPar source so that the public release can be used off-the-shelf.

The current version of ZPar is 0.7. Its release contains a set of models for zpar.en, and a set of models for zpar.zh, which support labeled dependency parsing and context-free-grammar parsing.

Download and installation

The source code and models can be downloaded from GitHub. Unzip the source archive into a source directory, and unzip each set of model files into its own model directory.

To download the models:

To compile ZPar, type make in the zpar source directory. The binary file zpar will be placed in the dist folder. Type make zpar, make zpar.en or make zpar.zh to build the generic, English or Chinese ZPar, respectively.

Usage of the generic ZPar

ZPar is a statistical language analyzer: it works by applying a statistical model, so a ZPar executable must always be run with a model.

Suppose that the source files are saved in the folder zpar and the models are saved in models. To run ZPar, type zpar/dist/zpar models and wait for the models to be loaded. Once all models are loaded, type in sentences, and the parses will be printed on the screen. Alternatively, type zpar/dist/zpar models input output to read sentences from the file input and write the corresponding parses to the file output.

Run zpar without command-line arguments to list its options. In particular, the -o option controls the type of output: -ot produces POS-tagged sentences, -od produces dependency structures, and -oc, the default, produces constituent structures (brackets).
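When ZPar is driven from another program, the command line can be assembled from the pieces described above. The following is a minimal Python sketch, not part of ZPar: the helper names build_zpar_command and run_zpar are hypothetical, and the argument order (binary, output option, model directory, then optional input and output files) is an assumption inferred from the example invocations on this page rather than checked against every ZPar version.

```python
import subprocess
from typing import List, Optional

# Output types described above: -ot (POS tags), -od (dependencies),
# -oc (constituent brackets, the default).
OUTPUT_FLAGS = {"tag": "-ot", "dep": "-od", "con": "-oc"}


def build_zpar_command(binary: str, models: str, output: str = "con",
                       infile: Optional[str] = None,
                       outfile: Optional[str] = None) -> List[str]:
    """Assemble an argv list for a ZPar binary (hypothetical helper).

    Assumes the -o option precedes the model directory, as in
    `zpar/dist/zpar -oc models`, and that input/output files follow it.
    """
    if output not in OUTPUT_FLAGS:
        raise ValueError(f"unknown output type: {output!r}")
    if (infile is None) != (outfile is None):
        raise ValueError("give both infile and outfile, or neither")
    cmd = [binary, OUTPUT_FLAGS[output], models]
    if infile is not None and outfile is not None:
        cmd += [infile, outfile]
    return cmd


def run_zpar(cmd: List[str]) -> None:
    # Raises subprocess.CalledProcessError if ZPar exits with a non-zero status.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    print(build_zpar_command("zpar/dist/zpar", "models", "dep", "input", "output"))
```

For example, run_zpar(build_zpar_command("zpar/dist/zpar.en", "models.en", "tag", "input", "output")) would POS-tag the sentences in input, provided the assumed argument order holds for your build.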

In the following example, each input sentence is followed by the parse that ZPar prints for it.


bash$ zpar/dist/zpar -oc models
Parsing started
[tagger] Loading scores ... done.
[parser] Loading scores... done.
ZPar is a parser . 
(S (NP (NNP ZPar)) (VP (VBZ is) (NP (DT a) (NN parser))) (. .)) 
Given a natural language sentence, ZPar produces its syntactic structure .  
(S (VP (VBN Given) (NP (NP (DT a) (JJ natural) (NN language)) (SBAR (S (NP (VBN sentence,) (NNP ZPar)) (VP (VBZ produces) (NP (PRP$ its) (NN syntactic) (NN structure))))))) (. .)) 
ZPar works by training a model from annotated data , and making analysis using the model .  
(S (NP (NNP ZPar)) (VP (VBZ works) (PP (IN by) (S (VP (VP (VBG training) (NP (DT a) (NN model)) (PP (IN from) (NP (VBN annotated) (NNS data)))) (, ,) (CC and) (VP (VBG making) (NP (NP (NN analysis)) (VP (VBG using) (NP (DT the) (NN model))))))))) (. .)) 
^D
Parsing has finished successfully. 
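Each line of -oc output is a bracketed tree, as in the example above. To post-process such output, a small recursive-descent reader is enough. This is a minimal Python sketch written for illustration (parse_tree and tagged_words are hypothetical names, not part of ZPar); it assumes one tree per line and that words contain no spaces or parentheses.

```python
from typing import List, Tuple, Union

# A tree is (label, children); a child is either a subtree or a word string.
Tree = Tuple[str, List[Union["Tree", str]]]


def tokenize(line: str) -> List[str]:
    # Pad brackets with spaces so split() yields "(", ")", labels and words.
    return line.replace("(", " ( ").replace(")", " ) ").split()


def parse_tree(line: str) -> Tree:
    """Read one bracketed tree, e.g. a line of -oc output."""
    tokens = tokenize(line)
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children: List[Union[Tree, str]] = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = parse()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the tree")
    return tree


def tagged_words(tree: Tree) -> List[Tuple[str, str]]:
    """Collect (word, POS tag) pairs; a POS node has exactly one word child."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs: List[Tuple[str, str]] = []
    for child in children:
        if not isinstance(child, str):
            pairs.extend(tagged_words(child))
    return pairs


line = "(S (NP (NNP ZPar)) (VP (VBZ is) (NP (DT a) (NN parser))) (. .))"
print(tagged_words(parse_tree(line)))
```

The same reader should work on the bracketed output of zpar.zh and zpar.en, since it only relies on the bracket structure, not on any particular tag set.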

Usage of ZPar for Chinese

Suppose that the source files are saved in the folder zpar and the Chinese models are saved in models.zh. To run ZPar for Chinese, type zpar/dist/zpar.zh models.zh and wait for the models to be loaded. Once all models are loaded, type in Chinese sentences, and the parses will be printed on the screen. Alternatively, type zpar/dist/zpar.zh models.zh input output to read Chinese sentences from the file input and write the corresponding parses to the file output.

In the following example, each input sentence is followed by the parse that ZPar prints for it.


bash$ zpar/dist/zpar.zh models.zh
Parsing started
Loading scores ... done.
Loading scores... done.
这是一个例子。
(IP (NP (PN 这)) (VP (VC 是) (NP (QP (CD 一) (CLP (M 个))) (NP (NN 例子)))) (PU 。))
输入一个句子,程序会给出它的句法分析。
(IP (IP (VP (VV 输入) (NP (QP (CD 一) (CLP (M 个))) (NP (NN 句子))))) (PU ,) (NP (NN 程序)) (VP (VV 会) (VP (VV 给出) (NP (DNP (NP (PN 它)) (DEG 的)) (ADJP (JJ 句法)) (NP (NN 分析))))) (PU 。))
ZPar通过机器学习获得知识;虽然大多情况正确,但是也会有分析失误。
(IP (IP (NP (NN ZPar)) (VP (PP (P 通过) (NP (NN 机器))) (VP (VV 学习) (IP (VP (VV 获得) (NP (NN 知识))))))) (PU ;) (CP (ADVP (CS 虽然)) (IP (ADVP (AD 大多)) (NP (NN 情况)) (VP (VA 正确)))) (PU ,) (VP (ADVP (AD 但是)) (ADVP (AD 也)) (VP (VV 会) (VP (VE 有) (IP (NP (NN 分析)) (VP (VV 失误)))))) (PU 。))
^D
Parsing has finished successfully. 

Usage of ZPar for English

Suppose that the source files are saved in the folder zpar and the English models are saved in models.en. To run ZPar for English, type zpar/dist/zpar.en models.en and wait for the models to be loaded. Once all models are loaded, type in English sentences, and the parses will be printed on the screen. Alternatively, type zpar/dist/zpar.en models.en input output to read English sentences from the file input and write the corresponding parses to the file output.

Run zpar.en without command-line arguments to list its options. In particular, the -o option controls the type of output: -ot produces POS-tagged sentences, -od produces dependency structures, and -oc, the default, produces constituent structures (brackets).

In the following example, each input sentence is followed by the parse that ZPar prints for it.


bash$ zpar/dist/zpar.en -oc models.en
Parsing started
[tagger] Loading scores ... done.
[parser] Loading scores... done.
ZPar is a parser . 
(S (NP (NNP ZPar)) (VP (VBZ is) (NP (DT a) (NN parser))) (. .)) 
Given a natural language sentence, ZPar produces its syntactic structure .  
(S (VP (VBN Given) (NP (NP (DT a) (JJ natural) (NN language)) (SBAR (S (NP (VBN sentence,) (NNP ZPar)) (VP (VBZ produces) (NP (PRP$ its) (NN syntactic) (NN structure))))))) (. .)) 
ZPar works by training a model from annotated data , and making analysis using the model .  
(S (NP (NNP ZPar)) (VP (VBZ works) (PP (IN by) (S (VP (VP (VBG training) (NP (DT a) (NN model)) (PP (IN from) (NP (VBN annotated) (NNS data)))) (, ,) (CC and) (VP (VBG making) (NP (NP (NN analysis)) (VP (VBG using) (NP (DT the) (NN model))))))))) (. .)) 
^D
Parsing has finished successfully. 

Usage of submodels

ZPar consists of various implementations of a word segmentor, a POS tagger, a joint segmentation and tagging system, a dependency parser and a constituency parser. To compile and use each submodel, run make [submodel], where [submodel] can be segmentor, [language].postagger, [language].depparser or [language].conparser, and [language] can be chinese, english or generic. For example, to compile the Chinese dependency parser, type make chinese.depparser. To change the implementation used by a particular submodel, modify the corresponding configuration in the Makefile. For example, the macro SEGMENTOR_IMPL in the Makefile defines the implementation of the segmentor, and the corresponding code can be found in src/chinese/segmentor/SEGMENTOR_IMPL/, where SEGMENTOR_IMPL stands for the value of the macro.
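The target naming scheme can be captured in a small helper, for instance in a build script that loops over several submodels. This is a hypothetical Python sketch (make_target and all_targets are names invented here); it encodes only the combinations listed in the paragraph above, so check the Makefile of your version for the targets it actually defines.

```python
from itertools import product
from typing import List, Optional

LANGUAGES = ("chinese", "english", "generic")
LANGUAGE_SUBMODELS = ("postagger", "depparser", "conparser")


def make_target(submodel: str, language: Optional[str] = None) -> str:
    """Return the make target for a submodel, following [language].[submodel]."""
    if submodel == "segmentor":
        # The segmentor target is listed without a language prefix.
        if language is not None:
            raise ValueError("the segmentor target takes no language prefix")
        return "segmentor"
    if submodel not in LANGUAGE_SUBMODELS:
        raise ValueError(f"unknown submodel: {submodel!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    return f"{language}.{submodel}"


def all_targets() -> List[str]:
    # Every combination named in the text, plus the segmentor.
    return ["segmentor"] + [make_target(s, lang)
                            for lang, s in product(LANGUAGES, LANGUAGE_SUBMODELS)]


print(make_target("depparser", "chinese"))  # chinese.depparser
print(len(all_targets()))  # 10
```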

Notes on versions before 0.7

[1] For ZPar versions before 0.7, the default target zpar builds ZPar for Chinese, and the generic ZPar is built as zpar.ge.

Scripts

prettyprint.sh is a pretty-print script for the output of the constituent parser. Usage: prettyprint.sh conparser_output. Thanks to Silas S. Brown for providing the script.
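For readers who post-process parser output in Python, the following is an independent sketch of the same idea. It is not a port of prettyprint.sh, whose exact layout may differ; the function name pretty_print is hypothetical. Each phrase starts on a new line indented by its depth, preterminals such as (NN parser) stay on one line, and closing brackets are appended to the preceding line.

```python
from typing import List


def pretty_print(line: str, indent: str = "  ") -> str:
    """Indent a one-line bracketed tree so each phrase starts on its own line."""
    tokens = line.replace("(", " ( ").replace(")", " ) ").split()
    lines: List[str] = []
    depth = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            label = tokens[i + 1]
            # A preterminal looks like "( TAG word )" and stays on one line.
            if (i + 3 < len(tokens) and tokens[i + 2] not in ("(", ")")
                    and tokens[i + 3] == ")"):
                lines.append(f"{indent * depth}({label} {tokens[i + 2]})")
                i += 4
            else:
                lines.append(f"{indent * depth}({label}")
                depth += 1
                i += 2
        elif tok == ")":
            lines[-1] += ")"  # close the phrase on the previous line
            depth -= 1
            i += 1
        else:
            raise ValueError(f"unexpected token {tok!r} at position {i}")
    return "\n".join(lines)


print(pretty_print("(S (NP (NNP ZPar)) (VP (VBZ is) (NP (DT a) (NN parser))) (. .))"))
```

To process a whole output file, call pretty_print on each non-empty line of the file.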

Reference

Yue Zhang and Stephen Clark. 2011. Syntactic Processing Using the Generalized Perceptron and Beam Search. Computational Linguistics, 37(1), March.