Using ZPar Off-the-Shelf
Overview
ZPar for Chinese (Penn Chinese Treebank), ZPar for English (Penn Treebank) and the generic (language-independent) ZPar are compiled separately into three independent programs: zpar.zh, zpar.en and zpar, respectively [1].
Each program needs to be executed
with a set of corresponding statistical models.
Some example sets of models are released together with ZPar source so that the public release can be used off-the-shelf.
The current version of ZPar is 0.7.
Its release contains a set of models for zpar.en and a set of models for zpar.zh, both of which support labeled dependency parsing and context-free-grammar parsing.
Download and installation
The source code and models can be downloaded from GitHub. On the download page, click "Source code" to download the source code of ZPar, and click "chinese-models.zip" and "english-models.zip" to download the corresponding models. Unzip the source zip file into the source directory, and unzip each model file into its own model directory.
To compile ZPar, type make in the zpar source directory. The binary file zpar will be placed in the dist folder. Type make zpar, make zpar.en and make zpar.zh to build the generic, English and Chinese versions of ZPar, respectively.
Usage of the generic ZPar
ZPar is a statistical language analyzer, which works by using a statistical model. As a result, a ZPar binary executable file must run with a model file.
Suppose that the source files are saved in the folder zpar and the models are saved in models. To run zpar, type zpar/dist/zpar models, and wait for the models to be loaded. After all models are loaded, type in sentences, and the parses will be printed on the screen. Alternatively, type zpar/dist/zpar models input output to read sentences from the file input and write the corresponding parses to the file output.
Run zpar without command-line arguments to show the options. In particular, the -o option controls the type of output. Use -ot to produce POS-tagged sentences and -od to produce dependency structures. The default option is -oc, which produces constituent structures (brackets).
In the following example, each input sentence is immediately followed by the parse that ZPar prints for it; ^D (end of input) ends the session.
bash$ zpar/dist/zpar -oc models
Parsing started
[tagger] Loading scores ... done.
[parser] Loading scores... done.
ZPar is a parser .
(S (NP (NNP ZPar)) (VP (VBZ is) (NP (DT a) (NN parser))) (. .))
Given a natural language sentence, ZPar produces its syntactic structure .
(S (VP (VBN Given) (NP (NP (DT a) (JJ natural) (NN language)) (SBAR (S (NP (VBN sentence,) (NNP ZPar)) (VP (VBZ produces) (NP (PRP$ its) (NN syntactic) (NN structure))))))) (. .))
ZPar works by training a model from annotated data , and making analysis using the model .
(S (NP (NNP ZPar)) (VP (VBZ works) (PP (IN by) (S (VP (VP (VBG training) (NP (DT a) (NN model)) (PP (IN from) (NP (VBN annotated) (NNS data)))) (, ,) (CC and) (VP (VBG making) (NP (NP (NN analysis)) (VP (VBG using) (NP (DT the) (NN model))))))))) (. .))
^D
Parsing has finished successfully.
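The invocations described in this section can also be scripted. The following is a minimal sketch in Python that assembles the command line from the pieces named above. The helper names zpar_command and run_zpar are hypothetical, and placing the -o flag before the model directory while also passing input and output files is an assumption extrapolated from the two forms shown in the text (zpar/dist/zpar -oc models and zpar/dist/zpar models input output).

```python
import subprocess

# Output types described in the text: -ot (POS tags), -od (dependencies),
# -oc (constituents, the default).
OUTPUT_FLAGS = {"tagged": "-ot", "dependency": "-od", "constituent": "-oc"}


def zpar_command(binary, models, input_path=None, output_path=None,
                 output="constituent"):
    """Build the argument list for one ZPar run (hypothetical helper).

    Assumption: the -o flag precedes the model directory, as in the
    interactive example, and the optional input and output files follow it.
    """
    if output_path is not None and input_path is None:
        raise ValueError("an output file requires an input file")
    cmd = [binary, OUTPUT_FLAGS[output], models]
    if input_path is not None:
        cmd.append(input_path)
    if output_path is not None:
        cmd.append(output_path)
    return cmd


def run_zpar(*args, **kwargs):
    """Run ZPar and raise CalledProcessError if it exits with an error."""
    subprocess.run(zpar_command(*args, **kwargs), check=True)


# Only print the command here; calling run_zpar needs a compiled binary.
print(" ".join(zpar_command("zpar/dist/zpar", "models", "input", "output",
                            output="dependency")))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.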
Usage of ZPar for Chinese
Suppose that the source files are saved in the folder zpar and the models are saved in models.zh. To run zpar.zh, type zpar/dist/zpar.zh models.zh, and wait for the models to be loaded. After all models are loaded, type in Chinese sentences, and the parses will be printed on the screen. Alternatively, type zpar/dist/zpar.zh models.zh input output to read Chinese sentences from the file input and write the corresponding parses to the file output.
In the following example, each input sentence is immediately followed by the parse that ZPar prints for it; ^D (end of input) ends the session.
bash$ zpar/dist/zpar.zh models.zh
Parsing started
Loading scores ... done.
Loading scores... done.
这是一个例子。
(IP (NP (PN 这)) (VP (VC 是) (NP (QP (CD 一) (CLP (M 个))) (NP (NN 例子)))) (PU 。))
输入一个句子,程序会给出它的句法分析。
(IP (IP (VP (VV 输入) (NP (QP (CD 一) (CLP (M 个))) (NP (NN 句子))))) (PU ,) (NP (NN 程序)) (VP (VV 会) (VP (VV 给出) (NP (DNP (NP (PN 它)) (DEG 的)) (ADJP (JJ 句法)) (NP (NN 分析))))) (PU 。))
ZPar通过机器学习获得知识;虽然大多情况正确,但是也会有分析失误。
(IP (IP (NP (NN ZPar)) (VP (PP (P 通过) (NP (NN 机器))) (VP (VV 学习) (IP (VP (VV 获得) (NP (NN 知识))))))) (PU ;) (CP (ADVP (CS 虽然)) (IP (ADVP (AD 大多)) (NP (NN 情况)) (VP (VA 正确)))) (PU ,) (VP (ADVP (AD 但是)) (ADVP (AD 也)) (VP (VV 会) (VP (VE 有) (IP (NP (NN 分析)) (VP (VV 失误)))))) (PU 。))
^D
Parsing has finished successfully.
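Because the Chinese parser takes unsegmented text as input, the bracketed output also carries the word segmentation and the POS tags. As a rough sketch (not part of ZPar), the segmented and tagged words can be read back from one output line with a regular expression. The function name tagged_words is hypothetical, and the pattern assumes that neither words nor tags contain parentheses or whitespace.

```python
import re

# A leaf of the bracketed output has the form "(TAG word)" with no nested
# brackets. Assumption: words and tags never contain "(", ")" or spaces.
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def tagged_words(parse):
    """Return (word, tag) pairs in sentence order from one parse line."""
    return [(word, tag) for tag, word in LEAF.findall(parse)]


parse = ("(IP (NP (PN 这)) (VP (VC 是) (NP (QP (CD 一) (CLP (M 个))) "
         "(NP (NN 例子)))) (PU 。))")
for word, tag in tagged_words(parse):
    print(f"{word}/{tag}")
```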
Usage of ZPar for English
Suppose that the source files are saved in the folder zpar and the models are saved in models.en. To run zpar.en, type zpar/dist/zpar.en models.en, and wait for the models to be loaded. After all models are loaded, type in English sentences, and the parses will be printed on the screen. Alternatively, type zpar/dist/zpar.en models.en input output to read English sentences from the file input and write the corresponding parses to the file output.
Run zpar.en without command-line arguments to show the options. In particular, the -o option controls the type of output. Use -ot to produce POS-tagged sentences and -od to produce dependency structures. The default option is -oc, which produces constituent structures (brackets).
In the following example, each input sentence is immediately followed by the parse that ZPar prints for it; ^D (end of input) ends the session.
bash$ zpar/dist/zpar.en -oc models.en
Parsing started
[tagger] Loading scores ... done.
[parser] Loading scores... done.
ZPar is a parser .
(S (NP (NNP ZPar)) (VP (VBZ is) (NP (DT a) (NN parser))) (. .))
Given a natural language sentence, ZPar produces its syntactic structure .
(S (VP (VBN Given) (NP (NP (DT a) (JJ natural) (NN language)) (SBAR (S (NP (VBN sentence,) (NNP ZPar)) (VP (VBZ produces) (NP (PRP$ its) (NN syntactic) (NN structure))))))) (. .))
ZPar works by training a model from annotated data , and making analysis using the model .
(S (NP (NNP ZPar)) (VP (VBZ works) (PP (IN by) (S (VP (VP (VBG training) (NP (DT a) (NN model)) (PP (IN from) (NP (VBN annotated) (NNS data)))) (, ,) (CC and) (VP (VBG making) (NP (NP (NN analysis)) (VP (VBG using) (NP (DT the) (NN model))))))))) (. .))
^D
Parsing has finished successfully.
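Downstream programs usually need the constituent output as a data structure rather than as a string. The sketch below is one possible reader for the bracketed lines shown above and is not code shipped with ZPar. It turns a line into nested (label, children) tuples, where leaves are plain strings. The names parse_brackets and leaves are hypothetical, and the reader assumes that parentheses never occur inside a word or a label.

```python
def parse_brackets(text):
    """Parse one bracketed line into nested (label, children) tuples.

    Children are either further (label, children) tuples or word strings.
    Assumption: "(" and ")" never occur inside a word or a label.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume the closing bracket
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unexpected text after the parse")
    return tree


def leaves(tree):
    """Return the words of a parsed tree in sentence order."""
    words = []
    for child in tree[1]:
        words.extend([child] if isinstance(child, str) else leaves(child))
    return words


tree = parse_brackets(
    "(S (NP (NNP ZPar)) (VP (VBZ is) (NP (DT a) (NN parser))) (. .))")
print(tree[0], leaves(tree))
```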
Usage of submodels
ZPar consists of various implementations of a word segmentor, a POS-tagger, a joint segmentation and tagging system, a dependency parser and a constituency parser. To compile and use each submodel, run make [submodel], where [submodel] can be segmentor, [language].postagger, [language].depparser or [language].conparser, and [language] can be chinese, english or generic.
For example, to compile the Chinese dependency parser, type make chinese.depparser. To change the implementation of a particular submodel, modify the corresponding setting in the Makefile. For example, the macro SEGMENTOR_IMPL in the Makefile defines the implementation of the segmentor. The corresponding code can be found at src/chinese/segmentor/SEGMENTOR_IMPL/.
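The naming scheme for submodel targets can be written out mechanically. The sketch below only enumerates the names that follow from the pattern described above. It is an illustration, not a listing taken from the Makefile, so it is an assumption that every combination exists as a real target. The helper names submodel_targets and make_command are hypothetical.

```python
LANGUAGES = ("chinese", "english", "generic")
PER_LANGUAGE_SUBMODELS = ("postagger", "depparser", "conparser")


def submodel_targets():
    """List target names following the [language].[submodel] pattern.

    Assumption: the pattern is applied uniformly; the real Makefile may not
    define every combination.
    """
    targets = ["segmentor"]
    for language in LANGUAGES:
        for submodel in PER_LANGUAGE_SUBMODELS:
            targets.append(f"{language}.{submodel}")
    return targets


def make_command(target):
    """Return the make invocation for one submodel target."""
    if target not in submodel_targets():
        raise ValueError(f"unknown submodel target: {target}")
    return ["make", target]


print(" ".join(make_command("chinese.depparser")))
```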
Notes on versions before 0.7
[1] For ZPar versions before 0.7, the default target zpar is Chinese and the generic ZPar is zpar.ge.
Scripts
prettyprint.sh is a pretty-print script for the output of the constituent parser. Usage: prettyprint.sh conparser_output. Thanks to Silas S. Brown for providing the script.
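For readers who want to see what pretty-printing a bracketed parse involves, here is a small Python sketch. It is not the prettyprint.sh script and does not claim to reproduce its output format. It only re-indents a one-line parse so that each constituent starts on its own line, keeping each "(TAG word)" leaf on a single line. The function name pretty is hypothetical.

```python
def pretty(parse, indent="  "):
    """Re-indent a one-line bracketed parse, one constituent per line.

    Assumption: "(" and ")" never occur inside a word or a label.
    """
    tokens = parse.replace("(", " ( ").replace(")", " ) ").split()
    brackets = ("(", ")")
    lines = []
    depth = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            is_leaf = (i + 3 < len(tokens)
                       and tokens[i + 1] not in brackets
                       and tokens[i + 2] not in brackets
                       and tokens[i + 3] == ")")
            if is_leaf:
                # "(TAG word)" stays together on one line.
                lines.append(f"{indent * depth}({tokens[i + 1]} {tokens[i + 2]})")
                i += 4
            else:
                # A phrase label opens a new, deeper level.
                lines.append(f"{indent * depth}({tokens[i + 1]}")
                depth += 1
                i += 2
        elif tok == ")":
            # Closing brackets are attached to the last printed line.
            depth -= 1
            lines[-1] += ")"
            i += 1
        else:
            # A bare word directly under a phrase label; keep it inline.
            lines[-1] += " " + tok
            i += 1
    return "\n".join(lines)


print(pretty("(S (NP (NNP ZPar)) (VP (VBZ is) (NP (DT a) (NN parser))) (. .))"))
```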
Reference
Yue Zhang and Stephen Clark. 2011. Syntactic Processing Using the Generalized Perceptron and Beam Search. In Computational Linguistics, 37(1), March.