ChunkPro groups words into flat (i.e. non-nested) syntactic constituents such as noun phrases and verb phrases, providing both a shallow syntactic analysis of a text and an intermediate step toward full parsing.

For instance, the two tokens “his term” are grouped into a single NP (noun phrase) constituent.
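
To make the grouping concrete, here is a minimal sketch in Python of how per-token chunk tags can be turned into flat constituents. It assumes the B-/I-/O tagging scheme used in the CoNLL-2000 data; the function name `iob_to_chunks` and the sample sentence are illustrative assumptions and are not part of ChunkPro.

```python
from typing import List, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, List[str]]]:
    """Group tokens into flat chunks from B-/I-/O tags (hypothetical helper)."""
    chunks: List[Tuple[str, List[str]]] = []
    current_type = None
    current_words: List[str] = []
    for token, tag in zip(tokens, tags):
        # A chunk starts on a B- tag, or on an I- tag whose type differs
        # from the chunk currently being built.
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if tag == "O" or starts_new:
            # Close the chunk in progress, if any.
            if current_words:
                chunks.append((current_type, current_words))
            current_type, current_words = None, []
        if tag != "O":
            if not current_words:
                current_type = tag[2:]
            current_words.append(token)
    # Close a chunk that runs to the end of the sentence.
    if current_words:
        chunks.append((current_type, current_words))
    return chunks


# Illustrative sentence (an assumption, not taken from the ChunkPro docs).
tokens = ["He", "began", "his", "term"]
tags = ["B-NP", "B-VP", "B-NP", "I-NP"]
print(iob_to_chunks(tokens, tags))
# [('NP', ['He']), ('VP', ['began']), ('NP', ['his', 'term'])]
```

Because the output is a flat list, no chunk can contain another chunk, which is what distinguishes chunking from full parsing.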

The module is available with pre-trained models in the news domain for two languages, English and Italian (currently, NP and VP only).


Algorithm: ChunkPro uses YamCha for feature extraction and a support vector machine (SVM) as the classification algorithm.
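
As an illustration of the kind of features an SVM-based chunker typically sees, the Python sketch below builds a context window around one token: the words and part-of-speech tags two positions to either side, plus the two chunk tags already assigned to the left. The window width, the feature names, and the function `window_features` are assumptions made for this example; the actual feature template used by ChunkPro is not stated here.

```python
from typing import Dict, List


def window_features(
    words: List[str],
    pos_tags: List[str],
    prev_chunk_tags: List[str],
    i: int,
    width: int = 2,
) -> Dict[str, str]:
    """Build window features for token i (hypothetical helper).

    prev_chunk_tags holds the chunk tags already decided for tokens 0..i-1.
    """
    feats: Dict[str, str] = {}
    # Static features: words and POS tags in a symmetric window around i.
    for offset in range(-width, width + 1):
        j = i + offset
        in_range = 0 <= j < len(words)
        feats[f"w[{offset}]"] = words[j] if in_range else "<PAD>"
        feats[f"p[{offset}]"] = pos_tags[j] if in_range else "<PAD>"
    # Dynamic features: chunk tags previously assigned to the left context.
    for offset in range(-width, 0):
        j = i + offset
        feats[f"c[{offset}]"] = prev_chunk_tags[j] if j >= 0 else "<PAD>"
    return feats


# Illustrative input (an assumption, not taken from the ChunkPro docs).
words = ["He", "began", "his", "term"]
pos_tags = ["PRP", "VBD", "PRP$", "NN"]
feats = window_features(words, pos_tags, ["B-NP", "B-VP"], i=2)
print(feats["w[0]"], feats["p[1]"], feats["c[-1]"])
# his NN B-VP
```

Each such feature dictionary would then be encoded as a sparse vector and passed to the classifier, which predicts the chunk tag for token i.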

Resources: CoNLL-2000 dataset (English).

Evaluation benchmark: CoNLL-2000 Shared Task for English.
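
Chunking benchmarks of this kind are usually scored at the chunk level: a predicted chunk counts as correct only if both its type and its exact boundaries match the gold standard. The Python sketch below computes precision, recall, and F1 under that assumption. The function names and the sample tags are illustrative; this is not the official scoring script and it does not reproduce any reported ChunkPro result.

```python
from typing import Iterable, Set, Tuple


def chunk_spans(tags: Iterable[str]) -> Set[Tuple[str, int, int]]:
    """Return chunks as (type, start, end) spans, end exclusive (hypothetical helper)."""
    spans: Set[Tuple[str, int, int]] = set()
    start, ctype = None, None
    # A trailing "O" sentinel closes a chunk that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and tag[2:] == ctype
        if not inside:
            if ctype is not None:
                spans.add((ctype, start, i))
            if tag == "O":
                start, ctype = None, None
            else:
                start, ctype = i, tag[2:]
    return spans


def chunk_prf(gold_tags, pred_tags) -> Tuple[float, float, float]:
    """Chunk-level precision, recall, and F1 for one tag sequence."""
    gold, pred = chunk_spans(gold_tags), chunk_spans(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative tags: the prediction wrongly splits the final NP in two.
gold = ["B-NP", "B-VP", "B-NP", "I-NP"]
pred = ["B-NP", "B-VP", "B-NP", "B-NP"]
precision, recall, f1 = chunk_prf(gold, pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
# 0.5 0.667 0.571
```

Note that a single boundary error costs both precision and recall, since the mis-split chunk produces two wrong predictions and one missed gold chunk.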