EntityPro annotates named entities, i.e. proper names of persons, locations and organizations, in a text. The module is based on a statistical classifier and makes use of local features, gazetteers, long-distance features and distributional features extracted from very large non-annotated corpora.
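As a rough illustration of the local and gazetteer features mentioned above, the following Python sketch computes a small feature set for one token. The function name `token_features`, the feature names and the gazetteer layout are all assumptions made for this example and are not taken from EntityPro. Long-distance and distributional features are not covered.

```python
from typing import Dict, List, Set


def token_features(tokens: List[str], i: int,
                   gazetteers: Dict[str, Set[str]]) -> Dict[str, object]:
    """Local and gazetteer features for tokens[i] (illustrative feature set)."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        # Local features: the token itself, its shape, and its neighbours.
        "word": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "suffix3": tok[-3:].lower(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>",
    }
    # Gazetteer features: one flag per list, set when the token is in that list.
    for name, entries in gazetteers.items():
        feats["in_" + name] = tok in entries
    return feats
```

For example, `token_features(["Maria", "lives", "in", "Trento"], 3, {"locations": {"Trento"}})` yields a dictionary in which `in_locations` is true and `prev_word` is `"in"`. A statistical classifier would then receive one such feature set per token.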
To allow for easy domain adaptation, EntityPro implements white and black lists, through which you can force a specific behavior of the classifier on certain entities.
The module is available with pre-trained models in the news domain for three languages (Italian, English and German), and has been integrated into the TextPro Active Learning platform.
Algorithm: EntityPro uses YamCha for feature extraction and a Support Vector Machine (SVM) as the classification algorithm.
Resources: I-CAB (Italian), CoNLL 2003 (English) and EUCLIP (German).
Evaluation benchmark: NER at Evalita 2007 (Italian).
Emanuele Pianta and Roberto Zanoli. EntityPro: Exploiting SVM for Italian Named Entity Recognition. Intelligenza Artificiale – numero speciale su Strumenti per l’elaborazione del linguaggio naturale per l’italiano EVALITA 2007, vol. 4, no. 2, pp. 69-70, Associazione Italiana per l’Intelligenza Artificiale, 2007.