CleanPro

CleanPro is an HTML cleaner that removes markup tags and irrelevant text (e.g. words used in navigation menus, common headers and footers) from HTML pages.

Algorithm: Rule-based.
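
The rules CleanPro applies are not listed here, so the following is only a minimal Python sketch of what a rule-based HTML cleaner can look like, not CleanPro's implementation. The names SKIP_TAGS, BLOCK_TAGS, RuleBasedCleaner and clean_html are hypothetical, and the two rules (drop text inside boilerplate elements, break text at block elements) are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Hypothetical rule 1: text inside these elements is treated as boilerplate.
SKIP_TAGS = {"script", "style", "nav", "header", "footer"}
# Hypothetical rule 2: these elements start and end a block of text.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "td"}


class RuleBasedCleaner(HTMLParser):
    """Collects visible text and drops tags and boilerplate elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0   # > 0 while inside a SKIP_TAGS element
        self.blocks = []      # finished text blocks
        self.current = []     # text fragments of the block being built

    def _flush(self):
        # Join fragments, normalize whitespace, keep non-empty blocks.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        if tag in SKIP_TAGS or tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        if tag in SKIP_TAGS or tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        # Keep text only when not inside a boilerplate element.
        if self.skip_depth == 0:
            self.current.append(data)


def clean_html(html):
    """Return the retained text of an HTML string, one block per line."""
    parser = RuleBasedCleaner()
    parser.feed(html)
    parser.close()
    parser._flush()
    return "\n".join(parser.blocks)
```

Calling clean_html on a page string returns the retained text with one block per line. A real cleaner would need further rules, for instance for boilerplate that is not marked up with dedicated elements.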

Evaluation benchmark: CleanEval 2007 (English). [results]

Reference:

Emanuele Pianta, Christian Girardi, and Roberto Zanoli. The TextPro Tool Suite. In Proceedings of LREC, 6th edition of the Language Resources and Evaluation Conference, 28-30 May 2008, Marrakech (Morocco).