CleanPro
CleanPro is an HTML cleaner that removes markup tags and irrelevant text (e.g. words used in navigation menus, common headers and footers) from HTML pages.
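To illustrate the kind of cleaning described above, here is a minimal rule-based sketch in Python. It is not CleanPro's implementation: the tag lists, the MIN_WORDS threshold, and the names RuleBasedCleaner and clean_html are all hypothetical, chosen only to show how a few hand-written rules can strip markup and discard short navigation-like fragments.

```python
from html.parser import HTMLParser

# Hypothetical rules for illustration only; CleanPro's real rules are not shown here.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
BLOCK_TAGS = {"p", "div", "li", "br", "tr", "td",
              "h1", "h2", "h3", "h4", "h5", "h6"}
MIN_WORDS = 4  # assumed threshold: shorter blocks are treated as boilerplate


class RuleBasedCleaner(HTMLParser):
    """Collects visible text block by block, ignoring SKIP_TAGS regions."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.current = []     # text pieces of the block being built
        self.blocks = []      # finished text blocks

    def flush(self):
        # Join the collected pieces and normalize whitespace.
        text = " ".join("".join(self.current).split())
        self.current = []
        if text:
            self.blocks.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.flush()
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)


def clean_html(html):
    """Return the text of `html` with tags and short blocks removed."""
    parser = RuleBasedCleaner()
    parser.feed(html)
    parser.close()
    parser.flush()
    kept = [b for b in parser.blocks if len(b.split()) >= MIN_WORDS]
    return "\n".join(kept)
```

Calling clean_html on a page string returns the remaining text, one block per line. A word-count rule this crude will also drop legitimate short paragraphs and headings, so it should be read as a demonstration of the rule-based idea rather than a usable cleaner.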
Algorithm: Rule-based.
Evaluation benchmark: Cleaneval 2007 (English). [results]
Reference:
Emanuele Pianta, Christian Girardi, and Roberto Zanoli. The TextPro Tool Suite. In Proceedings of LREC, 6th edition of the Language Resources and Evaluation Conference, 28-30 May 2008, Marrakech (Morocco).