CleanPro is an HTML cleaner that removes markup tags and irrelevant text (e.g.
words used in navigation menus, common headers and footers) from HTML
pages.
Algorithm: Rule-based.
Reference: Emanuele Pianta, Christian Girardi, and Roberto Zanoli. The TextPro Tool Suite. In Proceedings of LREC, 6th edition of the Language Resources and Evaluation Conference, 28-30 May 2008, Marrakech (Morocco).
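To make the idea of rule-based cleaning concrete, here is a minimal Python sketch of the general technique. It is not CleanPro's implementation, and its rules are not CleanPro's rules: the tag list `SKIP_TAGS`, the class `TextExtractor`, and the function `clean_html` are all hypothetical names chosen for illustration. The sketch drops every tag and discards text found inside elements that typically hold navigation, header, or footer content.

```python
from html.parser import HTMLParser

# Hypothetical rule set (an assumption, not CleanPro's actual rules):
# text inside these elements is treated as irrelevant and discarded.
SKIP_TAGS = {"script", "style", "nav", "header", "footer"}


class TextExtractor(HTMLParser):
    """Collects text that lies outside any element listed in SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # how many skipped elements are currently open
        self.chunks = []     # text fragments kept so far

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when no skipped element is open.
        if self.skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def clean_html(html: str) -> str:
    """Return the kept text fragments of an HTML string, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    "<html><head><style>p{}</style></head><body>"
    "<nav><a href='/'>Home</a></nav>"
    "<p>Main &amp; text.</p>"
    "<footer>Copyright</footer>"
    "</body></html>"
)
print(clean_html(sample))
```

A real cleaner would likely need more rules than a fixed tag list, for example rules based on repeated text across pages of the same site, since many pages mark up menus and footers with generic `div` elements.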