Description: [phenomenon and example]
CleanPro is an HTML cleaner that removes markup tags and irrelevant text (e.g. navigation menus, common headers and footers) from HTML pages. When the -html option is given, the input HTML file is cleaned, and the relevant text that remains is passed as input to the following modules.
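As a rough illustration of the kind of cleaning described above, here is a minimal Python sketch built on the standard library's html.parser. It is not CleanPro's actual rule set: the names clean_html, TextExtractor, SKIPPED_ELEMENTS and BLOCK_ELEMENTS are hypothetical, and treating nav, header and footer elements as "irrelevant text" is an assumption made only for this example.

```python
from html.parser import HTMLParser

# Elements whose content is discarded in this sketch (an assumption,
# not CleanPro's real rules).
SKIPPED_ELEMENTS = {"script", "style", "nav", "header", "footer"}
# Elements that start a new line in the extracted text.
BLOCK_ELEMENTS = {"p", "div", "br", "li", "tr",
                  "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects text outside skipped elements and drops all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_ELEMENTS:
            self._skip_depth += 1
        elif tag in BLOCK_ELEMENTS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED_ELEMENTS:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in BLOCK_ELEMENTS:
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace on each line and drop empty lines.
        raw_lines = "".join(self._chunks).split("\n")
        lines = (" ".join(line.split()) for line in raw_lines)
        return "\n".join(line for line in lines if line)


def clean_html(html):
    """Return the text of an HTML string without tags or skipped elements."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A real cleaner would need further rules to detect menus and boilerplate that are not marked up with dedicated elements; this sketch only shows the tag-stripping step.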
Algorithm and resources: [training data and test data]
Rule-based
Languages:
English and Italian
Evaluation:
98% accuracy (internal evaluation)
References: [published papers, reports, etc. describing the module]
Contact:
Christian Girardi – cgirardi@fbk.eu
Input format:
Raw text
Output Information: [what it adds]
Raw text stripped of HTML tags and irrelevant text
Dependencies:
None
How To Use:
Pass "-html" as a parameter when calling TextPro to activate the module
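A minimal sketch of such a call from Python follows. Only the -html option comes from this description; the executable path, the position of the input file on the command line, and the helper names build_textpro_command and run_cleanpro are assumptions for illustration and should be checked against the TextPro installation in use.

```python
import subprocess


def build_textpro_command(executable, input_path):
    """Build the argument list for a TextPro call with HTML cleaning enabled.

    "-html" is the option named in this description; the executable path
    and the argument order are assumptions.
    """
    return [executable, "-html", input_path]


def run_cleanpro(executable, input_path):
    """Run TextPro on an HTML file; raises CalledProcessError on failure."""
    command = build_textpro_command(executable, input_path)
    return subprocess.run(command, check=True)
```

For example, run_cleanpro("/path/to/textpro", "page.html") would launch the (hypothetical) executable at that path on page.html.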
NB: if a field is empty, it must not appear