
CleanPro-old

Description: [phenomenon and example]
CleanPro is an HTML cleaner that removes mark-up tags and irrelevant text (e.g. words used in navigation menus, common headers and footers, etc.) from HTML pages. When the option -html is given, the input HTML file is cleaned, and the relevant text that remains is passed as input to the subsequent modules.
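
To illustrate the kind of cleaning described above, here is a minimal Python sketch of tag stripping with a few hand-written rules. It is not CleanPro's actual rule set, which is not documented on this page: the names SKIPPED_TAGS, BLOCK_TAGS, TextExtractor and clean_html are hypothetical, and treating nav, header and footer elements as irrelevant text is an assumption made for the example.

```python
from html.parser import HTMLParser

# Assumption: content inside these tags counts as "irrelevant text".
# This is an illustrative choice, not CleanPro's real rule set.
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer"}

# Closing one of these tags ends a line in the output.
BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects the text that is not nested inside a skipped tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "br" and self._skip_depth == 0:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def clean_html(html):
    """Return the text of an HTML string without tags or skipped sections."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = (
        "<html><body><nav><a href='/'>Home</a></nav>"
        "<p>Hello <b>world</b>.</p><footer>Copyright</footer></body></html>"
    )
    print(clean_html(page))
```

A rule list keyed on tag names is the simplest possible design; a real cleaner would likely also need rules for repeated boilerplate that is not marked up with dedicated tags.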

Algorithm and resources: [training data and test data]
Rule-based

Languages:
English and Italian

Evaluation:
98% accuracy (internal evaluation)

References: [published papers, reports, etc. describing the module]

Contact:
Christian Girardi – cgirardi@fbk.eu

Input format:

Raw text

Output Information: [what it adds]
Raw text with HTML tags removed

Dependencies:
None

How To Use:
Pass "-html" as a parameter when calling TextPro to activate the module.

NB: if a field is empty, it should not appear.