The current TextPro pipeline includes the following modules:
| Module | Italian | English | German |
| --- | --- | --- | --- |
| CleanPro removes mark-up tags from HTML pages | X | X | X |
| TokenPro, a tokenizer, splits up a text into “words” | X | X | X |
| SentencePro, a sentence splitter, marks the end of a sentence | X | X | X |
| MorphoPro, a morphological analyzer and synthesizer | X | X | - |
| TagPro, a Part of Speech (PoS) tagger, marks a word with a PoS | X | X | - |
| LemmaPro, a lemmatizer, marks a word with its lemma | X | X | - |
| ChunkPro, a chunker, groups words for shallow syntactic analysis | X | X | - |
| EntityPro marks named entities, e.g. persons and organizations | X | X | X |
| TimePro, a time expression recognizer | X | X | - |
| GeoCoder assigns geographical coordinates to toponyms | X | X | - |
| SyntaxPro, a syntactic analyser based on dependency relations | X | - | - |
| EventPro recognizes and classifies events | X | X | - |
| FactPro assigns a factuality value to each event | X | - | - |
| TempRelPro identifies temporal relations among events | X | X | - |
| KeyPro extracts relevant concepts mentioned in a text | X | X | - |
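
For readers who want to check language coverage programmatically, the module list above can be encoded as a small lookup table. This is a minimal sketch, not part of TextPro: the `SUPPORT` mapping, the `LANGUAGES` tuple and the `modules_for` helper are hypothetical names introduced here for illustration, and the values are simply transcribed from the X and - marks above.

```python
# Language coverage per module, transcribed from the module list above.
# Tuple order follows LANGUAGES: (Italian, English, German).
# These names are illustrative only; TextPro does not expose this structure.
LANGUAGES = ("Italian", "English", "German")

SUPPORT = {
    "CleanPro":    (True, True,  True),
    "TokenPro":    (True, True,  True),
    "SentencePro": (True, True,  True),
    "MorphoPro":   (True, True,  False),
    "TagPro":      (True, True,  False),
    "LemmaPro":    (True, True,  False),
    "ChunkPro":    (True, True,  False),
    "EntityPro":   (True, True,  True),
    "TimePro":     (True, True,  False),
    "GeoCoder":    (True, True,  False),
    "SyntaxPro":   (True, False, False),
    "EventPro":    (True, True,  False),
    "FactPro":     (True, False, False),
    "TempRelPro":  (True, True,  False),
    "KeyPro":      (True, True,  False),
}


def modules_for(language):
    """Return the modules available for a language, in pipeline order."""
    idx = LANGUAGES.index(language)  # raises ValueError for unknown languages
    return [name for name, flags in SUPPORT.items() if flags[idx]]


if __name__ == "__main__":
    print(modules_for("German"))
    # → ['CleanPro', 'TokenPro', 'SentencePro', 'EntityPro']
```

Keeping the coverage in one mapping makes it easy to see, for example, that only the basic preprocessing modules and EntityPro are listed for German.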