Modules

The current TextPro pipeline includes the following modules:

Module                                                            Italian  English  German
CleanPro removes mark-up tags from HTML pages                     X        X        X
TokenPro, a tokenizer, splits up a text into “words”              X        X        X
SentencePro, a sentence splitter, marks the end of a sentence     X        X        X
MorphoPro, a morphological analyzer and synthesizer               X        X        -
TagPro, a Part of Speech (PoS) tagger, marks a word with a PoS    X        X        -
LemmaPro, a lemmatizer, marks a word with its lemma               X        X        -
ChunkPro, a chunker, groups words for shallow syntactic analysis  X        X        -
EntityPro marks named entities, e.g. persons and organizations    X        X        X
TimePro, a time expression recognizer                             X        X        -
GeoCoder assigns geographical coordinates to toponyms             X        X        -
SyntaxPro, a syntactic analyser based on dependency relations     X        -        -
EventPro recognizes and classifies events                         X        X        -
FactPro assigns a factuality value to each event                  X        -        -
TempRelPro identifies temporal relations among events             X        X        -
KeyPro extracts relevant concepts mentioned in a text             X        X        -
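
To check quickly which modules can be run for a given language, the availability table above can be transcribed into a small lookup. The following is only an illustrative sketch: the names `LANGUAGES`, `SUPPORT` and `modules_for` are hypothetical and not part of TextPro, and it assumes that "X" means the module is available for that language and "-" means it is not.

```python
# Illustrative only: transcribed from the availability table above.
# None of these names belong to the TextPro distribution or its API.
LANGUAGES = ("Italian", "English", "German")

# Module name -> availability flags, in LANGUAGES order.
# Assumption: "X" in the table is read as True, "-" as False.
SUPPORT = {
    "CleanPro":    (True, True, True),
    "TokenPro":    (True, True, True),
    "SentencePro": (True, True, True),
    "MorphoPro":   (True, True, False),
    "TagPro":      (True, True, False),
    "LemmaPro":    (True, True, False),
    "ChunkPro":    (True, True, False),
    "EntityPro":   (True, True, True),
    "TimePro":     (True, True, False),
    "GeoCoder":    (True, True, False),
    "SyntaxPro":   (True, False, False),
    "EventPro":    (True, True, False),
    "FactPro":     (True, False, False),
    "TempRelPro":  (True, True, False),
    "KeyPro":      (True, True, False),
}


def modules_for(language):
    """Return the modules available for a language, in table order."""
    idx = LANGUAGES.index(language)  # raises ValueError for unknown languages
    return [name for name, flags in SUPPORT.items() if flags[idx]]


print(modules_for("German"))
# ['CleanPro', 'TokenPro', 'SentencePro', 'EntityPro']
```

Keeping the flags in one tuple per module mirrors the table row by row, so the mapping is easy to compare against the table if the list of modules changes.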