SentencePro

SentencePro is a rule-based sentence splitter that identifies sentence boundaries in a text.

The module marks the end of a sentence when it finds a sentence-ending punctuation mark (for instance a full stop, a question mark, or an exclamation mark), unless that mark is part of a linguistic expression such as an abbreviation or an acronym (for example "U.S.", which stands for "United States"). SentencePro can be fully customized through an XML configuration file that defines specific sentence-ending rules.

Example
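
The following is a minimal Python sketch of the general technique described above: a regular expression proposes candidate boundaries, and an abbreviation list vetoes the ones that belong to an abbreviation. It is not SentencePro's code or its actual rule set; the names `ABBREVIATIONS`, `BOUNDARY` and `split_sentences` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical abbreviation list (lowercase); the real SentencePro
# resources for Italian, English and German are not reproduced here.
ABBREVIATIONS = {"u.s.", "dr.", "mr.", "e.g.", "i.e.", "etc."}

# Candidate boundary: one or more sentence-ending punctuation marks
# followed by whitespace or the end of the text.
BOUNDARY = re.compile(r"[.!?]+(?=\s|$)")


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text into sentences, skipping boundaries inside abbreviations."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.end()
        # The whitespace-delimited token that ends at this punctuation mark,
        # e.g. "U.S." or "2010.".
        tokens = text[start:end].split()
        last_token = tokens[-1].lower() if tokens else ""
        if last_token in abbreviations:
            # The full stop is part of an abbreviation: not a boundary.
            continue
        sentence = text[start:end].strip()
        if sentence:
            sentences.append(sentence)
        start = end
    # Keep any trailing text that has no closing punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for s in split_sentences("She moved to the U.S. in 2010. Did she like it? Yes!"):
        print(s)
```

Note that this sketch never splits after a listed abbreviation, so a sentence that ends with "U.S." would be merged with the next one; a fuller rule set would need additional context checks for that case.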

Algorithm: Rules defined as regular expressions.

Resources: Lists of abbreviations for Italian, English and German.