Package org.languagetool.tokenizers
Interface Summary
Interface              Description
CompoundWordTokenizer  Interface for components that take compound words and split them into their parts.
SentenceTokenizer      Tokenizes text into sentences.
Tokenizer              Interface for classes that tokenize text into smaller units.

Class Summary
Class                    Description
SimpleSentenceTokenizer  A very simple sentence tokenizer that splits on [.!?…] followed by whitespace or an uppercase letter.
SRXSentenceTokenizer     Class to tokenize sentences using rules from an SRX file.
WordTokenizer            Tokenizes a sentence into words.
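
The splitting rule described for SimpleSentenceTokenizer (a boundary after one of [.!?…] when whitespace or an uppercase letter follows) can be illustrated with a short standalone sketch. This is not LanguageTool's implementation and does not use its API: the class name SentenceSplitSketch, the method split, and the exact regular expression are assumptions made here for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of the rule described above; not the LanguageTool class.
public class SentenceSplitSketch {

    // A boundary is a position right after '.', '!', '?' or '…' (U+2026) that is
    // followed either by whitespace (consumed) or directly by an uppercase letter.
    private static final Pattern BOUNDARY =
            Pattern.compile("(?<=[.!?\\u2026])(?:\\s+|(?=\\p{Lu}))");

    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : BOUNDARY.split(text)) {
            if (!part.isEmpty()) {          // skip the empty piece produced by empty input
                sentences.add(part);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Prints three lines: "Hello world.", "How are you?", "Fine!"
        for (String sentence : split("Hello world. How are you?Fine!")) {
            System.out.println(sentence);
        }
    }
}
```

A rule this simple also splits after abbreviations such as "e.g." when a capitalized word follows, which is the kind of case a rule-based segmenter driven by an SRX file is meant to handle.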