Package com.optimaize.langdetect.ngram
Class NgramExtractor
java.lang.Object
    com.optimaize.langdetect.ngram.NgramExtractor
public class NgramExtractor extends Object
Class for extracting n-grams from a text.
Author:
Fabian Kessler
Method Summary
Modifier and Type              Method                                            Description
@NotNull Map<String,Integer>   extractCountedGrams(@NotNull CharSequence text)
@NotNull List<String>          extractGrams(@NotNull CharSequence text)          Creates the n-grams for a given text in the order they occur.
NgramExtractor                 filter(NgramFilter filter)
List<Integer>                  getGramLengths()
static NgramExtractor          gramLength(int gramLength)
static NgramExtractor          gramLengths(Integer... gramLength)
NgramExtractor                 textPadding(char textPadding)                     To ensure that border grams are created, this character is added to the left and right of the text.
Method Detail
gramLength
public static NgramExtractor gramLength(int gramLength)
gramLengths
public static NgramExtractor gramLengths(Integer... gramLength)
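The two static factories differ in how many gram lengths they configure: gramLength(int) sets one, gramLengths(Integer...) sets several. The sketch below is a standalone model, not the library's code, and the class MultiLengthGrams is hypothetical. It assumes that with several lengths the grams are emitted length by length, in the order the lengths were given; this page does not state that ordering.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone model (hypothetical class) of an extractor configured with several gram lengths.
// Assumption: grams are produced length by length, in the order the lengths are given.
final class MultiLengthGrams {
    static List<String> extract(CharSequence text, Integer... gramLengths) {
        List<String> grams = new ArrayList<>();
        for (int n : gramLengths) {
            // Slide a window of size n over the text; lengths longer than the text add nothing.
            for (int pos = 0; pos + n <= text.length(); pos++) {
                grams.add(text.subSequence(pos, pos + n).toString());
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Roughly analogous to configuring gramLengths(1, 2) and extracting from "abc".
        System.out.println(extract("abc", 1, 2)); // [a, b, c, ab, bc]
    }
}
```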
filter
public NgramExtractor filter(NgramFilter filter)
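filter(NgramFilter filter) is listed without a description. One plausible reading is that the extractor keeps only the grams the filter accepts. The sketch below models that reading with a java.util.function.Predicate<String> standing in for NgramFilter; the class FilteredGrams, the predicate mapping, and the "drop grams containing a space" rule are illustrative assumptions, not the library's interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Standalone model (hypothetical class): Predicate<String> stands in for NgramFilter.
final class FilteredGrams {
    static List<String> extract(CharSequence text, int n, Predicate<String> keep) {
        List<String> grams = new ArrayList<>();
        for (int pos = 0; pos + n <= text.length(); pos++) {
            String gram = text.subSequence(pos, pos + n).toString();
            // A null filter keeps everything; otherwise only accepted grams are kept.
            if (keep == null || keep.test(gram)) {
                grams.add(gram);
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Illustrative rule: drop any gram that contains a space.
        System.out.println(extract("Foo bar", 2, g -> g.indexOf(' ') < 0)); // [Fo, oo, ba, ar]
    }
}
```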
textPadding
public NgramExtractor textPadding(char textPadding)
To ensure that border grams are created, this character is added to the left and right of the text. Example: when textPadding is a space ' ', the input "foo" becomes " foo ", so that n-grams such as " f" and "o " are created.
If the text already has this character at that position (for example, it already starts or ends with a space), the character is not added there again.
Parameters:
textPadding - the padding character, for example a space ' '.
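The padding rule described above can be modelled in a few lines. This is a standalone sketch, not the library's implementation; the class TextPadding and its applyPadding method are hypothetical, and leaving empty input unpadded is an assumption this page does not cover.

```java
// Standalone model (hypothetical class) of the documented padding rule.
final class TextPadding {
    static String applyPadding(String text, char pad) {
        if (text.isEmpty()) {
            return text; // assumption: empty input is returned unpadded
        }
        StringBuilder sb = new StringBuilder(text.length() + 2);
        // Add the pad on the left unless the text already starts with it.
        if (text.charAt(0) != pad) {
            sb.append(pad);
        }
        sb.append(text);
        // Add the pad on the right unless the text already ends with it.
        if (text.charAt(text.length() - 1) != pad) {
            sb.append(pad);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + applyPadding("foo", ' ') + "]");  // [ foo ]
        System.out.println("[" + applyPadding(" foo", ' ') + "]"); // [ foo ]
    }
}
```

Because the check is per side, a text that already starts with the padding character still gets padded on the right, and vice versa.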
extractGrams
@NotNull public List<String> extractGrams(@NotNull CharSequence text)
Creates the n-grams for a given text in the order they occur.
Example: with a gram length of 2 and no text padding, extractGrams("Foo bar") => [Fo, oo, "o ", " b", ba, ar]
Parameters:
text - the text to extract n-grams from
Returns:
The grams; empty if the input was empty or if the text is too short for any gram of the configured length.
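The example above can be reproduced with a sliding window of fixed width. This is a standalone sketch for a single gram length, ignoring padding and filtering; the class OrderedGrams is hypothetical and not part of the library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Standalone model (hypothetical class) of single-length gram extraction in order of occurrence.
final class OrderedGrams {
    static List<String> extractGrams(CharSequence text, int gramLength) {
        int numGrams = text.length() - gramLength + 1;
        if (numGrams <= 0) {
            return Collections.emptyList(); // empty input, or no gram of this length fits
        }
        List<String> grams = new ArrayList<>(numGrams);
        for (int pos = 0; pos < numGrams; pos++) {
            grams.add(text.subSequence(pos, pos + gramLength).toString());
        }
        return grams;
    }

    public static void main(String[] args) {
        // Gram length 2, no padding: the grams are "Fo", "oo", "o ", " b", "ba", "ar".
        System.out.println(extractGrams("Foo bar", 2).size()); // 6
    }
}
```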
extractCountedGrams
@NotNull public Map<String,Integer> extractCountedGrams(@NotNull CharSequence text)
Returns:
A map with key = n-gram and value = count. Entries are ordered by the first occurrence of each n-gram in the text.
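A map whose iteration order follows first occurrence can be modelled with a LinkedHashMap, since re-inserting an existing key does not change its position. This is a standalone sketch for a single gram length that ignores padding and filtering; the class CountedGrams is hypothetical, and whether the library uses LinkedHashMap internally is not stated here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone model (hypothetical class): counts grams, keeping keys in order of first appearance.
final class CountedGrams {
    static Map<String, Integer> extractCountedGrams(CharSequence text, int gramLength) {
        // LinkedHashMap iterates in insertion order, i.e. the order of first occurrence.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int pos = 0; pos + gramLength <= text.length(); pos++) {
            String gram = text.subSequence(pos, pos + gramLength).toString();
            counts.merge(gram, 1, Integer::sum); // increment, starting at 1
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(extractCountedGrams("banana", 2)); // {ba=1, an=2, na=2}
    }
}
```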