Package com.optimaize.langdetect
Class LanguageDetectorBuilder
java.lang.Object
    com.optimaize.langdetect.LanguageDetectorBuilder
public class LanguageDetectorBuilder extends Object
Builder for LanguageDetector.
This class does no internal synchronization, so confine an instance to a single thread or synchronize access to it externally.
- Author:
Fabian Kessler
Method Summary
Modifier and Type | Method | Description
LanguageDetectorBuilder | affixFactor(double affixFactor) | Sets both prefixFactor() and suffixFactor() to the given value.
LanguageDetectorBuilder | alpha(double alpha) |
LanguageDetector | build() |
static LanguageDetectorBuilder | create(@NotNull NgramExtractor ngramExtractor) |
LanguageDetectorBuilder | languagePriorities(@Nullable Map<LdLocale,Double> langWeightingMap) | TODO: document exactly.
LanguageDetectorBuilder | minimalConfidence(double minimalConfidence) | LanguageDetector.detect(java.lang.CharSequence) returns a language only if the best detected language has at least this probability.
LanguageDetectorBuilder | prefixFactor(double prefixFactor) | Assigns a weight to n-grams on the left border of a word, so that they count differently from n-grams in the middle of words.
LanguageDetectorBuilder | probabilityThreshold(double probabilityThreshold) | LanguageDetector.getProbabilities(java.lang.CharSequence) does not return languages with a probability below this value.
LanguageDetectorBuilder | seed(long seed) |
LanguageDetectorBuilder | seed(@NotNull com.google.common.base.Optional<Long> seed) |
LanguageDetectorBuilder | shortTextAlgorithm(int shortTextAlgorithm) | Defaults to 0, which disables this feature.
LanguageDetectorBuilder | suffixFactor(double suffixFactor) | Defaults to 1.0, which disables this feature.
LanguageDetectorBuilder | withProfile(LanguageProfile languageProfile) |
LanguageDetectorBuilder | withProfiles(Iterable<LanguageProfile> languageProfiles) |
Method Detail
-
create
public static LanguageDetectorBuilder create(@NotNull NgramExtractor ngramExtractor)
-
alpha
public LanguageDetectorBuilder alpha(double alpha)
-
seed
public LanguageDetectorBuilder seed(long seed)
-
seed
public LanguageDetectorBuilder seed(@NotNull com.google.common.base.Optional<Long> seed)
-
shortTextAlgorithm
public LanguageDetectorBuilder shortTextAlgorithm(int shortTextAlgorithm)
Defaults to 0, which disables this feature (the old behavior).
-
affixFactor
public LanguageDetectorBuilder affixFactor(double affixFactor)
Sets both prefixFactor() and suffixFactor() to the given value.
- See Also:
prefixFactor(double)
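The convenience setter described above can be sketched with a hypothetical stand-in class. `AffixSettings` and its getters are illustrative names, not part of the library; only the documented behavior is modeled (both factors default to 1.0, and affixFactor(x) sets both to x).

```java
// Hypothetical stand-in for the affix settings of LanguageDetectorBuilder.
// Models only the documented behavior: both factors default to 1.0 and
// affixFactor(x) is equivalent to prefixFactor(x).suffixFactor(x).
class AffixSettings {
    private double prefixFactor = 1.0; // 1.0 = prefix weighting disabled
    private double suffixFactor = 1.0; // 1.0 = suffix weighting disabled

    AffixSettings prefixFactor(double prefixFactor) {
        this.prefixFactor = prefixFactor;
        return this;
    }

    AffixSettings suffixFactor(double suffixFactor) {
        this.suffixFactor = suffixFactor;
        return this;
    }

    // Convenience setter: applies the same weight to prefixes and suffixes.
    AffixSettings affixFactor(double affixFactor) {
        return prefixFactor(affixFactor).suffixFactor(affixFactor);
    }

    double getPrefixFactor() {
        return prefixFactor;
    }

    double getSuffixFactor() {
        return suffixFactor;
    }
}
```

By the same documented rule, calling affixFactor(1.5) on the real builder should be equivalent to calling prefixFactor(1.5) followed by suffixFactor(1.5).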
-
prefixFactor
public LanguageDetectorBuilder prefixFactor(double prefixFactor)
Assigns a weight to n-grams on the left border of a word, so that they count differently from n-grams in the middle of words. Affixes (prefixes and suffixes) often carry language-specific features. A value greater than 1.0 weights these n-grams higher; 2.0 weights them double. Defaults to 1.0, which disables this feature.
- Parameters:
prefixFactor
- 0.0 to 10.0; a suggested value is 1.5.
-
suffixFactor
public LanguageDetectorBuilder suffixFactor(double suffixFactor)
Like prefixFactor(double), but for n-grams on the right border of a word (suffixes). Defaults to 1.0, which disables this feature.
- Parameters:
suffixFactor
- 0.0 to 10.0; a suggested value is 2.0.
- See Also:
prefixFactor(double)
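The prefix and suffix weighting can be illustrated with a small sketch. It rests on assumptions this page does not state: the text is padded with one space on each side, an n-gram that starts with a space counts as a prefix n-gram, and one that ends with a space counts as a suffix n-gram. `AffixWeighting` and its methods are hypothetical names, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of affix weighting. Assumptions (not taken from the
// library): the text is padded with one space on each side, an n-gram starting
// with a space is a prefix n-gram, and one ending with a space is a suffix n-gram.
class AffixWeighting {

    // All n-grams of length n of the padded text, skipping all-space grams.
    static List<String> ngrams(String text, int n) {
        String padded = " " + text + " ";
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= padded.length(); i++) {
            String gram = padded.substring(i, i + n);
            if (!gram.trim().isEmpty()) {
                grams.add(gram);
            }
        }
        return grams;
    }

    // Weight of one n-gram; a gram touching both borders is treated as a prefix here.
    static double weight(String gram, double prefixFactor, double suffixFactor) {
        if (gram.startsWith(" ")) {
            return prefixFactor;
        }
        if (gram.endsWith(" ")) {
            return suffixFactor;
        }
        return 1.0; // n-gram from the middle of a word: unweighted
    }

    // Maps each distinct n-gram of the text to its weight, in order of first occurrence.
    static Map<String, Double> weights(String text, int n, double prefixFactor, double suffixFactor) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String gram : ngrams(text, n)) {
            result.put(gram, weight(gram, prefixFactor, suffixFactor));
        }
        return result;
    }
}
```

With the suggested values (prefixFactor 1.5, suffixFactor 2.0), the bigrams of "the" would be " t" (weight 1.5), "th" and "he" (1.0 each), and "e " (2.0) under these assumptions.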
-
probabilityThreshold
public LanguageDetectorBuilder probabilityThreshold(double probabilityThreshold)
LanguageDetector.getProbabilities(java.lang.CharSequence) does not return languages whose probability is below this value. The default is currently 0.1 (the old hardcoded value), but do not rely on it; if you depend on a specific value, set it explicitly.
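The threshold rule can be sketched as a plain filter. This is a hypothetical illustration: `ThresholdFilter` is an invented name, plain String codes stand in for the library's language type, and the descending sort order is an assumption made for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a probability threshold: languages below the
// threshold are dropped, the rest are sorted by descending probability.
// Plain String codes stand in for the library's language types.
class ThresholdFilter {
    static List<String> filter(Map<String, Double> probabilities, double threshold) {
        List<Map.Entry<String, Double>> kept = new ArrayList<>();
        for (Map.Entry<String, Double> entry : probabilities.entrySet()) {
            // Languages with less probability than the threshold are excluded.
            if (entry.getValue() >= threshold) {
                kept.add(entry);
            }
        }
        kept.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> languages = new ArrayList<>();
        for (Map.Entry<String, Double> entry : kept) {
            languages.add(entry.getKey());
        }
        return languages;
    }
}
```

With the current default of 0.1, probabilities of 0.85 for "de", 0.12 for "nl" and 0.03 for "en" would keep "de" and "nl" and drop "en".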
-
minimalConfidence
public LanguageDetectorBuilder minimalConfidence(double minimalConfidence)
LanguageDetector.detect(java.lang.CharSequence) returns a language only if the best detected language has at least this probability. The default is currently 0.9999d, but do not rely on it; if you depend on a specific value, set it explicitly.
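The minimal-confidence rule can be sketched as follows. `ConfidenceGate` is a hypothetical name, and java.util.Optional is used only for illustration; the library's own return type for detect is not shown on this page and may differ.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the minimal-confidence rule: the best language is
// returned only if its probability reaches the threshold. java.util.Optional
// is used here; the library's own return type may differ.
class ConfidenceGate {
    static Optional<String> detect(Map<String, Double> probabilities, double minimalConfidence) {
        String best = null;
        double bestProbability = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : probabilities.entrySet()) {
            if (entry.getValue() > bestProbability) {
                best = entry.getKey();
                bestProbability = entry.getValue();
            }
        }
        if (best != null && bestProbability >= minimalConfidence) {
            return Optional.of(best);
        }
        return Optional.empty(); // no language is confident enough
    }
}
```

Because the default of 0.9999d is very strict, a text whose best candidate reaches only 0.7 yields no language at all; lowering minimalConfidence trades precision for more answers.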
-
languagePriorities
public LanguageDetectorBuilder languagePriorities(@Nullable Map<LdLocale,Double> langWeightingMap)
TODO: document exactly, including how the priorities influence the results. Unsupported languages may be checked for at some point; either way, document whether they cause an exception or are ignored. Key = language (LdLocale), value = priority (probably 0 to 1).
-
withProfile
public LanguageDetectorBuilder withProfile(LanguageProfile languageProfile) throws IllegalStateException
- Throws:
IllegalStateException
- if a profile for the same language has already been added (this indicates a bug in the calling code).
-
withProfiles
public LanguageDetectorBuilder withProfiles(Iterable<LanguageProfile> languageProfiles) throws IllegalStateException
- Throws:
IllegalStateException
- if a profile for the same language has already been added (this indicates a bug in the calling code).
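The duplicate check documented for withProfile and withProfiles can be sketched with hypothetical stand-ins. `ProfileStub` replaces LanguageProfile and carries only a language code; `ProfileRegistry` and the exception message are invented for illustration. Having withProfiles delegate to withProfile is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for LanguageProfile: only the language code matters here.
class ProfileStub {
    final String language;

    ProfileStub(String language) {
        this.language = language;
    }
}

// Hypothetical sketch of the duplicate check behind withProfile/withProfiles.
class ProfileRegistry {
    private final Map<String, ProfileStub> profiles = new LinkedHashMap<>();

    ProfileRegistry withProfile(ProfileStub profile) {
        if (profiles.containsKey(profile.language)) {
            throw new IllegalStateException("profile already added for language: " + profile.language);
        }
        profiles.put(profile.language, profile);
        return this;
    }

    // Delegates to withProfile, so a duplicate anywhere in the input throws;
    // profiles preceding the duplicate stay registered in this sketch.
    ProfileRegistry withProfiles(Iterable<ProfileStub> profilesToAdd) {
        for (ProfileStub profile : profilesToAdd) {
            withProfile(profile);
        }
        return this;
    }

    int size() {
        return profiles.size();
    }
}
```

Note the design consequence in this sketch: a failing withProfiles call leaves the builder partially populated, so treat the exception as a bug to fix rather than a condition to recover from.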
-
build
public LanguageDetector build() throws IllegalStateException
- Throws:
IllegalStateException
- if no LanguageProfile was added.
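The precondition of build() can be sketched with a hypothetical builder. `MinimalBuilder`, `DetectorStub` and `withLanguage` are invented names that stand in for the real builder, LanguageDetector and withProfile; only the documented rule (throw IllegalStateException when nothing was added) is modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical result type standing in for LanguageDetector.
class DetectorStub {
    final List<String> languages;

    DetectorStub(List<String> languages) {
        // Defensive copy, so later builder changes do not affect a built detector.
        this.languages = Collections.unmodifiableList(new ArrayList<>(languages));
    }
}

// Hypothetical builder showing the documented precondition of build():
// at least one profile must have been added.
class MinimalBuilder {
    private final List<String> languages = new ArrayList<>();

    MinimalBuilder withLanguage(String language) {
        languages.add(language);
        return this;
    }

    DetectorStub build() {
        if (languages.isEmpty()) {
            throw new IllegalStateException("no language profile was added");
        }
        return new DetectorStub(languages);
    }
}
```

Failing fast in build() surfaces a forgotten withProfile or withProfiles call at construction time instead of as silently empty detection results later.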