Package com.optimaize.langdetect
Class LanguageDetectorImpl
- java.lang.Object
  - com.optimaize.langdetect.LanguageDetectorImpl
- All Implemented Interfaces:
LanguageDetector
public final class LanguageDetectorImpl extends Object implements LanguageDetector
This class is immutable and thus thread-safe.
- Author:
- Nakatani Shuyo, Fabian Kessler, Elmer Garduno
Method Summary
- com.google.common.base.Optional<LdLocale> detect(CharSequence text)
  Returns the best detected language if the algorithm is very confident.
- List<DetectedLanguage> getProbabilities(CharSequence text)
  Returns all languages with at least some likelihood.
Method Detail
detect
public com.google.common.base.Optional<LdLocale> detect(CharSequence text)
Description copied from interface: LanguageDetector
Returns the best detected language if the algorithm is very confident.
Note: you may want to use getProbabilities() instead. This method is very strict and sometimes returns absent even though the first choice in getProbabilities() is correct.
- Specified by:
- detect in interface LanguageDetector
- Parameters:
- text - You probably want a TextObject.
- Returns:
- The language if confident, absent if unknown or not confident enough.
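The note above suggests falling back to getProbabilities() when detect() returns absent. A minimal sketch of that pattern follows. It is not the library's code: Candidate, Detector, bestGuess, stub and the minProbability threshold are hypothetical stand-ins that only mirror the two method shapes documented on this page, language codes are plain strings instead of LdLocale, and java.util.Optional is used in place of Guava's Optional so the example runs without dependencies.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class DetectFallbackSketch {

    /** Hypothetical stand-in for DetectedLanguage: a language code and its probability. */
    static final class Candidate {
        final String language;
        final double probability;

        Candidate(String language, double probability) {
            this.language = language;
            this.probability = probability;
        }
    }

    /** Hypothetical stand-in that mirrors the two methods documented on this page. */
    interface Detector {
        Optional<String> detect(CharSequence text);
        List<Candidate> getProbabilities(CharSequence text);
    }

    /**
     * Tries the strict detect() first. If it is absent, falls back to the first
     * entry of getProbabilities(), accepted only when its probability reaches
     * the caller-chosen minProbability (an assumption of this sketch).
     */
    static Optional<String> bestGuess(Detector detector, CharSequence text, double minProbability) {
        Optional<String> confident = detector.detect(text);
        if (confident.isPresent()) {
            return confident;
        }
        List<Candidate> ranked = detector.getProbabilities(text);
        if (ranked.isEmpty()) {
            return Optional.empty(); // nothing detected, or the input was only noise
        }
        Candidate top = ranked.get(0); // the list is sorted from better to worse
        return top.probability >= minProbability ? Optional.of(top.language) : Optional.empty();
    }

    /** Canned detector for the demo: detect() is always absent, the ranking is fixed. */
    static Detector stub(List<Candidate> ranked) {
        return new Detector() {
            @Override
            public Optional<String> detect(CharSequence text) {
                return Optional.empty();
            }

            @Override
            public List<Candidate> getProbabilities(CharSequence text) {
                return ranked;
            }
        };
    }

    public static void main(String[] args) {
        Detector d = stub(Arrays.asList(new Candidate("da", 0.80), new Candidate("no", 0.15)));
        System.out.println(bestGuess(d, "some text", 0.5)); // Optional[da]
        System.out.println(bestGuess(d, "some text", 0.9)); // Optional.empty
        System.out.println(bestGuess(stub(Collections.emptyList()), "", 0.5)); // Optional.empty
    }
}
```

With the real library, the same control flow would apply to a LanguageDetector instance. The threshold is a judgment call for the caller, not a value this page prescribes.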
getProbabilities
public List<DetectedLanguage> getProbabilities(CharSequence text)
Description copied from interface: LanguageDetector
Returns all languages with at least some likelihood. A configurable cutoff is applied to languages with very low probability.
As the algorithm currently works, this method may return, for example, 0.99 for Danish and less than 0.01 for Norwegian even though the two are almost equally likely. It would be nice if this could be improved in future versions.
- Specified by:
- getProbabilities in interface LanguageDetector
- Parameters:
- text - You probably want a TextObject.
- Returns:
- Sorted from better to worse. May be empty. It's empty if the program failed to detect any language, or if the input text did not contain any usable text (just noise).
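Because the result may be empty, and because a high probability for one language does not rule out a closely related one (the Danish and Norwegian case above), callers may want to read the list defensively. The sketch below is one possible way to do that and is not part of the library. Candidate, CONFUSABLE and describe are hypothetical names, and the set of easily confused languages is an assumption the caller would have to supply.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ProbabilityListSketch {

    /** Hypothetical stand-in for DetectedLanguage: a language code and its probability. */
    static final class Candidate {
        final String language;
        final double probability;

        Candidate(String language, double probability) {
            this.language = language;
            this.probability = probability;
        }
    }

    /** Caller-chosen group of languages assumed to be easily confused with each other. */
    static final Set<String> CONFUSABLE = new HashSet<>(Arrays.asList("da", "no"));

    /**
     * Summarises a ranked result list:
     * - "unknown" when the list is empty (no language detected, or only noise);
     * - the top language marked "(ambiguous)" when it belongs to CONFUSABLE,
     *   regardless of how high its reported probability is;
     * - otherwise just the top language.
     */
    static String describe(List<Candidate> ranked) {
        if (ranked.isEmpty()) {
            return "unknown";
        }
        Candidate top = ranked.get(0); // sorted from better to worse
        if (CONFUSABLE.contains(top.language)) {
            return top.language + " (ambiguous)";
        }
        return top.language;
    }

    public static void main(String[] args) {
        List<Candidate> nordic = Arrays.asList(new Candidate("da", 0.99), new Candidate("no", 0.005));
        List<Candidate> english = Collections.singletonList(new Candidate("en", 0.97));

        System.out.println(describe(Collections.emptyList())); // unknown
        System.out.println(describe(nordic));                  // da (ambiguous)
        System.out.println(describe(english));                 // en
    }
}
```

The point of the design is that the reported probability alone is not used to decide between languages in the confusable group, since the description above warns that it can be misleading there.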