Interface LanguageDetector

  • All Known Implementing Classes:
    LanguageDetectorImpl

    public interface LanguageDetector
    Guesses the language of an input string or text.

    See website for details.

    This detector cannot handle well: Short input text, can work or give wrong results. Text written in multiple languages. It likely returns the language for the most prominent text. It's not made for that. Text written in languages for which the detector has no profile loaded. It may just return other similar languages.

    Author:
    Fabian Kessler
    • Method Detail

      • detect

        com.google.common.base.Optional<LdLocale> detect​(CharSequence text)
        Returns the best detected language if the algorithm is very confident.

        Note: you may want to use getProbabilities() instead. This here is very strict, and sometimes returns absent even though the first choice in getProbabilities() is correct.

        Parameters:
        text - You probably want a TextObject.
        Returns:
        The language if confident, absent if unknown or not confident enough.
      • getProbabilities

        List<DetectedLanguage> getProbabilities​(CharSequence text)
        Returns all languages with at least some likeliness.

        There is a configurable cutoff applied for languages with very low probability.

        The way the algorithm currently works, it can be that, for example, this method returns a 0.99 for Danish and less than 0.01 for Norwegian, and still they have almost the same chance. It would be nice if this could be improved in future versions.

        Parameters:
        text - You probably want a TextObject.
        Returns:
        Sorted from better to worse. May be empty. It's empty if the program failed to detect any language, or if the input text did not contain any usable text (just noise).