Package org.languagetool.language
Class LanguageIdentifier
- java.lang.Object
  - org.languagetool.language.LanguageIdentifier
public class LanguageIdentifier extends Object
Identify the language of a text. Note that some languages might never be detected because they are too close to another language. Language variants like en-US or en-GB are not detected; the result will be en for those. By default, only the first 1000 characters of a text are considered. Email signatures that use \n-- \n as a delimiter are ignored.
- Since:
- 2.9
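The input clean-up described above (a length limit and dropping email signatures) can be sketched in plain Java. This is only an illustration under assumptions, not LanguageTool's implementation: the class `DetectionInputSketch` and its methods are hypothetical, and the order of the two steps (signature removal first, then truncation) is assumed because the description does not state it.

```java
// Hypothetical helper, not part of LanguageTool: it illustrates the input
// clean-up from the class description under assumed semantics.
final class DetectionInputSketch {

  // Default limit mentioned in the class description.
  static final int DEFAULT_MAX_LENGTH = 1000;

  // Signature delimiter from the description: newline, two dashes, a space, newline.
  static final String SIGNATURE_DELIMITER = "\n-- \n";

  /** Drops everything from the first signature delimiter onwards. */
  static String stripSignature(String text) {
    int pos = text.indexOf(SIGNATURE_DELIMITER);
    return pos >= 0 ? text.substring(0, pos) : text;
  }

  /** Keeps at most maxLength characters from the start of the text. */
  static String truncate(String text, int maxLength) {
    return text.length() > maxLength ? text.substring(0, maxLength) : text;
  }

  /** Assumed order: remove the signature first, then apply the length limit. */
  static String prepare(String text, int maxLength) {
    return truncate(stripSignature(text), maxLength);
  }

  public static void main(String[] args) {
    String mail = "This is the body of the message.\n-- \nJane Doe\nExample Corp";
    // Only the body is left; the signature block is dropped.
    System.out.println(prepare(mail, DEFAULT_MAX_LENGTH));
  }
}
```

With this kind of clean-up, a long signature in another language than the message body would not influence the detection result.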
Constructor Summary
Constructors
LanguageIdentifier()
LanguageIdentifier(int maxLength)
-
Method Summary
Modifier and Type, Method
@Nullable Language detectLanguage(String text)
@Nullable DetectedLanguage detectLanguage(String text, List<String> noopLangsTmp, List<String> preferredLangsTmp)
void enableFasttext(File fasttextBinary, File fasttextModel)
Constructor Detail
-
LanguageIdentifier
public LanguageIdentifier()
-
LanguageIdentifier
public LanguageIdentifier(int maxLength)
- Parameters:
maxLength - the maximum number of characters that will be considered; a lower value can help with performance. Don't use values below 100, as this would decrease accuracy.
- Throws:
IllegalArgumentException - if maxLength is less than 10
- Since:
- 4.2
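The two thresholds documented for maxLength (an exception below 10, reduced accuracy below 100) can be made explicit in a small sketch. The class `MaxLengthCheckSketch` below is a hypothetical stand-in that only mirrors the documented contract; it is not LanguageTool code, and the exception message is made up.

```java
// Hypothetical stand-in, not LanguageTool code: it mirrors the documented
// contract of the maxLength constructor parameter.
final class MaxLengthCheckSketch {

  // Below this value the constructor is documented to throw.
  static final int MIN_ALLOWED = 10;

  // Below this value the documentation warns about decreased accuracy.
  static final int MIN_RECOMMENDED = 100;

  private final int maxLength;

  MaxLengthCheckSketch(int maxLength) {
    if (maxLength < MIN_ALLOWED) {
      throw new IllegalArgumentException("maxLength must be >= " + MIN_ALLOWED + ": " + maxLength);
    }
    this.maxLength = maxLength;
  }

  int getMaxLength() {
    return maxLength;
  }

  /** True if the value is accepted but lies below the recommended minimum. */
  boolean isBelowRecommended() {
    return maxLength < MIN_RECOMMENDED;
  }

  public static void main(String[] args) {
    MaxLengthCheckSketch accepted = new MaxLengthCheckSketch(50);
    System.out.println("accepted, below recommended: " + accepted.isBelowRecommended());
    try {
      new MaxLengthCheckSketch(5);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Values from 10 to 99 are therefore legal but discouraged; callers that want to follow the recommendation need their own check.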
Method Detail
-
detectLanguage
@Nullable public Language detectLanguage(String text)
- Returns:
- language or null if language could not be identified
-
detectLanguage
@Nullable public DetectedLanguage detectLanguage(String text, List<String> noopLangsTmp, List<String> preferredLangsTmp)
- Parameters:
noopLangsTmp - list of codes that are detected but will lead to the NoopLanguage that has no rules
- Returns:
- language or null if language could not be identified
- Since:
- 4.4 (new parameter noopLangs, changed return type to DetectedLanguage)