Class Language

  • Direct Known Subclasses:
    NoopLanguage

    public abstract class Language
    extends Object
    Base class for any supported language (English, German, etc.). Language classes are detected at runtime by searching the classpath for files named META-INF/org/languagetool/language-module.properties. Each such file must contain a key languageClasses that specifies the fully qualified class name(s), e.g. org.languagetool.language.English. Use commas to specify more than one class.
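    A minimal sketch of reading such a file. It assumes the file is a standard Java properties file loaded with java.util.Properties. LanguageModuleSketch and parseLanguageClasses are hypothetical names, not part of the LanguageTool API. The sketch only splits the languageClasses value. Loading each class (for example with Class.forName) is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not LanguageTool API: reads the text of a
// language-module.properties file and splits its comma-separated
// languageClasses value into fully qualified class names.
class LanguageModuleSketch {

  static List<String> parseLanguageClasses(String propertiesText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    List<String> classNames = new ArrayList<>();
    String value = props.getProperty("languageClasses");
    if (value == null) {
      return classNames; // key missing: this module declares no languages
    }
    for (String name : value.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        classNames.add(trimmed);
      }
    }
    return classNames;
  }

  public static void main(String[] args) {
    String file = "languageClasses=org.languagetool.language.English,"
        + " org.languagetool.language.German\n";
    // Prints [org.languagetool.language.English, org.languagetool.language.German]
    System.out.println(parseLanguageClasses(file));
  }
}
```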

    Subclasses should typically use lazy initialization for anything that is costly to set up. This improves the startup time of the LanguageTool stand-alone version.
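    The following sketch shows one common lazy-initialization pattern: double-checked locking on a volatile field. LanguageTool may use a different pattern. LazyValue and LazyLanguageSketch are hypothetical, and a String stands in for an expensive resource such as a tagger.

```java
import java.util.function.Supplier;

// Hypothetical thread-safe lazy holder (double-checked locking): the supplier
// runs at most once, on the first call to get(). Assumes it never returns null.
final class LazyValue<T> {
  private final Supplier<T> supplier;
  private volatile T value;

  LazyValue(Supplier<T> supplier) {
    this.supplier = supplier;
  }

  T get() {
    T result = value;
    if (result == null) {
      synchronized (this) {
        result = value;
        if (result == null) {
          result = supplier.get(); // the costly setup happens here, once
          value = result;
        }
      }
    }
    return result;
  }
}

// Hypothetical language-like class: the expensive resource (a String stands in
// for a real tagger) is built on first use, not in the constructor.
class LazyLanguageSketch {
  int setupCount = 0;

  private final LazyValue<String> tagger = new LazyValue<>(() -> {
    setupCount++;
    return "expensive-tagger";
  });

  String getTagger() {
    return tagger.get();
  }
}
```

    Because the resource is built inside get(), constructing the language object stays cheap. This matters, for example, when all supported languages are instantiated at startup.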

    • Constructor Detail

      • Language

        public Language()
    • Method Detail

      • getShortCode

        public abstract String getShortCode()
        Get this language's character code, e.g. en for English. For most languages this is a two-letter code according to ISO 639-1, but for those languages that don't have a two-letter code, a three-letter code according to ISO 639-2 is returned. The country parameter (e.g. "US"), if any, is not returned.
        Since:
        3.6
      • getName

        public abstract String getName()
        Get this language's name in English, e.g. English or German (Germany).
        Returns:
        language name
      • getCountries

        public abstract String[] getCountries()
        Get this language's country options, e.g. US (as in en-US) or PL (as in pl-PL).
        Returns:
        String[] - array of country options for the language.
      • getMaintainers

        public abstract @Nullable Contributor[] getMaintainers()
        Get the name(s) of the maintainer(s) for this language or null.
      • getCommonWordsPath

        public String getCommonWordsPath()
        A file with common words, given either as a path in the classpath or as a filename in the file system.
        Since:
        4.5
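        The following sketch shows one way a caller might open such a path. The documentation does not say which location is tried first, so this version assumes the classpath comes first, then the file system. CommonWordsSketch and openCommonWords are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class CommonWordsSketch {
  // Hypothetical resolver: treats the path as a classpath resource first and
  // falls back to the file system; returns null if neither exists.
  static InputStream openCommonWords(String path) {
    InputStream fromClasspath = CommonWordsSketch.class.getResourceAsStream(path);
    if (fromClasspath != null) {
      return fromClasspath;
    }
    Path file = Paths.get(path);
    if (!Files.isRegularFile(file)) {
      return null;
    }
    try {
      return Files.newInputStream(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

        With Class.getResourceAsStream, a leading slash resolves the path from the classpath root rather than relative to the class's package.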
      • getVariant

        public @Nullable String getVariant()
        Get this language's variant, e.g. valencia (as in ca-ES-valencia), or null. Attention: not to be confused with the "country" option.
        Returns:
        variant for the language or null
        Since:
        2.3
      • getDefaultEnabledRulesForVariant

        public List<String> getDefaultEnabledRulesForVariant()
        Get enabled rules different from the default ones for this language variant.
        Returns:
        enabled rules for the language variant.
        Since:
        2.4
      • getDefaultDisabledRulesForVariant

        public List<String> getDefaultDisabledRulesForVariant()
        Get disabled rules different from the default ones for this language variant.
        Returns:
        disabled rules for the language variant.
        Since:
        2.4
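        The following sketch combines the two lists with a language's defaults. It assumes that variant-specific enabled rules are added to the defaults and disabled ones are removed. VariantRulesSketch, effectiveRules and the rule IDs used in the tests are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class VariantRulesSketch {
  // Hypothetical: starts from the rule IDs the base language enables by
  // default, adds the variant's extra enabled rules, then drops its disabled ones.
  static Set<String> effectiveRules(Set<String> languageDefaults,
                                    List<String> variantEnabled,
                                    List<String> variantDisabled) {
    Set<String> result = new LinkedHashSet<>(languageDefaults);
    result.addAll(variantEnabled);
    result.removeAll(variantDisabled);
    return result;
  }
}
```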
      • getLanguageModel

        public @Nullable LanguageModel getLanguageModel(File indexDir)
                                                 throws IOException
        Parameters:
        indexDir - directory with a '3grams' subdirectory that contains a Lucene index with 3-gram occurrence counts
        Returns:
        a LanguageModel or null if this language doesn't support one
        Throws:
        IOException
        Since:
        2.7
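        The following hypothetical check verifies the documented directory layout. NgramIndexSketch and findTrigramIndex are illustrative names. Returning null for a missing '3grams' subdirectory is this sketch's own choice. The sketch does not open the Lucene index.

```java
import java.io.File;

class NgramIndexSketch {
  // Hypothetical check of the documented layout: indexDir should contain a
  // "3grams" subdirectory; returns it, or null if it is missing.
  static File findTrigramIndex(File indexDir) {
    File trigrams = new File(indexDir, "3grams");
    return trigrams.isDirectory() ? trigrams : null;
  }
}
```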
      • getWord2VecModel

        public @Nullable Word2VecModel getWord2VecModel(File indexDir)
                                                 throws IOException
        Parameters:
        indexDir - directory with subdirectories like 'en', each containing dictionary.txt and final_embeddings.txt
        Returns:
        a Word2VecModel or null if this language doesn't support one
        Throws:
        IOException
        Since:
        4.0
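        The following hypothetical helper lists which language subdirectories of indexDir follow the documented layout, meaning both dictionary.txt and final_embeddings.txt are present. Word2VecLayoutSketch is an illustrative name, and no model is loaded.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class Word2VecLayoutSketch {
  // Hypothetical: returns the names of subdirectories (such as "en") that
  // contain both files named in the documentation.
  static List<String> languagesWithModels(File indexDir) {
    List<String> result = new ArrayList<>();
    File[] children = indexDir.listFiles(File::isDirectory);
    if (children == null) {
      return result; // indexDir missing or not a directory
    }
    for (File dir : children) {
      if (new File(dir, "dictionary.txt").isFile()
          && new File(dir, "final_embeddings.txt").isFile()) {
        result.add(dir.getName());
      }
    }
    return result;
  }
}
```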
      • getRelevantNeuralNetworkModels

        public List<Rule> getRelevantNeuralNetworkModels(ResourceBundle messages,
                                                         File modelDir)
        Get a list of rules that load trained neural networks. Returns an empty list for languages that don't have such rules.
        Since:
        4.4
      • getLocale

        public Locale getLocale()
        Get this language's Java locale, not considering the country code.
      • getLocaleWithCountryAndVariant

        public Locale getLocaleWithCountryAndVariant()
        Get this language's Java locale, considering language code and country code (if any).
        Since:
        2.1
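        The difference between the two locales can be illustrated with java.util.Locale.Builder. This sketch is not LanguageTool's implementation, and LocaleSketch and toLocale are hypothetical. Locale.Builder only accepts variants that are well-formed BCP 47 subtags, such as valencia. Other variants cause an IllformedLocaleException.

```java
import java.util.Locale;

class LocaleSketch {
  // Builds a Locale from a short code plus an optional country and an
  // optional variant (null means "not set").
  static Locale toLocale(String code, String country, String variant) {
    Locale.Builder builder = new Locale.Builder().setLanguage(code);
    if (country != null) {
      builder.setRegion(country);
    }
    if (variant != null) {
      builder.setVariant(variant);
    }
    return builder.build();
  }
}
```

        For example, toLocale("ca", "ES", "valencia").toLanguageTag() yields ca-ES-valencia, while toLocale("ca", null, null) has no country or variant.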
      • getRuleFileNames

        public List<String> getRuleFileNames()
        Get the location of the rule file(s) in a form like /org/languagetool/rules/de/grammar.xml, i.e. a path in the classpath. The files must exist or an exception will be thrown, unless the filename contains the string -test-.
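        The following sketch shows the documented existence check. RuleFileCheckSketch and checkRuleFiles are hypothetical. The lookup is passed in as a predicate so that the example runs without LanguageTool on the classpath.

```java
import java.util.List;
import java.util.function.Predicate;

class RuleFileCheckSketch {
  // Hypothetical validation of the documented contract: every rule file must
  // exist unless its name contains "-test-".
  static void checkRuleFiles(List<String> ruleFiles, Predicate<String> existsOnClasspath) {
    for (String path : ruleFiles) {
      if (!existsOnClasspath.test(path) && !path.contains("-test-")) {
        throw new IllegalStateException("Rule file not found: " + path);
      }
    }
  }
}
```

        With a real classpath, the predicate could be p -> RuleFileCheckSketch.class.getResource(p) != null.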
      • getDefaultLanguageVariant

        public @Nullable Language getDefaultLanguageVariant()
        Languages that have country variants need to override this to select their most common variant.
        Returns:
        default country variant or null
        Since:
        1.8
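        The following sketch shows such an override using a hypothetical mini class hierarchy (LangSketch, EnglishSketch, AmericanEnglishSketch) rather than LanguageTool's own classes. Picking American English as the most common English variant is an assumption made for illustration.

```java
// Hypothetical stand-in for Language.
abstract class LangSketch {
  abstract String getShortCode();

  // Default: no country variants, so no default variant.
  LangSketch getDefaultLanguageVariant() {
    return null;
  }
}

// Generic language with country variants: names its most common one.
class EnglishSketch extends LangSketch {
  @Override
  String getShortCode() {
    return "en";
  }

  @Override
  LangSketch getDefaultLanguageVariant() {
    return new AmericanEnglishSketch(); // assumed most common variant
  }
}

// A variant has no further default variant of its own.
class AmericanEnglishSketch extends EnglishSketch {
  @Override
  LangSketch getDefaultLanguageVariant() {
    return null;
  }
}
```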
      • getDisambiguator

        public Disambiguator getDisambiguator()
        Get this language's part-of-speech disambiguator implementation.
      • getTagger

        public Tagger getTagger()
        Get this language's part-of-speech tagger implementation. The tagger must not be null, but it can be a trivial pseudo-tagger that only assigns null tags.
      • getSentenceTokenizer

        public SentenceTokenizer getSentenceTokenizer()
        Get this language's sentence tokenizer implementation.
      • getWordTokenizer

        public Tokenizer getWordTokenizer()
        Get this language's word tokenizer implementation.
      • getChunker

        public @Nullable Chunker getChunker()
        Get this language's chunker implementation or null.
        Since:
        2.3
      • getPostDisambiguationChunker

        public @Nullable Chunker getPostDisambiguationChunker()
        Get this language's chunker that runs after disambiguation, or null.
        Since:
        2.9
      • getSynthesizer

        public @Nullable Synthesizer getSynthesizer()
        Get this language's part-of-speech synthesizer implementation or null.
      • getUnifier

        public Unifier getUnifier()
        Get this language's feature unifier.
        Returns:
        Feature unifier for analyzed tokens.
      • getDisambiguationUnifier

        public Unifier getDisambiguationUnifier()
        Get this language's feature unifier used for disambiguation. Note: it might be different from the normal rule unifier.
        Returns:
        Feature unifier for analyzed tokens.
      • getDisambiguationUnifierConfiguration

        public UnifierConfiguration getDisambiguationUnifierConfiguration()
        Since:
        2.3
      • getTranslatedName

        public final String getTranslatedName(ResourceBundle messages)
        Get the name of the language translated to the current locale, if available. Otherwise, get the untranslated name.
      • getShortCodeWithCountryAndVariant

        public final String getShortCodeWithCountryAndVariant()
        Get the short name of the language with country and variant (if any), if it is a single-country language. For generic language classes, get only a two- or three-character code.
        Since:
        3.6
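        The following string-based sketch applies the documented rule. It assumes that the short code, the countries array and the variant are available as plain values. ShortCodeSketch is hypothetical.

```java
class ShortCodeSketch {
  // Hypothetical: appends the country (and the variant, if any) only when the
  // language has exactly one country option.
  static String shortCodeWithCountryAndVariant(String code, String[] countries, String variant) {
    if (countries.length != 1) {
      return code; // generic language class: plain two- or three-letter code
    }
    String result = code + "-" + countries[0];
    return variant == null ? result : result + "-" + variant;
  }
}
```

        For example, ("ca", {"ES"}, "valencia") gives ca-ES-valencia, while a language listing several countries keeps its plain code.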
      • isVariant

        public final boolean isVariant()
        Whether this is a country variant of another language, i.e. whether it extends a subclass of Language rather than Language itself.
        Since:
        1.8
      • hasVariant

        public final boolean hasVariant()
        Whether this class has at least one subclass that implements variants of this language.
        Since:
        1.8
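        Both checks can be sketched with reflection on a hypothetical hierarchy (BaseLang, German, SwissGerman). The list of known language classes passed to hasVariant is an assumption, since LanguageTool's way of enumerating languages is not described here.

```java
import java.util.List;

class VariantCheckSketch {
  // Hypothetical stand-ins for Language and two of its subclasses.
  static abstract class BaseLang {}
  static class German extends BaseLang {}
  static class SwissGerman extends German {}

  // isVariant: the class extends a subclass of BaseLang, not BaseLang itself.
  static boolean isVariant(BaseLang lang) {
    return !BaseLang.class.equals(lang.getClass().getSuperclass());
  }

  // hasVariant: at least one known language class directly extends this one.
  static boolean hasVariant(BaseLang lang, List<Class<? extends BaseLang>> known) {
    for (Class<? extends BaseLang> c : known) {
      if (lang.getClass().equals(c.getSuperclass())) {
        return true;
      }
    }
    return false;
  }
}
```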
      • isExternal

        public boolean isExternal()
        For internal use only. Overridden to return true for languages that have been loaded from an external file after startup.
      • equalsConsiderVariantsIfSpecified

        public boolean equalsConsiderVariantsIfSpecified(Language otherLanguage)
        Return true if this is the same language as the given one, considering country variants only if set for both languages. For example: en = en, en = en-GB, en-GB = en-GB, but en-US != en-GB.
        Since:
        1.8
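        The following sketch applies the documented rule to plain codes such as en or en-GB. VariantEqualitySketch is hypothetical and, for brevity, ignores variants like valencia.

```java
class VariantEqualitySketch {
  // Hypothetical: base languages must match; countries are compared only if
  // both codes specify one.
  static boolean equalsConsiderVariantsIfSpecified(String a, String b) {
    String[] pa = a.split("-");
    String[] pb = b.split("-");
    if (!pa[0].equals(pb[0])) {
      return false;
    }
    if (pa.length > 1 && pb.length > 1) {
      return pa[1].equals(pb[1]);
    }
    return true;
  }
}
```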
      • getIgnoredCharactersRegex

        public Pattern getIgnoredCharactersRegex()
        Returns:
        the compiled regular expression matching characters to ignore inside tokens
        Since:
        2.9
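        The following sketch applies such a pattern to a token. The soft hyphen (U+00AD) is an assumed example of an ignored character. IgnoredCharsSketch and stripIgnored are hypothetical.

```java
import java.util.regex.Pattern;

class IgnoredCharsSketch {
  // Assumed example pattern: matches the soft hyphen (U+00AD).
  static final Pattern IGNORED = Pattern.compile("[\u00AD]");

  // Removes every ignored character, so a soft-hyphenated word is checked
  // as if the hyphen were not there.
  static String stripIgnored(String token) {
    return IGNORED.matcher(token).replaceAll("");
  }
}
```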
      • getMaintainedState

        public LanguageMaintainedState getMaintainedState()
        Information about whether the support for this language in LanguageTool is actively maintained. If not, the user interface might show a warning.
        Since:
        3.3
      • isHiddenFromGui

        public boolean isHiddenFromGui()
      • getPriorityForId

        public int getPriorityForId(String id)
        Returns a priority for a rule or category ID (default: 0). Positive integers mean a higher priority, negative integers a lower one.
        Since:
        3.6
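        The following sketch uses such priorities to order IDs. A hypothetical map stands in for a language's overrides, and unknown IDs fall back to the documented default of 0. PrioritySketch and the IDs used in the tests are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class PrioritySketch {
  // IDs not in the map get the documented default of 0.
  static int getPriorityForId(Map<String, Integer> priorities, String id) {
    return priorities.getOrDefault(id, 0);
  }

  // Orders IDs so that higher priorities come first (stable for ties).
  static List<String> sortByPriority(List<String> ids, Map<String, Integer> priorities) {
    List<String> sorted = new ArrayList<>(ids);
    sorted.sort(Comparator.comparingInt((String id) -> getPriorityForId(priorities, id)).reversed());
    return sorted;
  }
}
```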
      • isSpellcheckOnlyLanguage

        public boolean isSpellcheckOnlyLanguage()
        Whether this language supports spell checking only and no advanced grammar and style checking.
        Since:
        4.5
      • equals

        public boolean equals(Object o)
        Considers languages equal if their language codes, including the country and variant codes, are equal.
        Overrides:
        equals in class Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object
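        The following sketch keeps equals and hashCode consistent by deriving both from the same (language, country, variant) triple. LangCodeSketch is hypothetical. Unlike equalsConsiderVariantsIfSpecified, this equality is strict: en and en-GB are not equal.

```java
import java.util.Objects;

// Hypothetical value class: equality and hash code both derive from the same
// (language, country, variant) triple.
final class LangCodeSketch {
  private final String code;
  private final String country; // may be null
  private final String variant; // may be null

  LangCodeSketch(String code, String country, String variant) {
    this.code = Objects.requireNonNull(code);
    this.country = country;
    this.variant = variant;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof LangCodeSketch)) return false;
    LangCodeSketch other = (LangCodeSketch) o;
    return code.equals(other.code)
        && Objects.equals(country, other.country)
        && Objects.equals(variant, other.variant);
  }

  @Override
  public int hashCode() {
    return Objects.hash(code, country, variant);
  }
}
```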