Class JLanguageTool

  • Direct Known Subclasses:
    MultiThreadedJLanguageTool

    public class JLanguageTool
    extends Object
    The main class used for checking text against different rules:
    • built-in Java rules (for English: a vs. an, whitespace after commas, ...)
    • built-in pattern rules loaded from external XML files (usually called grammar.xml)
    • your own implementation of the abstract Rule classes added with addRule(Rule)

    You will probably want to use the subclass MultiThreadedJLanguageTool for best performance.

    Thread-safety: this class is not thread-safe. Create one instance per thread, but create the language only once (e.g. new AmericanEnglish()) and use it for all instances of JLanguageTool.
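
    The one-instance-per-thread advice can be sketched with a ThreadLocal. The sketch below does not use LanguageTool itself: DemoLanguage and DemoChecker are hypothetical stand-ins for a Language implementation and for JLanguageTool, so that the example runs with the JDK alone. It only illustrates the pattern of sharing one language object while giving every thread its own checker.

```java
class PerThreadCheckerDemo {
    // The language object is created once and shared by every checker.
    static final DemoLanguage LANGUAGE = new DemoLanguage("en-US");

    // Each thread lazily creates, and then reuses, its own checker.
    static final ThreadLocal<DemoChecker> CHECKER =
            ThreadLocal.withInitial(() -> new DemoChecker(LANGUAGE));

    /** Runs one check on each of threadCount threads and returns the checker each thread used. */
    static DemoChecker[] checkersUsedBy(int threadCount) {
        DemoChecker[] used = new DemoChecker[threadCount];
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> {
                DemoChecker checker = CHECKER.get();
                checker.check("Text checked by thread " + slot);
                used[slot] = checker;
            });
            threads[i].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join(); // join makes the writes to 'used' visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return used;
    }

    public static void main(String[] args) {
        DemoChecker[] used = checkersUsedBy(3);
        System.out.println("distinct checkers: " + (used[0] != used[1] && used[1] != used[2]));
        System.out.println("shared language: " + (used[0].getLanguage() == used[2].getLanguage()));
    }
}

// Hypothetical stand-in for a Language implementation such as AmericanEnglish.
final class DemoLanguage {
    final String code;
    DemoLanguage(String code) { this.code = code; }
}

// Hypothetical stand-in for JLanguageTool; its unsynchronized counter is why sharing is unsafe.
final class DemoChecker {
    private final DemoLanguage language;
    private int checksDone;
    DemoChecker(DemoLanguage language) { this.language = language; }
    DemoLanguage getLanguage() { return language; }
    int check(String text) { checksDone++; return text.length(); }
    int getChecksDone() { return checksDone; }
}
```

    With the real classes, the ThreadLocal initializer would presumably construct a JLanguageTool from the shared Language instead.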

    See Also:
    MultiThreadedJLanguageTool
    • Field Detail

      • VERSION

        public static final String VERSION
        LanguageTool version as a string like 2.3 or 2.4-SNAPSHOT.
        See Also:
        Constant Field Values
      • BUILD_DATE

        @Nullable
        public static final String BUILD_DATE
        LanguageTool build date and time, like 2013-10-17 16:10, or null if not run from a JAR.
      • GIT_SHORT_ID

        @Nullable
        public static final String GIT_SHORT_ID
        Abbreviated git id or null if not available.
        Since:
        4.5
      • FALSE_FRIEND_FILE

        public static final String FALSE_FRIEND_FILE
        The name of the file with false friend information.
        See Also:
        Constant Field Values
      • SENTENCE_START_TAGNAME

        public static final String SENTENCE_START_TAGNAME
        The internal tag used to mark the beginning of a sentence.
        See Also:
        Constant Field Values
      • SENTENCE_END_TAGNAME

        public static final String SENTENCE_END_TAGNAME
        The internal tag used to mark the end of a sentence.
        See Also:
        Constant Field Values
      • PARAGRAPH_END_TAGNAME

        public static final String PARAGRAPH_END_TAGNAME
        The internal tag used to mark the end of a paragraph.
        See Also:
        Constant Field Values
      • MESSAGE_BUNDLE

        public static final String MESSAGE_BUNDLE
        Name of the message bundle for translations.
        See Also:
        Constant Field Values
      • DICTIONARY_FILENAME_EXTENSION

        public static final String DICTIONARY_FILENAME_EXTENSION
        Extension of dictionary files read by Spellers.
        See Also:
        Constant Field Values
    • Constructor Detail

      • JLanguageTool

        public JLanguageTool​(Language lang,
                             Language motherTongue)
        Create a JLanguageTool and set up the built-in rules for the given language and false friend rules for the text language / mother tongue pair.
        Parameters:
        lang - the language of the text to be checked
        motherTongue - the user's mother tongue, used for false friend rules, or null. The mother tongue may also be used as a source language for checking bilingual texts.
      • JLanguageTool

        public JLanguageTool​(Language language)
        Create a JLanguageTool and set up the built-in Java rules for the given language.
        Parameters:
        language - the language of the text to be checked
      • JLanguageTool

        public JLanguageTool​(Language language,
                             Language motherTongue,
                             ResultCache cache)
        Create a JLanguageTool and set up the built-in rules for the given language and false friend rules for the text language / mother tongue pair.
        Parameters:
        language - the language of the text to be checked
        motherTongue - the user's mother tongue, used for false friend rules, or null. The mother tongue may also be used as a source language for checking bilingual texts.
        cache - a cache to speed up checking if the same sentences get checked more than once, e.g. when LT is running as a server and texts are re-checked due to changes
        Since:
        3.7
      • JLanguageTool

        @Experimental
        public JLanguageTool​(Language language,
                             ResultCache cache,
                             UserConfig userConfig)
        Create a JLanguageTool and set up the built-in rules for the given language and false friend rules for the text language / mother tongue pair.
        Parameters:
        language - the language of the text to be checked
        cache - a cache to speed up checking if the same sentences get checked more than once, e.g. when LT is running as a server and texts are re-checked due to changes. Use null to deactivate the cache.
        Since:
        4.2
      • JLanguageTool

        @Experimental
        public JLanguageTool​(Language language,
                             List<Language> altLanguages,
                             Language motherTongue,
                             ResultCache cache,
                             GlobalConfig globalConfig,
                             UserConfig userConfig)
        Create a JLanguageTool and set up the built-in rules for the given language and false friend rules for the text language / mother tongue pair.
        Parameters:
        language - the language of the text to be checked
        altLanguages - the languages that are accepted as alternative languages. Currently this means that words are accepted if they are in an alternative language and not similar to a word from language. If there's a similar word in language, there will be an error of type RuleMatch.Type.Hint (EXPERIMENTAL).
        motherTongue - the user's mother tongue, used for false friend rules, or null. The mother tongue may also be used as a source language for checking bilingual texts.
        cache - a cache to speed up checking if the same sentences get checked more than once, e.g. when LT is running as a server and texts are re-checked due to changes
        Since:
        4.3
      • JLanguageTool

        @Experimental
        public JLanguageTool​(Language language,
                             Language motherTongue,
                             ResultCache cache,
                             UserConfig userConfig)
        Create a JLanguageTool and set up the built-in rules for the given language and false friend rules for the text language / mother tongue pair.
        Parameters:
        language - the language of the text to be checked
        motherTongue - the user's mother tongue, used for false friend rules, or null. The mother tongue may also be used as a source language for checking bilingual texts.
        cache - a cache to speed up checking if the same sentences get checked more than once, e.g. when LT is running as a server and texts are re-checked due to changes
        Since:
        4.2
    • Method Detail

      • isPremiumVersion

        public static boolean isPremiumVersion()
        Since:
        4.2
      • getDataBroker

        public static ResourceDataBroker getDataBroker()
        The grammar checker needs resources from the following directories:
        • /resource
        • /rules
        Returns:
        The currently set data broker, which allows obtaining resources from the directories mentioned above. If no data broker was set, a new DefaultResourceDataBroker will be instantiated and returned.
        Since:
        1.0.1
      • setDataBroker

        public static void setDataBroker​(ResourceDataBroker broker)
        The grammar checker needs resources from the following directories:
        • /resource
        • /rules
        Parameters:
        broker - The new resource broker to be used.
        Since:
        1.0.1
      • setListUnknownWords

        public void setListUnknownWords​(boolean listUnknownWords)
        Whether the check(String) methods store unknown words. If set to true (default: false), you can get the list of unknown words using getUnknownWords().
      • setCleanOverlappingMatches

        public void setCleanOverlappingMatches​(boolean cleanOverlappingMatches)
        Whether overlapping errors are removed from the results of the check(String) methods. If set to true (the default), overlapping errors are removed according to the priorities established for the language.
        Since:
        3.6
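
        Priority-based removal of overlapping errors can be pictured with the following sketch. It is not LanguageTool's implementation: DemoMatch is a hypothetical stand-in for RuleMatch, the numeric priority field is an assumption, and the strategy shown (compare each match with the previously kept one and keep the higher priority) is just one plausible way to resolve overlaps.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class OverlapCleanupDemo {
    /**
     * Walks the matches in order of start position. When a match overlaps the
     * previously kept match, only the one with the higher priority survives
     * (on a tie, the earlier match is kept).
     */
    static List<DemoMatch> removeOverlaps(List<DemoMatch> matches) {
        List<DemoMatch> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingInt((DemoMatch m) -> m.from));
        List<DemoMatch> kept = new ArrayList<>();
        for (DemoMatch current : sorted) {
            if (!kept.isEmpty()) {
                DemoMatch last = kept.get(kept.size() - 1);
                if (current.from < last.to) {              // overlaps the last kept match
                    if (current.priority > last.priority) {
                        kept.set(kept.size() - 1, current);
                    }
                    continue;
                }
            }
            kept.add(current);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<DemoMatch> matches = new ArrayList<>();
        matches.add(new DemoMatch("A", 0, 5, 1));
        matches.add(new DemoMatch("B", 3, 8, 5));   // overlaps A and has a higher priority
        matches.add(new DemoMatch("C", 10, 12, 0)); // overlaps nothing
        for (DemoMatch m : removeOverlaps(matches)) {
            System.out.println(m.ruleId + " [" + m.from + ", " + m.to + ")");
        }
    }
}

// Hypothetical stand-in for RuleMatch: a rule id, a half-open character range, and a priority.
final class DemoMatch {
    final String ruleId;
    final int from;
    final int to;
    final int priority;
    DemoMatch(String ruleId, int from, int to, int priority) {
        this.ruleId = ruleId;
        this.from = from;
        this.to = to;
        this.priority = priority;
    }
}
```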
      • setMaxErrorsPerWordRate

        @Experimental
        public void setMaxErrorsPerWordRate​(float maxErrorsPerWordRate)
        Maximum rate of errors per word. Checking will stop with an exception if the rate is higher. For example, with a rate of 0.33, the checking would stop if the user's text has so many errors that more than every 3rd word causes a rule match. Note that this may not apply for very short texts.
        Since:
        4.0
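
        The arithmetic behind the 0.33 example can be sketched as follows. This is only an illustration of the ratio described above: the whitespace-based word count and the exceedsRate helper are assumptions, and LanguageTool's own counting and its handling of very short texts may differ.

```java
class ErrorRateDemo {
    /** Counts whitespace-separated words (an assumption; real tokenization may differ). */
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Returns true if matches per word exceed the given maximum rate. */
    static boolean exceedsRate(int matchCount, int wordCount, float maxErrorsPerWordRate) {
        if (wordCount == 0) {
            return false; // nothing to measure
        }
        return (float) matchCount / wordCount > maxErrorsPerWordRate;
    }

    public static void main(String[] args) {
        String text = "one two three four five six seven eight nine ten eleven twelve";
        int words = countWords(text);
        // 4 matches in 12 words is a rate of about 0.333, just above 0.33.
        System.out.println(exceedsRate(4, words, 0.33f));
        // 3 matches in 12 words is a rate of 0.25, below 0.33.
        System.out.println(exceedsRate(3, words, 0.33f));
    }
}
```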
      • getMessageBundle

        public static ResourceBundle getMessageBundle()
        Gets the ResourceBundle (i18n strings) for the default language of the user's system.
      • getMessageBundle

        public static ResourceBundle getMessageBundle​(Language lang)
        Gets the ResourceBundle (i18n strings) for the given user interface language.
        Since:
        2.4 (public since 2.4)
      • setOutput

        public void setOutput​(PrintStream printStream)
        Set a PrintStream that will receive verbose output. Set to null (which is the default) to disable verbose output.
      • loadPatternRules

        public List<AbstractPatternRule> loadPatternRules​(String filename)
                                                   throws IOException
        Load pattern rules from an XML file. Use addRule(Rule) to add these rules to the checking process.
        Parameters:
        filename - path to an XML file in the classpath or in the filesystem - the classpath is checked first
        Returns:
        a List of AbstractPatternRule objects
        Throws:
        IOException
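
        The lookup order described for filename (classpath first, then filesystem) can be sketched with plain JDK calls. RuleFileLocator and its methods are hypothetical names for this illustration, and the way the classpath resource name is built here is an assumption, not necessarily what LanguageTool does internally.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

class RuleFileLocator {
    /** Turns a file name into an absolute classpath resource name. */
    private static String resourceName(String filename) {
        return filename.startsWith("/") ? filename : "/" + filename;
    }

    /** Reports where a rule file would be found: "classpath", "filesystem", or "missing". */
    static String locate(String filename) {
        if (RuleFileLocator.class.getResource(resourceName(filename)) != null) {
            return "classpath"; // the classpath is checked first
        }
        if (Files.isRegularFile(Paths.get(filename))) {
            return "filesystem";
        }
        return "missing";
    }

    /** Opens the file using the same lookup order as locate(). */
    static InputStream open(String filename) throws IOException {
        InputStream fromClasspath =
                RuleFileLocator.class.getResourceAsStream(resourceName(filename));
        return fromClasspath != null ? fromClasspath : Files.newInputStream(Paths.get(filename));
    }

    public static void main(String[] args) {
        String filename = args.length > 0 ? args[0] : "grammar.xml";
        System.out.println(filename + ": " + locate(filename));
    }
}
```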
      • activateNeuralNetworkRules

        public void activateNeuralNetworkRules​(File modelDir)
                                        throws IOException
        Activate rules that depend on pretrained neural network models.
        Parameters:
        modelDir - root dir of exported models
        Throws:
        IOException
        Since:
        4.4
      • activateLanguageModelRules

        public void activateLanguageModelRules​(File indexDir)
                                        throws IOException
        Activate rules that depend on a language model. The language model currently consists of Lucene indexes with ngram occurrence counts.
        Parameters:
        indexDir - directory with a '3grams' sub directory which contains a Lucene index with 3gram occurrence counts
        Throws:
        IOException
        Since:
        2.7
      • activateWord2VecModelRules

        public void activateWord2VecModelRules​(File indexDir)
                                        throws IOException
        Activate rules that depend on a word2vec language model.
        Parameters:
        indexDir - directory with subdirectories like 'en', each containing dictionary.txt and final_embeddings.txt
        Throws:
        IOException
        Since:
        4.0
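
        Before calling this method it can be useful to verify that the directory has the layout described for indexDir. The following pre-flight check is a sketch using only java.io.File; Word2VecDirCheck and hasModelFor are hypothetical helper names and not part of LanguageTool.

```java
import java.io.File;

class Word2VecDirCheck {
    /** Returns true if indexDir/languageCode contains both expected model files. */
    static boolean hasModelFor(File indexDir, String languageCode) {
        File languageDir = new File(indexDir, languageCode);
        return new File(languageDir, "dictionary.txt").isFile()
                && new File(languageDir, "final_embeddings.txt").isFile();
    }

    public static void main(String[] args) {
        File indexDir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("model for 'en' present: " + hasModelFor(indexDir, "en"));
    }
}
```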
      • addMatchFilter

        public void addMatchFilter(@NotNull RuleMatchFilter filter)
        Add a RuleMatchFilter for post-processing of rule matches. Filters are called sequentially, in the order in which they were added.
        Parameters:
        filter - filter to add
        Since:
        4.7
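
        Because filters run one after another, the order in which they are added can change the result. The sketch below shows this with stand-ins: matches are represented by plain rule id strings and filters by UnaryOperator, which are assumptions for illustration and not the RuleMatchFilter interface itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

class FilterChainDemo {
    private final List<UnaryOperator<List<String>>> filters = new ArrayList<>();

    /** Drops every match whose rule id is STYLE. */
    static final UnaryOperator<List<String>> DROP_STYLE =
            ids -> ids.stream().filter(id -> !id.equals("STYLE")).collect(Collectors.toList());

    /** Keeps at most the first two matches. */
    static final UnaryOperator<List<String>> FIRST_TWO =
            ids -> new ArrayList<>(ids.subList(0, Math.min(2, ids.size())));

    void addFilter(UnaryOperator<List<String>> filter) {
        filters.add(filter);
    }

    /** Applies all filters in the order they were added; each sees the previous filter's output. */
    List<String> apply(List<String> ruleIds) {
        List<String> result = ruleIds;
        for (UnaryOperator<List<String>> filter : filters) {
            result = filter.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("TYPO", "STYLE", "TYPO", "GRAMMAR");

        FilterChainDemo dropThenLimit = new FilterChainDemo();
        dropThenLimit.addFilter(DROP_STYLE);
        dropThenLimit.addFilter(FIRST_TWO);
        System.out.println(dropThenLimit.apply(ids)); // [TYPO, TYPO]

        FilterChainDemo limitThenDrop = new FilterChainDemo();
        limitThenDrop.addFilter(FIRST_TWO);
        limitThenDrop.addFilter(DROP_STYLE);
        System.out.println(limitThenDrop.apply(ids)); // [TYPO]
    }
}
```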
      • addRule

        public void addRule​(Rule rule)
        Add a rule to be used by the next call to the check methods like check(String).
      • disableRule

        public void disableRule​(String ruleId)
        Disable a given rule so the check methods like check(String) won't use it.
        Parameters:
        ruleId - the id of the rule to disable - no error will be thrown if the id does not exist
        See Also:
        enableRule(String)
      • disableRules

        public void disableRules​(List<String> ruleIds)
        Disable the given rules so the check methods like check(String) won't use them.
        Parameters:
        ruleIds - the ids of the rules to disable - no error will be thrown if an id does not exist
        Since:
        2.4
      • disableCategory

        public void disableCategory​(CategoryId id)
        Disable the given rule category so the check methods like check(String) won't use it.
        Parameters:
        id - the id of the category to disable - no error will be thrown if the id does not exist
        Since:
        3.3
        See Also:
        enableRuleCategory(CategoryId)
      • isCategoryDisabled

        public boolean isCategoryDisabled​(CategoryId id)
        Returns true if a category is explicitly disabled.
        Parameters:
        id - the id of the category to check - no error will be thrown if the id does not exist
        Returns:
        true if this category is explicitly disabled.
        Since:
        3.5
        See Also:
        disableCategory(org.languagetool.rules.CategoryId)
      • getLanguage

        public Language getLanguage()
        Get the language that was used to configure this instance.
      • getDisabledRules

        public Set<String> getDisabledRules()
        Get rule ids of the rules that have been explicitly disabled.
      • enableRule

        public void enableRule​(String ruleId)
        Enable a given rule so the check methods like check(String) will use it. This will not throw an exception if the given rule id doesn't exist.
        Parameters:
        ruleId - the id of the rule to enable
        See Also:
        disableRule(String)
      • sentenceTokenize

        public List<String> sentenceTokenize​(String text)
        Tokenizes the given text into sentences.
      • check

        public List<RuleMatch> check​(String text)
                              throws IOException
        The main check method. Tokenizes the text into sentences and matches these sentences against all currently active rules.
        Parameters:
        text - the text to be checked
        Returns:
        a List of RuleMatch objects
        Throws:
        IOException
      • check

        public List<RuleMatch> check​(AnnotatedText text)
                              throws IOException
        The main check method. Tokenizes the text into sentences and matches these sentences against all currently active rules, adjusting error positions so they refer to the original text including markup.
        Throws:
        IOException
        Since:
        2.3
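
        The idea of adjusting positions so they refer to the original text including markup can be sketched as an offset mapping. MarkupOffsetDemo below is a simplified, hypothetical stand-in: its addText and addMarkup methods only mimic the role of a builder for annotated text, and the mapping logic is an assumption for illustration, not LanguageTool's implementation.

```java
import java.util.ArrayList;
import java.util.List;

class MarkupOffsetDemo {
    private final StringBuilder plain = new StringBuilder();
    // Each anchor is {position in plain text, position in original text} at the start of a text segment.
    private final List<int[]> anchors = new ArrayList<>();
    private int originalPos;

    /** Adds text that is checked and also present in the original. */
    MarkupOffsetDemo addText(String text) {
        anchors.add(new int[] {plain.length(), originalPos});
        plain.append(text);
        originalPos += text.length();
        return this;
    }

    /** Adds markup that is skipped by the checker but still occupies space in the original. */
    MarkupOffsetDemo addMarkup(String markup) {
        originalPos += markup.length();
        return this;
    }

    String plainText() {
        return plain.toString();
    }

    /** Maps the position of a character in the plain text to its position in the original. */
    int toOriginal(int plainPos) {
        int[] best = {0, 0};
        for (int[] anchor : anchors) {
            if (anchor[0] <= plainPos) {
                best = anchor; // last text segment starting at or before plainPos
            }
        }
        return best[1] + (plainPos - best[0]);
    }

    /** Maps an exclusive end position by mapping the last covered character and adding one. */
    int toOriginalEnd(int plainEnd) {
        return toOriginal(plainEnd - 1) + 1;
    }

    public static void main(String[] args) {
        String original = "<b>This</b> are wrong.";
        MarkupOffsetDemo doc = new MarkupOffsetDemo()
                .addMarkup("<b>").addText("This").addMarkup("</b>").addText(" are wrong.");
        int from = doc.plainText().indexOf("are");
        int to = from + "are".length();
        System.out.println(original.substring(doc.toOriginal(from), doc.toOriginalEnd(to)));
    }
}
```

        Mapping the end position through the last covered character keeps trailing markup out of the reported range when a match ends exactly at a markup boundary.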
      • check

        public List<RuleMatch> check​(AnnotatedText annotatedText,
                                     boolean tokenizeText,
                                     JLanguageTool.ParagraphHandling paraMode)
                              throws IOException
        The main check method. Tokenizes the text into sentences and matches these sentences against all currently active rules.
        Parameters:
        annotatedText - The text to be checked, created with AnnotatedTextBuilder. Call this method with the complete text to be checked. If you call it repeatedly with smaller chunks like paragraphs or sentences, those rules that work across paragraphs/sentences won't work (their status gets reset whenever this method is called).
        tokenizeText - If true, then the text is tokenized into sentences. Otherwise, it is assumed it's already tokenized, i.e. it is only one sentence
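        paraMode - Controls how paragraph-level rules are applied, for example whether only paragraph-level rules are used. This is a JLanguageTool.ParagraphHandling value, not a boolean.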
        Returns:
        a List of RuleMatch objects, describing potential errors in the text
        Throws:
        IOException
        Since:
        2.3
      • analyzeText

        public List<AnalyzedSentence> analyzeText​(String text)
                                           throws IOException
        Use this method if you want to access LanguageTool's otherwise internal analysis of the text. For actual text checking, use the check... methods instead.
        Parameters:
        text - The text to be analyzed
        Throws:
        IOException
        Since:
        2.5
      • printSentenceInfo

        protected void printSentenceInfo​(AnalyzedSentence analyzedSentence)
      • adjustRuleMatchPos

        public RuleMatch adjustRuleMatchPos​(RuleMatch match,
                                            int charCount,
                                            int columnCount,
                                            int lineCount,
                                            String sentence,
                                            AnnotatedText annotatedText)
        Change RuleMatch positions so they are relative to the complete text, not just to the sentence.
        Parameters:
        charCount - Count of characters in the preceding sentences
        columnCount - Current column number
        lineCount - Current line number
        sentence - The text being checked
        Returns:
        The RuleMatch object with adjustments
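
        The core of this adjustment is an offset shift: a position inside a sentence becomes a position in the complete text once the length of all preceding sentences is added. The sketch below shows only that character arithmetic with a hypothetical helper, toAbsolute; line and column handling, as well as markup, are left out.

```java
class MatchOffsetDemo {
    /** Shifts a sentence-relative range by the number of characters in the sentences before it. */
    static int[] toAbsolute(int fromInSentence, int toInSentence, int charCount) {
        return new int[] {fromInSentence + charCount, toInSentence + charCount};
    }

    public static void main(String[] args) {
        String first = "All fine here. ";
        String second = "This are wrong.";
        String text = first + second;

        // Position of "are" relative to the second sentence only.
        int from = second.indexOf("are");
        int to = from + "are".length();

        // charCount is the length of everything before the current sentence.
        int[] absolute = toAbsolute(from, to, first.length());
        System.out.println(text.substring(absolute[0], absolute[1]));
    }
}
```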
      • rememberUnknownWords

        protected void rememberUnknownWords​(AnalyzedSentence analyzedText)
      • getAnalyzedSentence

        public AnalyzedSentence getAnalyzedSentence​(String sentence)
                                             throws IOException
        Tokenizes the given sentence into words and analyzes it, and then disambiguates POS tags.
        Parameters:
        sentence - sentence to be analyzed
        Throws:
        IOException
      • getCategories

        public Map<CategoryId,Category> getCategories()
        Get all rule categories for the current language.
        Returns:
        a map of Categories, keyed by their id.
        Since:
        3.5
      • getAllRules

        public List<Rule> getAllRules()
        Get all rules for the current language that are built-in or that have been added using addRule(Rule). Please note that XML rules that are grouped will appear as multiple rules with the same id. To tell them apart, check if they are of type AbstractPatternRule, cast them to that type and call their AbstractPatternRule.getSubId() method.
        Returns:
        a List of Rule objects
      • getAllActiveRules

        public List<Rule> getAllActiveRules()
        Get all active (not disabled) rules for the current language that are built-in or that have been added using e.g. addRule(Rule). See getAllRules() for hints about rule ids.
        Returns:
        a List of Rule objects
      • getAllActiveOfficeRules

        public List<Rule> getAllActiveOfficeRules()
        Works like getAllActiveRules() but overrides the defaults with the office defaults.
        Returns:
        a List of Rule objects
        Since:
        4.0
      • getPatternRulesByIdAndSubId

        public List<AbstractPatternRule> getPatternRulesByIdAndSubId​(String id,
                                                                     String subId)
        Get pattern rules by Id and SubId. This returns a list because rules that use <or>...</or> are internally expanded into several rules.
        Returns:
        a List of AbstractPatternRule objects
        Since:
        2.3
      • printIfVerbose

        protected void printIfVerbose​(String s)
      • addTemporaryFile

        public static void addTemporaryFile​(File file)
        Adds a temporary file to the internal list (internal method, you should never need to call this as a user of LanguageTool).
        Parameters:
        file - the file to be added.
      • removeTemporaryFiles

        public static void removeTemporaryFiles()
        Clean up all temporary files, if there are any.
      • applyCustomFilters

        protected List<RuleMatch> applyCustomFilters​(List<RuleMatch> matches,
                                                     AnnotatedText text)
        Should be called just once with the complete list of matches, before returning them to the caller.
        Parameters:
        matches - matches after applying rules and default filters
        text - text that matches refer to
        Returns:
        transformed matches (after applying filters in matchFilters)
        Since:
        4.7