Class StringTools


  • public final class StringTools
    extends Object
    Tools for working with strings.
    Author:
    Daniel Naber
    • Field Detail

      • UPPERCASE_GREEK_LETTERS

        public static final Set<String> UPPERCASE_GREEK_LETTERS
      • LOWERCASE_GREEK_LETTERS

        public static final Set<String> LOWERCASE_GREEK_LETTERS
    • Method Detail

      • assureSet

        public static void assureSet​(String s,
                                     String varName)
        Throws an exception if the given string is null, empty, or contains only whitespace.
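        A minimal sketch of this kind of check, using only the JDK. The class name AssureSetSketch, the exception type (IllegalArgumentException) and the message wording are assumptions for illustration; the documentation only states that an exception is thrown.

```java
final class AssureSetSketch {
  // Hypothetical re-implementation for illustration; not LanguageTool's code.
  static void assureSet(String s, String varName) {
    // trim() removes leading and trailing characters <= U+0020, so a
    // whitespace-only string becomes empty.
    if (s == null || s.trim().isEmpty()) {
      // Assumption: the exception type is not stated in the documentation.
      throw new IllegalArgumentException(
          varName + " must not be null, empty, or whitespace-only");
    }
  }

  public static void main(String[] args) {
    assureSet("en-US", "languageCode"); // passes silently
    try {
      assureSet("   ", "languageCode");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

        Passing the variable name as the second argument lets the error message say which value was missing.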
      • readStream

        public static String readStream​(InputStream stream,
                                        String encoding)
                                 throws IOException
        Read the text stream using the given encoding.
        Parameters:
        stream - the InputStream to be read
        encoding - the stream's character encoding, e.g. utf-8, or null to use the system encoding
        Returns:
        a string with the stream's content, lines separated by \n (note that \n will be added to the last line even if it is not in the stream)
        Throws:
        IOException
        Since:
        2.3
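        The behaviour described for readStream (explicit or default encoding, every line terminated by \n) could be sketched as follows with plain JDK classes. ReadStreamSketch is a hypothetical name, and closing the stream after reading is an assumption of this sketch, not something the documentation states.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReadStreamSketch {
  // Hypothetical re-implementation for illustration; not LanguageTool's code.
  static String readStream(InputStream stream, String encoding) throws IOException {
    // A null encoding falls back to the platform default charset.
    Charset charset = encoding == null ? Charset.defaultCharset() : Charset.forName(encoding);
    StringBuilder sb = new StringBuilder();
    // Assumption: the sketch closes the stream when it is done.
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(stream, charset))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // readLine() strips the line terminator; "\n" is appended to every
        // line, including the last one.
        sb.append(line).append('\n');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream("first\r\nsecond".getBytes(StandardCharsets.UTF_8));
    System.out.print(readStream(in, "utf-8"));
  }
}
```

        Note that the result ends with \n even though the input in main does not, which matches the note under Returns.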
      • isAllUppercase

        public static boolean isAllUppercase​(String str)
        Returns true if the given string is made up of all-uppercase characters (ignoring characters for which no upper-/lowercase distinction exists).
      • isMixedCase

        public static boolean isMixedCase​(String str)
        Returns true if the given string is mixed case, like MixedCase or mixedCase (but not Mixedcase).
        Parameters:
        str - input string
      • isNotAllLowercase

        public static boolean isNotAllLowercase​(String str)
        Returns true if str is not made up of all-lowercase characters, i.e. it contains at least one uppercase character (ignoring characters for which no upper-/lowercase distinction exists).
        Since:
        2.5
      • isCapitalizedWord

        public static boolean isCapitalizedWord​(String str)
        Parameters:
        str - input string
        Returns:
        true if word starts with an uppercase letter and all other letters are lowercase
      • startsWithUppercase

        public static boolean startsWithUppercase​(String str)
        Whether the first character of str is an uppercase character.
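        The case predicates above (isAllUppercase, isMixedCase, isNotAllLowercase, isCapitalizedWord, startsWithUppercase) relate to each other roughly as in the sketch below. It is a hypothetical JDK-only illustration, not LanguageTool's code: isNotAllLowercase is read here as "contains at least one uppercase character", following the method name, and the treatment of empty strings is an assumption.

```java
final class CasePredicatesSketch {
  // True if no character is lowercase; digits and punctuation are ignored.
  static boolean isAllUppercase(String str) {
    for (int i = 0; i < str.length(); i++) {
      if (Character.isLowerCase(str.charAt(i))) {
        return false;
      }
    }
    return true;
  }

  // True if at least one character is uppercase.
  static boolean isNotAllLowercase(String str) {
    for (int i = 0; i < str.length(); i++) {
      if (Character.isUpperCase(str.charAt(i))) {
        return true;
      }
    }
    return false;
  }

  // Assumption: an empty string does not start with an uppercase character.
  static boolean startsWithUppercase(String str) {
    return !str.isEmpty() && Character.isUpperCase(str.charAt(0));
  }

  // True if the first character is uppercase and no later one is.
  static boolean isCapitalizedWord(String str) {
    return startsWithUppercase(str) && !isNotAllLowercase(str.substring(1));
  }

  // Mixed case: neither ALLCAPS, nor Capitalized, nor all lowercase.
  static boolean isMixedCase(String str) {
    return !isAllUppercase(str) && !isCapitalizedWord(str) && isNotAllLowercase(str);
  }

  public static void main(String[] args) {
    System.out.println(isMixedCase("MixedCase")); // true
    System.out.println(isMixedCase("mixedCase")); // true
    System.out.println(isMixedCase("Mixedcase")); // false
  }
}
```

        The sketch iterates over UTF-16 chars, so characters outside the Basic Multilingual Plane are not handled.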
      • uppercaseFirstChar

        @Nullable
        public static @Nullable String uppercaseFirstChar​(String str)
        Return str modified so that its first character is now an uppercase character. If str starts with non-alphabetic characters, such as quotes or parentheses, the first character is determined as the first alphabetic character.
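        One way to read the rule about leading non-alphabetic characters is sketched below: skip ahead to the first letter and uppercase only that character. FirstCharCaseSketch is a hypothetical name, and returning null for null input is an assumption suggested by the @Nullable annotation.

```java
final class FirstCharCaseSketch {
  // Hypothetical re-implementation for illustration; not LanguageTool's code.
  static String uppercaseFirstChar(String str) {
    if (str == null) {
      return null; // assumption: null in, null out
    }
    for (int i = 0; i < str.length(); i++) {
      char c = str.charAt(i);
      if (Character.isLetter(c)) {
        // Keep any leading quotes or parentheses untouched and
        // uppercase only the first letter found.
        return str.substring(0, i) + Character.toUpperCase(c) + str.substring(i + 1);
      }
    }
    return str; // no letter found, nothing to change
  }

  public static void main(String[] args) {
    System.out.println(uppercaseFirstChar("\"hello\"")); // prints "Hello" including the quotes
    System.out.println(uppercaseFirstChar("(abc)"));     // prints (Abc)
  }
}
```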
      • uppercaseFirstChar

        @Nullable
        public static @Nullable String uppercaseFirstChar​(String str,
                                                          Language language)
        Like uppercaseFirstChar(String), but handles a special case for Dutch (IJ in e.g. "ijsselmeer" -> "IJsselmeer").
        Parameters:
        language - the language, will be ignored if it's null
        Since:
        2.7
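        The Dutch special case can be illustrated with a small JDK-only sketch. Because Language is a LanguageTool type, the sketch takes a hypothetical languageCode string instead ("nl" is the ISO 639-1 code for Dutch); the class name and the handling of empty input are likewise assumptions.

```java
final class DutchFirstCharSketch {
  // Hypothetical re-implementation for illustration; not LanguageTool's code.
  static String uppercaseFirstChar(String str, String languageCode) {
    if (str == null || str.isEmpty()) {
      return str;
    }
    // Dutch treats the digraph "ij" as one letter, so both characters
    // are uppercased. A null languageCode skips this branch.
    if ("nl".equals(languageCode) && str.regionMatches(true, 0, "ij", 0, 2)) {
      return "IJ" + str.substring(2);
    }
    return Character.toUpperCase(str.charAt(0)) + str.substring(1);
  }

  public static void main(String[] args) {
    System.out.println(uppercaseFirstChar("ijsselmeer", "nl")); // IJsselmeer
    System.out.println(uppercaseFirstChar("ijsselmeer", null)); // Ijsselmeer
  }
}
```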
      • lowercaseFirstChar

        @Nullable
        public static @Nullable String lowercaseFirstChar​(String str)
        Return str modified so that its first character is now a lowercase character. If str starts with non-alphabetic characters, such as quotes or parentheses, the first character is determined as the first alphabetic character.
      • escapeForXmlAttribute

        public static String escapeForXmlAttribute​(String s)
        Since:
        2.9
      • escapeForXmlContent

        public static String escapeForXmlContent​(String s)
        Since:
        2.9
      • escapeHTML

        public static String escapeHTML​(String s)
        Escapes these characters: less than, greater than, quote, ampersand.
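        A minimal sketch of escaping the four characters listed for escapeHTML. It assumes that "quote" means the double quote and that the standard entity names are used; neither detail is confirmed by the documentation, and EscapeHtmlSketch is a hypothetical name.

```java
final class EscapeHtmlSketch {
  // Hypothetical re-implementation for illustration; not LanguageTool's code.
  static String escapeHTML(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '<': sb.append("&lt;"); break;
        case '>': sb.append("&gt;"); break;
        case '&': sb.append("&amp;"); break;
        case '"': sb.append("&quot;"); break; // assumption: "quote" = double quote
        default: sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(escapeHTML("a < b & \"c\"")); // a &lt; b &amp; &quot;c&quot;
  }
}
```

        Escaping character by character in a single pass avoids double-escaping the ampersands that the replacements themselves introduce.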
      • trimWhitespace

        public static String trimWhitespace​(String s)
        Filters any whitespace characters. Useful for trimming the contents of token elements that cannot possibly contain any spaces, with the exception of a single space inside a word (for example, if the language supports numbers formatted with spaces as single tokens, as Catalan does in LanguageTool).
        Parameters:
        s - String to be filtered.
        Returns:
        Filtered s.
      • trimSpecialCharacters

        public static String trimSpecialCharacters​(String s)
        Eliminates special (Unicode) characters, e.g. soft hyphens.
        Parameters:
        s - String to filter
        Returns:
        s, with non-(alphanumeric, punctuation, space) characters deleted
        Since:
        4.3
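        The filter described for trimSpecialCharacters can be approximated with a Unicode-category regular expression. The exact character classes LanguageTool keeps are not given here, so the choice below (letters, numbers, punctuation, space separators) is an assumption; note that it also deletes symbols such as + or $.

```java
final class TrimSpecialCharactersSketch {
  // Hypothetical re-implementation for illustration; not LanguageTool's code.
  static String trimSpecialCharacters(String s) {
    // Keep letters (L), numbers (N), punctuation (P) and space separators (Zs);
    // delete everything else, e.g. the soft hyphen U+00AD (category Cf).
    return s.replaceAll("[^\\p{L}\\p{N}\\p{P}\\p{Zs}]", "");
  }

  public static void main(String[] args) {
    System.out.println(trimSpecialCharacters("co\u00ADoperate")); // cooperate
  }
}
```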
      • addSpace

        public static String addSpace​(String word,
                                      Language language)
        Returns the spacing to put before a word: a single space, unless the word is punctuation that takes no preceding space.
        Parameters:
        word - the word that may need a preceding space.
        language - language of the word (used to check typography conventions). Currently only the French convention is implemented, under which no space is added only before '.' and ','; for all other languages, no space is added before any of ,.;:!?
        Returns:
        String containing a space or an empty string.
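        The spacing rule described under Parameters can be sketched as a lookup of the punctuation marks that take no preceding space. Since Language is a LanguageTool type, the sketch uses a hypothetical languageCode string ("fr" for French); the class name is also an assumption.

```java
final class AddSpaceSketch {
  // Hypothetical re-implementation for illustration; not LanguageTool's code.
  static String addSpace(String word, String languageCode) {
    // French typography puts a space before ; : ! ?, so only '.' and ','
    // suppress the space there. Other languages suppress it for all six.
    String noSpaceBefore = "fr".equals(languageCode) ? ".," : ",.;:!?";
    if (word.length() == 1 && noSpaceBefore.indexOf(word.charAt(0)) >= 0) {
      return "";
    }
    return " ";
  }

  public static void main(String[] args) {
    // Joining tokens: the separator is chosen by the token that follows.
    String[] tokens = {"Bonjour", "!"};
    System.out.println(tokens[0] + addSpace(tokens[1], "fr") + tokens[1]); // Bonjour !
    System.out.println(tokens[0] + addSpace(tokens[1], "en") + tokens[1]); // Bonjour!
  }
}
```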
      • isWhitespace

        public static boolean isWhitespace​(String str)
        Checks if a string is whitespace, including:
        • all Unicode whitespace
        • the non-breaking space (U+00A0)
        • the narrow non-breaking space (U+202F)
        • the zero width space (U+200B), used in Khmer
        Parameters:
        str - String to check
        Returns:
        true if the string is a whitespace character
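        The extra code points are listed separately because Java's Character.isWhitespace does not report U+00A0, U+202F or U+200B as whitespace. A hedged JDK-only sketch of such a check follows; treating multi-character and empty strings the way it does is an assumption of the sketch, and WhitespaceSketch is a hypothetical name.

```java
final class WhitespaceSketch {
  // Hypothetical re-implementation for illustration; not LanguageTool's code.
  static boolean isWhitespace(String str) {
    for (int i = 0; i < str.length(); i++) {
      char c = str.charAt(i);
      // Non-breaking space, narrow non-breaking space, zero width space.
      boolean extra = c == '\u00A0' || c == '\u202F' || c == '\u200B';
      if (!extra && !Character.isWhitespace(c)) {
        return false;
      }
    }
    // Assumption: an empty string is vacuously treated as whitespace here.
    return true;
  }

  public static void main(String[] args) {
    System.out.println(Character.isWhitespace('\u00A0')); // false
    System.out.println(isWhitespace("\u00A0"));           // true
  }
}
```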
      • isNonBreakingWhitespace

        public static boolean isNonBreakingWhitespace​(String str)
        Checks if a string is the non-breaking whitespace (U+00A0).
        Since:
        2.1
      • isPositiveNumber

        public static boolean isPositiveNumber​(char ch)
        Parameters:
        ch - Character to check
        Returns:
        true if the character is a positive number (a decimal digit from 1 to 9).
      • isEmpty

        public static boolean isEmpty​(String str)
        Helper method to replace calls to "".equals().
        Parameters:
        str - String to check
        Returns:
        true if string is empty or null
      • filterXML

        public static String filterXML​(String str)
        Simple filter that removes XML tags from a string.
        Parameters:
        str - XML string to be filtered.
        Returns:
        Filtered string without XML tags.
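        A simple tag filter of this kind is typically a regular-expression replacement rather than a real XML parse. The pattern below is an assumption chosen for illustration, not necessarily the one LanguageTool uses, and FilterXmlSketch is a hypothetical name.

```java
final class FilterXmlSketch {
  // Hypothetical re-implementation for illustration; not LanguageTool's code.
  static String filterXML(String str) {
    // Remove anything that looks like a tag: '<', any run of non-'>'
    // characters, then '>'. Entities such as &amp; are left untouched.
    return str.replaceAll("<[^>]*>", "");
  }

  public static void main(String[] args) {
    System.out.println(filterXML("<b>bold</b> text")); // bold text
  }
}
```

        Being purely textual, this approach also strips spans such as "< b >" in "a < b > c", which is one reason it only suits simple input.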
      • isParagraphEnd

        public static boolean isParagraphEnd​(String sentence,
                                             boolean singleLineBreaksMarksPara)
        Since:
        4.3
      • loadLines

        public static List<String> loadLines​(String path)
        Loads the lines of a file, ignoring comment lines (lines starting with #).
        Parameters:
        path - path in resource dir
        Since:
        4.6
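        The comment filtering can be sketched with plain JDK I/O. How LanguageTool resolves a path in its resource directory is not shown here, so the sketch reads from a caller-supplied Reader instead; that parameter, the class name LoadLinesSketch, and keeping empty lines are all assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class LoadLinesSketch {
  // Hypothetical re-implementation for illustration; not LanguageTool's code.
  static List<String> loadLines(Reader source) throws IOException {
    List<String> lines = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(source)) {
      String line;
      while ((line = reader.readLine()) != null) {
        // Skip comment lines; every other line is kept as-is.
        if (!line.startsWith("#")) {
          lines.add(line);
        }
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    String data = "# word list\nfoo\nbar\n";
    System.out.println(loadLines(new StringReader(data))); // [foo, bar]
  }
}
```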