Package org.languagetool.tools
Class StringTools
- java.lang.Object
-
- org.languagetool.tools.StringTools
-
public final class StringTools extends Object
Tools for working with strings.
- Author:
- Daniel Naber
-
-
Nested Class Summary
static class StringTools.ApiPrintMode
Constants for printing XML rule matches.
-
Field Summary
static Set<String> LOWERCASE_GREEK_LETTERS
static Set<String> UPPERCASE_GREEK_LETTERS
-
Method Summary
static String addSpace(String word, Language language)
Adds spaces before words that are not punctuation.
static @Nullable String asString(CharSequence s)
static void assureSet(String s, String varName)
Throws an exception if the given string is null, empty, or only whitespace.
static String escapeForXmlAttribute(String s)
static String escapeForXmlContent(String s)
static String escapeHTML(String s)
Escapes these characters: less than, greater than, quote, ampersand.
static String escapeXML(String s)
Calls escapeHTML(String).
static String filterXML(String str)
Simple XML filtering for XML tags.
static boolean isAllUppercase(String str)
Returns true if the given string is made up of all-uppercase characters (ignoring characters for which no upper-/lowercase distinction exists).
static boolean isCapitalizedWord(String str)
static boolean isEmpty(String str)
Helper method to replace calls to "".equals().
static boolean isMixedCase(String str)
Returns true if the given string is mixed case, like MixedCase or mixedCase (but not Mixedcase).
static boolean isNonBreakingWhitespace(String str)
Checks if a string is the non-breaking whitespace.
static boolean isNotAllLowercase(String str)
Returns true if str is not made up of all-lowercase characters (ignoring characters for which no upper-/lowercase distinction exists).
static boolean isParagraphEnd(String sentence, boolean singleLineBreaksMarksPara)
static boolean isPositiveNumber(char ch)
static boolean isWhitespace(String str)
Checks if a string is a whitespace character, including all Unicode whitespace, the non-breaking space (U+00A0), the narrow non-breaking space (U+202F), and the zero width space (U+200B), used in Khmer.
static List<String> loadLines(String path)
Loads file, ignoring comments (lines starting with #).
static @Nullable String lowercaseFirstChar(String str)
Returns str modified so that its first character is now a lowercase character.
static String readerToString(Reader reader)
static String readStream(InputStream stream, String encoding)
Reads the text stream using the given encoding.
static boolean startsWithUppercase(String str)
Whether the first character of str is an uppercase character.
static String streamToString(InputStream is, String charsetName)
static String trimSpecialCharacters(String s)
Eliminates special (Unicode) characters, e.g. soft hyphens.
static String trimWhitespace(String s)
Filters any whitespace characters.
static @Nullable String uppercaseFirstChar(String str)
Returns str modified so that its first character is now an uppercase character.
static @Nullable String uppercaseFirstChar(String str, Language language)
Like uppercaseFirstChar(String), but handles a special case for Dutch (IJ in e.g. "ijsselmeer" -> "IJsselmeer").
-
-
-
Method Detail
-
assureSet
public static void assureSet(String s, String varName)
Throws an exception if the given string is null, empty, or only whitespace.
-
readStream
public static String readStream(InputStream stream, String encoding) throws IOException
Reads the text stream using the given encoding.
- Parameters:
stream - the InputStream to be read
encoding - the stream's character encoding, e.g. utf-8, or null to use the system encoding
- Returns:
- a string with the stream's content, lines separated by \n (note that \n will be added to the last line even if it is not in the stream)
- Throws:
IOException
- Since:
- 2.3
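The line handling described for readStream can be illustrated with a minimal, self-contained sketch. This is not the LanguageTool implementation: the class name ReadStreamSketch is hypothetical, and closing the stream inside the method is an assumption of this sketch, not something the documentation states.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for StringTools.readStream, for illustration only.
final class ReadStreamSketch {

  // Reads the stream line by line and terminates every line with "\n",
  // including the last one, whatever line separator the input used.
  static String readStream(InputStream stream, String encoding) throws IOException {
    // A null encoding falls back to the platform default charset.
    Charset charset = encoding == null ? Charset.defaultCharset() : Charset.forName(encoding);
    StringBuilder sb = new StringBuilder();
    // Assumption of this sketch: the reader (and thus the stream) is closed here.
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(stream, charset))) {
      String line;
      while ((line = reader.readLine()) != null) {
        sb.append(line).append('\n');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream("one\r\ntwo".getBytes(StandardCharsets.UTF_8));
    // The input has no trailing line break, but the result ends with "\n".
    System.out.print(readStream(in, "utf-8"));
  }
}
```

With this sketch, the input "one\r\ntwo" yields "one\ntwo\n", which matches the note that \n is appended to the last line.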
-
isAllUppercase
public static boolean isAllUppercase(String str)
Returns true if the given string is made up of all-uppercase characters (ignoring characters for which no upper-/lowercase distinction exists).
-
isMixedCase
public static boolean isMixedCase(String str)
Returns true if the given string is mixed case, like MixedCase or mixedCase (but not Mixedcase).
- Parameters:
str - input string
-
isNotAllLowercase
public static boolean isNotAllLowercase(String str)
Returns true if str is not made up of all-lowercase characters, that is, if it contains at least one letter that is not lowercase (ignoring characters for which no upper-/lowercase distinction exists).
- Since:
- 2.5
-
isCapitalizedWord
public static boolean isCapitalizedWord(String str)
- Parameters:
str - input string
- Returns:
- true if the word starts with an uppercase letter and all other letters are lowercase
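To make the relationship between the four case predicates above concrete, here is a minimal sketch written against the JDK only. The class name CaseChecksSketch is hypothetical, and composing isMixedCase from the other three checks is an assumption of this sketch that happens to reproduce the documented examples. It is not necessarily how StringTools implements them.

```java
// Hypothetical re-implementation of the documented case predicates.
final class CaseChecksSketch {

  // True if the string contains no lowercase letter; digits and
  // punctuation have no case and are ignored.
  static boolean isAllUppercase(String str) {
    for (int i = 0; i < str.length(); i++) {
      char c = str.charAt(i);
      if (Character.isLetter(c) && Character.isLowerCase(c)) {
        return false;
      }
    }
    return true;
  }

  // True if at least one letter is not lowercase.
  static boolean isNotAllLowercase(String str) {
    for (int i = 0; i < str.length(); i++) {
      char c = str.charAt(i);
      if (Character.isLetter(c) && !Character.isLowerCase(c)) {
        return true;
      }
    }
    return false;
  }

  // True if the first character is uppercase and every later letter is lowercase.
  static boolean isCapitalizedWord(String str) {
    if (str.isEmpty() || !Character.isUpperCase(str.charAt(0))) {
      return false;
    }
    for (int i = 1; i < str.length(); i++) {
      char c = str.charAt(i);
      if (Character.isLetter(c) && !Character.isLowerCase(c)) {
        return false;
      }
    }
    return true;
  }

  // Assumed composition: mixed case means "none of the regular casings".
  static boolean isMixedCase(String str) {
    return !isAllUppercase(str) && !isCapitalizedWord(str) && isNotAllLowercase(str);
  }

  public static void main(String[] args) {
    System.out.println(isMixedCase("MixedCase")); // true
    System.out.println(isMixedCase("mixedCase")); // true
    System.out.println(isMixedCase("Mixedcase")); // false
  }
}
```

Under these definitions, "Mixedcase" is rejected because it is a capitalized word, while "ABC" and "abc" are rejected as all-uppercase and all-lowercase respectively.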
-
startsWithUppercase
public static boolean startsWithUppercase(String str)
Whether the first character of str is an uppercase character.
-
uppercaseFirstChar
@Nullable public static String uppercaseFirstChar(String str)
Returns str modified so that its first character is now an uppercase character. If str starts with non-alphabetic characters, such as quotes or parentheses, the first alphabetic character is treated as the first character.
-
uppercaseFirstChar
@Nullable public static String uppercaseFirstChar(String str, Language language)
Like uppercaseFirstChar(String), but handles a special case for Dutch (IJ in e.g. "ijsselmeer" -> "IJsselmeer").
- Parameters:
language - the language, will be ignored if it's null
- Since:
- 2.7
-
lowercaseFirstChar
@Nullable public static String lowercaseFirstChar(String str)
Returns str modified so that its first character is now a lowercase character. If str starts with non-alphabetic characters, such as quotes or parentheses, the first alphabetic character is treated as the first character.
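The behavior of the three first-character methods above can be sketched as follows. This is an illustration under stated assumptions, not the library code: the class name FirstCharCaseSketch and the helper changeFirstCharCase are hypothetical, and a plain language code string ("nl") stands in for the Language parameter, which is not available outside LanguageTool.

```java
import java.util.Locale;

// Hypothetical sketch of the documented first-character case changes.
final class FirstCharCaseSketch {

  // Changes the case of the first alphabetic character, skipping leading
  // non-letters such as quotes or parentheses. Returns null for null input.
  static String changeFirstCharCase(String str, boolean toUpperCase) {
    if (str == null) {
      return null;
    }
    for (int i = 0; i < str.length(); i++) {
      char c = str.charAt(i);
      if (Character.isLetter(c)) {
        char changed = toUpperCase ? Character.toUpperCase(c) : Character.toLowerCase(c);
        return str.substring(0, i) + changed + str.substring(i + 1);
      }
    }
    return str; // no letter found, nothing to change
  }

  static String uppercaseFirstChar(String str) {
    return changeFirstCharCase(str, true);
  }

  static String lowercaseFirstChar(String str) {
    return changeFirstCharCase(str, false);
  }

  // languageCode is a stand-in for the Language parameter (assumption);
  // a null code is ignored, as the documentation describes for a null language.
  static String uppercaseFirstChar(String str, String languageCode) {
    if ("nl".equals(languageCode) && str != null
        && str.toLowerCase(Locale.ROOT).startsWith("ij")) {
      return "IJ" + str.substring(2); // Dutch digraph: both letters are capitalized
    }
    return uppercaseFirstChar(str);
  }

  public static void main(String[] args) {
    System.out.println(uppercaseFirstChar("\"hello\""));        // "Hello"
    System.out.println(lowercaseFirstChar("(Hello)"));          // (hello)
    System.out.println(uppercaseFirstChar("ijsselmeer", "nl")); // IJsselmeer
  }
}
```

Without the Dutch special case, "ijsselmeer" would become "Ijsselmeer", which is why the two-argument variant exists.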
-
readerToString
public static String readerToString(Reader reader) throws IOException
- Throws:
IOException
-
streamToString
public static String streamToString(InputStream is, String charsetName) throws IOException
- Throws:
IOException
-
escapeXML
public static String escapeXML(String s)
Calls escapeHTML(String).
-
escapeHTML
public static String escapeHTML(String s)
Escapes these characters: less than, greater than, quote, ampersand.
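As an illustration of the escaping described for escapeHTML and escapeXML, the sketch below replaces the four listed characters with the standard named entities. The class name EscapeHtmlSketch is hypothetical, and the exact entity strings are an assumption, since the documentation only lists which characters are escaped.

```java
// Hypothetical sketch of the documented escaping rules.
final class EscapeHtmlSketch {

  // Escapes less than, greater than, double quote, and ampersand.
  static String escapeHTML(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '<':
          sb.append("&lt;");
          break;
        case '>':
          sb.append("&gt;");
          break;
        case '"':
          sb.append("&quot;");
          break;
        case '&':
          sb.append("&amp;");
          break;
        default:
          sb.append(c);
      }
    }
    return sb.toString();
  }

  // Mirrors the documented delegation: escapeXML simply calls escapeHTML.
  static String escapeXML(String s) {
    return escapeHTML(s);
  }

  public static void main(String[] args) {
    System.out.println(escapeHTML("<a href=\"x\">A&B</a>"));
  }
}
```

Processing the input character by character avoids double escaping, which can happen when chained replace calls handle the ampersand after the other characters.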
-
trimWhitespace
public static String trimWhitespace(String s)
Filters any whitespace characters. Useful for trimming the contents of token elements that cannot possibly contain any spaces, with the exception of a single space in a word (for example, if the language supports numbers formatted with spaces as single tokens, as Catalan does in LanguageTool).
- Parameters:
s - String to be filtered.
- Returns:
- Filtered s.
-
trimSpecialCharacters
public static String trimSpecialCharacters(String s)
Eliminates special (Unicode) characters, e.g. soft hyphens.
- Parameters:
s - String to filter
- Returns:
- s, with non-(alphanumeric, punctuation, space) characters deleted
- Since:
- 4.3
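One way to realize the filter described for trimSpecialCharacters is a single regular expression over Unicode categories. The sketch below is an assumption about how "alphanumeric, punctuation, space" might map to categories (letters \p{L}, digits \d, punctuation \p{P}, space separators \p{Zs}); the pattern actually used by LanguageTool may differ, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: delete everything that is not a letter, digit,
// punctuation mark, or space separator.
final class TrimSpecialCharactersSketch {

  // Assumed category mapping, see the lead-in above.
  private static final Pattern NOT_ALLOWED = Pattern.compile("[^\\p{L}\\d\\p{P}\\p{Zs}]");

  static String trimSpecialCharacters(String s) {
    return NOT_ALLOWED.matcher(s).replaceAll("");
  }

  public static void main(String[] args) {
    // U+00AD is the soft hyphen (category Cf), so it is removed.
    System.out.println(trimSpecialCharacters("auto\u00ADmatic")); // automatic
  }
}
```

Note that under this particular mapping, symbol characters such as "+" (category Sm) are also deleted, which may or may not match the library's behavior.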
-
addSpace
public static String addSpace(String word, Language language)
Adds spaces before words that are not punctuation.
- Parameters:
word - Word that may need a preceding space.
language - Language of the word (to check typography conventions). Currently, the French convention of omitting the space only before '.' and ',' is implemented; for other languages, no space is added before , . ; : ! ?
- Returns:
- String containing a space or an empty string.
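The typography rule described for addSpace can be sketched as below. Because the Language class is not available outside LanguageTool, this sketch takes a language code string instead (an assumption), and the class name AddSpaceSketch is hypothetical.

```java
// Hypothetical sketch of the documented spacing convention.
final class AddSpaceSketch {

  // Returns " " if the word should be preceded by a space, "" otherwise.
  // languageCode stands in for the Language parameter (assumption).
  static String addSpace(String word, String languageCode) {
    if (word.length() == 1) {
      char c = word.charAt(0);
      if ("fr".equals(languageCode)) {
        // French: only '.' and ',' attach directly to the previous word.
        if (c == '.' || c == ',') {
          return "";
        }
      } else if (c == '.' || c == ',' || c == ';' || c == ':' || c == '!' || c == '?') {
        return "";
      }
    }
    return " ";
  }

  public static void main(String[] args) {
    System.out.println("[" + addSpace(";", "en") + "]"); // []
    System.out.println("[" + addSpace(";", "fr") + "]"); // [ ]
  }
}
```

The French branch reflects the convention that a semicolon, colon, question mark, or exclamation mark is preceded by a space in French text.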
-
isWhitespace
public static boolean isWhitespace(String str)
Checks if a string is a whitespace character, including:
- all Unicode whitespace
- the non-breaking space (U+00A0)
- the narrow non-breaking space (U+202F)
- the zero width space (U+200B), used in Khmer
- Parameters:
str - String to check
- Returns:
- true if the string is a whitespace character
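The three code points are listed explicitly because Java's Character.isWhitespace does not cover them: it excludes the non-breaking spaces, and U+200B is a format character rather than a space. A minimal sketch of such a check follows. The class name is hypothetical, and treating only single-character strings as whitespace is an assumption of this sketch; the library may handle empty or longer strings differently.

```java
// Hypothetical sketch of a whitespace check covering the listed code points.
final class IsWhitespaceSketch {

  static boolean isWhitespace(String str) {
    // Assumption: only single-character strings are considered here.
    if (str.length() != 1) {
      return false;
    }
    char c = str.charAt(0);
    return Character.isWhitespace(c)
        || c == '\u00A0'   // non-breaking space
        || c == '\u202F'   // narrow non-breaking space
        || c == '\u200B';  // zero width space
  }

  public static void main(String[] args) {
    System.out.println(Character.isWhitespace('\u00A0')); // false
    System.out.println(isWhitespace("\u00A0"));           // true
  }
}
```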
-
isNonBreakingWhitespace
public static boolean isNonBreakingWhitespace(String str)
Checks if a string is the non-breaking whitespace.
- Since:
- 2.1
-
isPositiveNumber
public static boolean isPositiveNumber(char ch)
- Parameters:
ch - Character to check
- Returns:
- True if the character is a positive number (decimal digit from 1 to 9).
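The range in the Returns note means that '0' does not count as a positive number. A one-line sketch of that reading, with a hypothetical class name:

```java
// Hypothetical sketch: a "positive number" is a decimal digit from 1 to 9.
final class IsPositiveNumberSketch {

  static boolean isPositiveNumber(char ch) {
    return ch >= '1' && ch <= '9';
  }

  public static void main(String[] args) {
    System.out.println(isPositiveNumber('5')); // true
    System.out.println(isPositiveNumber('0')); // false
  }
}
```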
-
isEmpty
public static boolean isEmpty(String str)
Helper method to replace calls to "".equals().
- Parameters:
str - String to check
- Returns:
- true if string is empty or null
-
filterXML
public static String filterXML(String str)
Simple XML filtering for XML tags.
- Parameters:
str - XML string to be filtered.
- Returns:
- Filtered string without XML tags.
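A simple tag filter of this kind can be approximated with one regular expression. The sketch below is a deliberately naive illustration with a hypothetical class name and an assumed pattern. It does not handle comments, CDATA sections, or a ">" inside attribute values, and the pattern used by LanguageTool may differ.

```java
import java.util.regex.Pattern;

// Hypothetical, naive tag stripper for illustration only.
final class FilterXmlSketch {

  // Matches "<", then one or more characters that are neither "<" nor ">", then ">".
  private static final Pattern TAG = Pattern.compile("<[^<>]+>");

  static String filterXML(String str) {
    return TAG.matcher(str).replaceAll("");
  }

  public static void main(String[] args) {
    System.out.println(filterXML("<b>bold</b> text")); // bold text
  }
}
```

A lone "<" without a closing ">" is left untouched, so plain text such as "a < b" passes through unchanged.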
-
asString
@Nullable public static String asString(CharSequence s)
-
isParagraphEnd
public static boolean isParagraphEnd(String sentence, boolean singleLineBreaksMarksPara)
- Since:
- 4.3
-
-