Package morfologik.speller
Class Speller
java.lang.Object
    morfologik.speller.Speller
public class Speller extends Object
Finds spelling suggestions. Implements K. Oflazer's algorithm as described in: Oflazer, Kemal. 1996. "Error-Tolerant Finite-State Recognition with Applications to Morphological Analysis and Spelling Correction." Computational Linguistics 22 (1): 73–89. See Jan Daciuk's s_fsa package.

Nested Class Summary
Modifier and Type | Class | Description
class | Speller.CandidateData | Used to sort candidates according to edit distance, and possibly according to their frequency in the future.

Field Summary
Modifier and Type | Field | Description
static int | MAX_WORD_LENGTH | Maximum length of the word to be checked.

Constructor Summary
Speller(Dictionary dictionary)
Speller(Dictionary dictionary, int editDistance)

Method Summary
Modifier and Type | Method | Description
boolean | convertsCase() | Used to determine whether the dictionary supports case conversions.
int | cuted(int depth, int wordIndex, int candIndex) | Calculates cut-off edit distance.
int | ed(int i, int j, int wordIndex, int candIndex) | Calculates edit distance.
ArrayList<Speller.CandidateData> | findReplacementCandidates(String word) | Find and return suggestions by using K. Oflazer's algorithm.
ArrayList<String> | findReplacements(String word) | Find suggestions by using K. Oflazer's algorithm.
List<String> | getAllReplacements(String str, int fromIndex, int level) | Returns a list of all possible replacements of the given string.
int | getCandLen() |
int | getEffectiveED() |
int | getFrequency(CharSequence word) | Get the frequency value for a word form.
int | getWordLen() |
boolean | isCamelCase(String str) | Returns true if str is CamelCase.
boolean | isInDictionary(CharSequence word) | Test whether the word is found in the dictionary.
boolean | isMisspelled(String word) | Checks whether the word is misspelled, by performing a series of checks according to properties of the dictionary.
List<String> | replaceRunOnWords(String original) | Propose suggestions for misspelled run-on words.

Field Detail

MAX_WORD_LENGTH
public static final int MAX_WORD_LENGTH
Maximum length of the word to be checked.
See Also: Constant Field Values

Constructor Detail

Speller
public Speller(Dictionary dictionary)

Speller
public Speller(Dictionary dictionary, int editDistance)

Method Detail

isMisspelled
public boolean isMisspelled(String word)
Checks whether the word is misspelled by performing a series of checks according to properties of the dictionary. If the flag fsa.dict.speller.ignore-punctuation is set, then all non-alphabetic characters are considered to be correctly spelled. If the flag fsa.dict.speller.ignore-numbers is set, then all words containing decimal digits are considered to be correctly spelled. If the flag fsa.dict.speller.ignore-camel-case is set, then all CamelCase words are considered to be correctly spelled. If the flag fsa.dict.speller.ignore-all-uppercase is set, then all alphabetic words composed only of uppercase characters are considered to be correctly spelled. Otherwise, the word is looked up in the dictionary. If it is not found and the dictionary does not perform any case conversions (as set by the fsa.dict.speller.convert-case flag), the word is misspelled and the method returns true. If case conversion is enabled, a word that is not mixed-case is also accepted when its lowercase form is found in the dictionary, and an all-uppercase word is also accepted when its form with only the initial letter uppercase (for example, "PARIS" as "Paris") is found.
Parameters:
word - the word to be checked
Returns:
true if the word is misspelled
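The order of checks described above can be sketched in plain Java. This is a hypothetical, standard-library-only illustration, not the morfologik implementation: a Set<String> stands in for the dictionary, the boolean fields stand in for the fsa.dict.speller.* flags, and the helper definitions (a single non-alphanumeric character counts as punctuation, a simplified CamelCase test, the mixed-case rule) are assumptions.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the isMisspelled() check order; a Set<String>
// stands in for the FSA dictionary. Not the morfologik code.
final class MisspellingCheck {
    // Hypothetical stand-ins for the fsa.dict.speller.* flags (all off by default).
    boolean ignorePunctuation;
    boolean ignoreNumbers;
    boolean ignoreCamelCase;
    boolean ignoreAllUppercase;
    boolean convertCase;

    private final Set<String> dictionary;

    MisspellingCheck(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    boolean isMisspelled(String word) {
        if (word.isEmpty()) {
            return false;
        }
        // Assumption: "punctuation" means a single non-alphanumeric character.
        if (ignorePunctuation && word.length() == 1 && !Character.isLetterOrDigit(word.charAt(0))) {
            return false;
        }
        if (ignoreNumbers && word.chars().anyMatch(Character::isDigit)) {
            return false;
        }
        if (ignoreCamelCase && isCamelCase(word)) {
            return false;
        }
        if (ignoreAllUppercase && isAllUppercase(word)) {
            return false;
        }
        if (dictionary.contains(word)) {
            return false;
        }
        if (!convertCase) {
            return true; // not found and no case conversion: misspelled
        }
        if (!isMixedCase(word)) {
            if (dictionary.contains(word.toLowerCase(Locale.ROOT))) {
                return false;
            }
            if (isAllUppercase(word) && dictionary.contains(initialUppercase(word))) {
                return false;
            }
        }
        return true;
    }

    static boolean isAllUppercase(String s) {
        return s.chars().allMatch(Character::isUpperCase);
    }

    // Uppercase first character, no uppercase characters after it.
    static boolean isCapitalized(String s) {
        return Character.isUpperCase(s.charAt(0))
                && s.substring(1).chars().noneMatch(Character::isUpperCase);
    }

    // Mixed case: neither all-lowercase, all-uppercase, nor capitalized.
    static boolean isMixedCase(String s) {
        boolean allLower = s.chars().noneMatch(Character::isUpperCase);
        return !allLower && !isAllUppercase(s) && !isCapitalized(s);
    }

    // Simplified CamelCase test (assumption): "Xy..." plus a later uppercase letter.
    static boolean isCamelCase(String s) {
        return s.length() > 1 && Character.isUpperCase(s.charAt(0))
                && Character.isLowerCase(s.charAt(1))
                && s.substring(1).chars().anyMatch(Character::isUpperCase);
    }

    // "PARIS" becomes "Paris".
    static String initialUppercase(String s) {
        return s.substring(0, 1) + s.substring(1).toLowerCase(Locale.ROOT);
    }
}
```

For example, with case conversion enabled and "Paris" in the set, "PARIS" is accepted through its initial-uppercase form, while "cAt" is mixed case and is never lowercased, so it is reported as misspelled.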

isInDictionary
public boolean isInDictionary(CharSequence word)
Test whether the word is found in the dictionary.
Parameters:
word - the word to be tested
Returns:
true if it is found
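Assuming each dictionary entry is stored as the word form, a separator character, and further data (as in morfologik's encoded dictionaries), a word counts as present only when its complete form is followed by the separator, not when it is merely the prefix of a longer form. The sketch below emulates that lookup with a sorted TreeSet of encoded entries instead of an automaton; the EntryLookup class and the '+' separator are hypothetical.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: a sorted set of encoded entries ("word" + separator + data)
// stands in for the dictionary automaton.
final class EntryLookup {
    private final NavigableSet<String> entries = new TreeSet<>();
    private final char separator;

    EntryLookup(char separator, String... encodedEntries) {
        this.separator = separator;
        for (String entry : encodedEntries) {
            entries.add(entry);
        }
    }

    // True if some entry starts with word + separator. Every string that sorts
    // between the key and such an entry also starts with the key, so checking
    // the ceiling (smallest entry >= key) is enough.
    boolean isInDictionary(CharSequence word) {
        String key = word.toString() + separator;
        String next = entries.ceiling(key);
        return next != null && next.startsWith(key);
    }
}
```

With the entries "cat+NN", "cats+NNS", and "catalog+NN", both "cat" and "cats" are found, while "ca" is only a prefix and is rejected.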

getFrequency
public int getFrequency(CharSequence word)
Get the frequency value for a word form. It is taken from the first entry with this word form.
Parameters:
word - the word to be tested
Returns:
frequency value in the range 0..FREQ_RANGE-1 (0 = least frequent)
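The page does not say how the frequency is stored. Purely as an assumption for illustration, the sketch below gives each entry a final field holding one letter 'A'..'Z', so a hypothetical FREQ_RANGE of 26 maps 'A' to 0 (least frequent) and 'Z' to 25. As described above, only the first entry for the word form is consulted. Unknown words and entries without a frequency letter return 0, which is another assumption.

```java
import java.util.List;

// Hypothetical sketch: entries look like "word+data+F", where F is one frequency
// letter 'A'..'Z'. This layout is an assumption, not a documented format.
final class FrequencyLookup {
    static final int FREQ_RANGE = 'Z' - 'A' + 1; // 26 values: 0..25

    private final char separator;
    private final List<String> entries; // in dictionary order

    FrequencyLookup(char separator, List<String> entries) {
        this.separator = separator;
        this.entries = entries;
    }

    int getFrequency(CharSequence word) {
        String key = word.toString() + separator;
        for (String entry : entries) {
            if (entry.startsWith(key)) {
                // Only the first entry for this word form is consulted.
                String last = entry.substring(entry.lastIndexOf(separator) + 1);
                if (last.length() == 1 && last.charAt(0) >= 'A' && last.charAt(0) <= 'Z') {
                    return last.charAt(0) - 'A';
                }
                return 0; // no frequency letter: least frequent (assumption)
            }
        }
        return 0; // unknown word (assumption)
    }
}
```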

replaceRunOnWords
public List<String> replaceRunOnWords(String original)
Propose suggestions for misspelled run-on words. This algorithm is inspired by spell.cc in Jan Daciuk's s_fsa package.
Parameters:
original - The original misspelled word.
Returns:
The list of suggested pairs, as space-concatenated strings.
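The core of run-on splitting can be sketched as trying every split point and keeping the pairs whose two halves are both dictionary words. This is a simplified, hypothetical version (a Set<String> dictionary, no case handling, no minimum part length); it does not reproduce spell.cc or the library method exactly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of run-on word splitting against a Set<String> dictionary.
final class RunOnSplitter {
    static List<String> replaceRunOnWords(String original, Set<String> dictionary) {
        List<String> candidates = new ArrayList<>();
        // A word that is already correct needs no splitting (assumption).
        if (dictionary.contains(original)) {
            return candidates;
        }
        // Try every split point that leaves both halves non-empty.
        for (int i = 1; i < original.length(); i++) {
            String first = original.substring(0, i);
            String second = original.substring(i);
            if (dictionary.contains(first) && dictionary.contains(second)) {
                candidates.add(first + " " + second); // space-concatenated pair
            }
        }
        return candidates;
    }
}
```

With a dictionary containing "a" and "lot", the input "alot" yields the single suggestion "a lot".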

findReplacements
public ArrayList<String> findReplacements(String word)
Find suggestions by using K. Oflazer's algorithm. See Jan Daciuk's s_fsa package, spell.cc, for further explanation.
Parameters:
word - The original misspelled word.
Returns:
A list of suggested replacements.
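Oflazer's algorithm walks the dictionary automaton depth-first, extending a candidate one character at a time. For each candidate length n it keeps the column H(., n) of the edit-distance table against the misspelled word, abandons the branch once the cut-off edit distance exceeds the threshold t, and accepts a complete word whose distance is at most t. The sketch below applies that idea to a trie built from a word list instead of morfologik's FSA; the OflazerSearch class and its methods are hypothetical, and results are ordered by distance, then alphabetically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of Oflazer-style error-tolerant lookup; a trie stands in
// for the dictionary automaton. Not the morfologik code.
final class OflazerSearch {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();
    private final int t; // edit-distance threshold

    OflazerSearch(int threshold, String... words) {
        this.t = threshold;
        for (String w : words) {
            Node node = root;
            for (char c : w.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.isWord = true;
        }
    }

    List<String> findReplacements(String word) {
        int[] first = new int[word.length() + 1];
        for (int i = 0; i < first.length; i++) {
            first[i] = i; // H(i, 0) = i for the empty candidate
        }
        Map<String, Integer> found = new TreeMap<>();
        for (Map.Entry<Character, Node> e : root.children.entrySet()) {
            search(e.getValue(), e.getKey(), '\0', new StringBuilder(), first, first, word, found);
        }
        List<String> result = new ArrayList<>(found.keySet());
        // Closest first; the stable sort keeps alphabetical order among ties.
        result.sort((a, b) -> Integer.compare(found.get(a), found.get(b)));
        return result;
    }

    // prev holds H(., n-1), prevPrev holds H(., n-2); c is y_n, prevChar is y_{n-1}.
    private void search(Node node, char c, char prevChar, StringBuilder cand,
                        int[] prevPrev, int[] prev, String word, Map<String, Integer> found) {
        cand.append(c);
        int n = cand.length();
        int m = word.length();
        int[] col = new int[m + 1];
        col[0] = n; // H(0, n) = n
        for (int i = 1; i <= m; i++) {
            char x = word.charAt(i - 1);
            if (x == c) {
                col[i] = prev[i - 1];
            } else if (i > 1 && n > 1 && x == prevChar && word.charAt(i - 2) == c) {
                col[i] = 1 + Math.min(prevPrev[i - 2], Math.min(col[i - 1], prev[i])); // transposition
            } else {
                col[i] = 1 + Math.min(prev[i - 1], Math.min(col[i - 1], prev[i]));
            }
        }
        // Cut-off edit distance: minimum of H(i, n) over i in [max(1, n - t), min(m, n + t)].
        int cutoff = Integer.MAX_VALUE;
        for (int i = Math.max(1, n - t); i <= Math.min(m, n + t); i++) {
            cutoff = Math.min(cutoff, col[i]);
        }
        if (cutoff <= t) { // otherwise no extension of this prefix can be within t
            if (node.isWord && col[m] <= t) {
                found.put(cand.toString(), col[m]);
            }
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                search(e.getValue(), e.getKey(), c, cand, prev, col, word, found);
            }
        }
        cand.setLength(n - 1); // backtrack
    }
}
```

With the words "cat", "cart", "act", "dog", "coat", and "at" and t = 1, the input "cta" yields only "cat" (one transposition); raising t to 2 also admits "act", "at", and "coat". The prefix "car" is abandoned at depth 3 because its cut-off distance is already 2, so "cart" is never completed.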

findReplacementCandidates
public ArrayList<Speller.CandidateData> findReplacementCandidates(String word)
Find and return suggestions by using K. Oflazer's algorithm. See Jan Daciuk's s_fsa package, spell.cc, for further explanation. This method is identical to findReplacements(java.lang.String), but returns candidate terms with their edit distance scores.
Parameters:
word - The original misspelled word.
Returns:
A list of suggested candidate replacements.
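Since Speller.CandidateData is described as a way to sort candidates by edit distance and, possibly later, by frequency, a stand-in can be written as a Comparable class. The CandidateSketch class, its fields, and the tie-break on higher frequency are assumptions; words() shows one plausible way a plain list of replacements could be derived from scored candidates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Speller.CandidateData; not the library class.
final class CandidateSketch implements Comparable<CandidateSketch> {
    final String word;
    final int distance;  // edit distance to the misspelled input
    final int frequency; // higher = more frequent (assumed scale)

    CandidateSketch(String word, int distance, int frequency) {
        this.word = word;
        this.distance = distance;
        this.frequency = frequency;
    }

    @Override
    public int compareTo(CandidateSketch other) {
        if (distance != other.distance) {
            return Integer.compare(distance, other.distance); // closer first
        }
        return Integer.compare(other.frequency, frequency); // then more frequent first
    }

    // Sorts a copy of the candidates and keeps only the words.
    static List<String> words(List<CandidateSketch> candidates) {
        List<CandidateSketch> sorted = new ArrayList<>(candidates);
        Collections.sort(sorted);
        List<String> out = new ArrayList<>();
        for (CandidateSketch c : sorted) {
            out.add(c.word);
        }
        return out;
    }
}
```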

ed
public int ed(int i, int j, int wordIndex, int candIndex)
Calculates edit distance.
Parameters:
i - length of the first word (here: the misspelled word) minus 1
j - length of the second word (here: the candidate) minus 1
wordIndex - (TODO: javadoc?)
candIndex - (TODO: javadoc?)
Returns:
Edit distance between the two words. Remarks: See Oflazer.
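Oflazer's paper defines the distance through a table H(i, j) over prefixes of the two words, allowing insertion, deletion, substitution, and transposition of adjacent characters. The standalone sketch below fills the whole table for two complete strings; the library's ed(i, j, wordIndex, candIndex) works on its own internal buffers instead, so the OflazerEditDistance class and its signature are illustrative only.

```java
// Hypothetical standalone form of Oflazer's edit-distance recurrence.
final class OflazerEditDistance {
    static int ed(String x, String y) {
        int m = x.length();
        int n = y.length();
        int[][] h = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) {
            h[i][0] = i; // delete all of x[1..i]
        }
        for (int j = 0; j <= n; j++) {
            h[0][j] = j; // insert all of y[1..j]
        }
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                char xi = x.charAt(i - 1);
                char yj = y.charAt(j - 1);
                if (xi == yj) {
                    h[i][j] = h[i - 1][j - 1];
                } else if (i > 1 && j > 1 && xi == y.charAt(j - 2) && x.charAt(i - 2) == yj) {
                    // Adjacent transposition, e.g. "ta" versus "at".
                    h[i][j] = 1 + Math.min(h[i - 2][j - 2], Math.min(h[i][j - 1], h[i - 1][j]));
                } else {
                    // Substitution, insertion, or deletion.
                    h[i][j] = 1 + Math.min(h[i - 1][j - 1], Math.min(h[i][j - 1], h[i - 1][j]));
                }
            }
        }
        return h[m][n];
    }
}
```

Because a transposition is only counted when the swapped pair is otherwise untouched (the restricted, or optimal string alignment, variant), ed("abc", "ca") is 3 here, whereas the unrestricted Damerau-Levenshtein distance is 2.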

cuted
public int cuted(int depth, int wordIndex, int candIndex)
Calculates cut-off edit distance.
Parameters:
depth - current length of the candidate
wordIndex - (TODO: javadoc?)
candIndex - (TODO: javadoc?)
Returns:
Cut-off edit distance. Remarks: See Oflazer.
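Oflazer defines the cut-off edit distance of a candidate prefix y of length n as the minimum of H(i, n) over i in [max(1, n - t), min(m, n + t)], where m is the length of the misspelled word x and t is the threshold. If it exceeds t, no extension of the prefix can come within distance t, so the search backtracks. In the hypothetical standalone helper below, n plays the role that the depth parameter appears to play in the library method.

```java
// Hypothetical standalone helper for Oflazer's cut-off edit distance.
final class CutoffEditDistance {
    // Full table H(i, j) for prefixes of x and y, using Oflazer's recurrence
    // (insertion, deletion, substitution, adjacent transposition).
    static int[][] table(String x, String y) {
        int m = x.length();
        int n = y.length();
        int[][] h = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) {
            h[i][0] = i;
        }
        for (int j = 0; j <= n; j++) {
            h[0][j] = j;
        }
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                char xi = x.charAt(i - 1);
                char yj = y.charAt(j - 1);
                if (xi == yj) {
                    h[i][j] = h[i - 1][j - 1];
                } else if (i > 1 && j > 1 && xi == y.charAt(j - 2) && x.charAt(i - 2) == yj) {
                    h[i][j] = 1 + Math.min(h[i - 2][j - 2], Math.min(h[i][j - 1], h[i - 1][j]));
                } else {
                    h[i][j] = 1 + Math.min(h[i - 1][j - 1], Math.min(h[i][j - 1], h[i - 1][j]));
                }
            }
        }
        return h;
    }

    // x: misspelled word (length m); y: candidate prefix (length n); t: threshold.
    // Returns Integer.MAX_VALUE when the index range is empty (y is already too long).
    static int cuted(String x, String y, int t) {
        int m = x.length();
        int n = y.length();
        int[][] h = table(x, y);
        int best = Integer.MAX_VALUE;
        for (int i = Math.max(1, n - t); i <= Math.min(m, n + t); i++) {
            best = Math.min(best, h[i][n]);
        }
        return best;
    }
}
```

For x = "cta" and t = 1, the prefix "car" has a cut-off distance of 2, so a search would abandon it; for "cartoon" the index range is empty and the helper returns Integer.MAX_VALUE.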

isCamelCase
public boolean isCamelCase(String str)
Parameters:
str - The string to check.
Returns:
true if str is CamelCase. Note that German compounds with a dash (like "Waschmaschinen-Test") are also considered camel case by this method.
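The note about "Waschmaschinen-Test" is consistent with a rule such as: an uppercase first letter, a lowercase second letter, and at least one further uppercase letter. The hypothetical CamelCaseCheck class below encodes that guess; it is an assumption rather than the library's exact definition, and, for example, it rejects "eBay".

```java
// Hypothetical CamelCase test consistent with the documented example.
final class CamelCaseCheck {
    static boolean isCamelCase(String str) {
        if (str.length() < 2
                || !Character.isUpperCase(str.charAt(0))
                || !Character.isLowerCase(str.charAt(1))) {
            return false;
        }
        // Require at least one more uppercase letter after the first one.
        for (int i = 2; i < str.length(); i++) {
            if (Character.isUpperCase(str.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```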

convertsCase
public boolean convertsCase()
Used to determine whether the dictionary supports case conversions.
Returns:
true if the dictionary supports case conversions, false otherwise.
Since:
1.9

getAllReplacements
public List<String> getAllReplacements(String str, int fromIndex, int level)
Parameters:
str - The string to find the replacements for.
fromIndex - The index from which replacements are found.
level - The recursion level. The search stops if level is > MAX_RECURSION_LEVEL.
Returns:
A list of all possible replacements of the given string str.
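The parameters suggest a recursive search that scans str from fromIndex and, at each place where a replaceable substring occurs, branches into keeping it and replacing it, stopping once level exceeds MAX_RECURSION_LEVEL. The sketch below assumes the replacements come from a hypothetical map of substring to alternatives and uses a hypothetical limit of 4; in this version each keep-or-replace decision increases level by one. ReplacementExpander is not a library class.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of recursive replacement expansion; the pattern map
// and the MAX_RECURSION_LEVEL value are assumptions, not library values.
final class ReplacementExpander {
    static final int MAX_RECURSION_LEVEL = 4;

    static List<String> getAllReplacements(String str, int fromIndex, int level,
                                           Map<String, List<String>> pairs) {
        Set<String> out = new LinkedHashSet<>(); // keeps order, drops duplicates
        expand(str, fromIndex, level, pairs, out);
        return new ArrayList<>(out);
    }

    private static void expand(String str, int fromIndex, int level,
                               Map<String, List<String>> pairs, Set<String> out) {
        if (level > MAX_RECURSION_LEVEL) {
            out.add(str); // recursion limit reached: stop expanding
            return;
        }
        // Leftmost position at or after fromIndex where any pattern occurs.
        int pos = -1;
        for (String pattern : pairs.keySet()) {
            int idx = str.indexOf(pattern, fromIndex);
            if (idx >= 0 && (pos < 0 || idx < pos)) {
                pos = idx;
            }
        }
        if (pos < 0) {
            out.add(str); // nothing left to replace
            return;
        }
        // Branch 1: keep the text at pos and continue scanning after it.
        expand(str, pos + 1, level + 1, pairs, out);
        // Branch 2: replace each pattern that starts at pos with each alternative.
        for (Map.Entry<String, List<String>> pair : pairs.entrySet()) {
            String pattern = pair.getKey();
            if (!str.startsWith(pattern, pos)) {
                continue;
            }
            for (String alt : pair.getValue()) {
                String replaced = str.substring(0, pos) + alt + str.substring(pos + pattern.length());
                expand(replaced, pos + alt.length(), level + 1, pairs, out);
            }
        }
    }
}
```

With the single pair "c" to "k", the input "cac" expands to "cac", "cak", "kac", and "kak"; called with a level already above the limit, it returns the input unchanged.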

getWordLen
public final int getWordLen()

getCandLen
public final int getCandLen()

getEffectiveED
public final int getEffectiveED()