Package morfologik.stemming
Class DictionaryLookup
- java.lang.Object
  - morfologik.stemming.DictionaryLookup
Constructor Summary
Constructors
DictionaryLookup(Dictionary dictionary)
    Creates a new object of this class using the given dictionary's FSA for word lookups and its encoding for converting characters to bytes.
-
Method Summary
Modifier and Type, Method, Description
static String applyReplacements(CharSequence word, LinkedHashMap<String,String> replacements)
    Apply partial string replacements from a given map.
Dictionary getDictionary()
char getSeparatorChar()
Iterator<WordData> iterator()
    Return an iterator over all WordData entries available in the embedded Dictionary.
List<WordData> lookup(CharSequence word)
    Searches the automaton for a symbol sequence equal to word, followed by a separator.
-
Constructor Detail
-
DictionaryLookup
public DictionaryLookup(Dictionary dictionary) throws IllegalArgumentException
Creates a new object of this class using the given dictionary's FSA for word lookups and its encoding for converting characters to bytes.
Parameters:
dictionary - The dictionary to use for lookups.
Throws:
IllegalArgumentException - if the FSA's root node cannot be acquired (the dictionary is empty).
-
Method Detail
-
lookup
public List<WordData> lookup(CharSequence word)
Searches the automaton for a symbol sequence equal to word, followed by a separator. The result is a stem (decompressed according to the dictionary's specification) and optional tag data.
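The "word, followed by a separator" idea can be illustrated with a toy model. The sketch below is not the Morfologik implementation: it assumes entries are stored as plain `inflected+stem+tag` strings in a sorted set instead of an FSA, it skips stem compression entirely, and the class name, the `+` separator and the `add` helper are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy stand-in for the lookup idea; not the Morfologik implementation. */
class SeparatorLookupSketch {
    // Hypothetical separator chosen for this sketch only.
    static final char SEPARATOR = '+';

    // Each entry is "inflected+stem+tag"; a sorted set stands in for the FSA.
    private final TreeSet<String> entries = new TreeSet<>();

    void add(String inflected, String stem, String tag) {
        entries.add(inflected + SEPARATOR + stem + SEPARATOR + tag);
    }

    /** Returns {stem, tag} pairs for every entry that starts with word + separator. */
    List<String[]> lookup(CharSequence word) {
        String prefix = word.toString() + SEPARATOR;
        List<String[]> result = new ArrayList<>();
        // Entries sharing a prefix are contiguous in sorted order.
        for (String entry : entries.tailSet(prefix, true)) {
            if (!entry.startsWith(prefix)) {
                break;
            }
            String rest = entry.substring(prefix.length());
            int split = rest.indexOf(SEPARATOR);
            result.add(new String[] { rest.substring(0, split), rest.substring(split + 1) });
        }
        return result;
    }
}
```

Appending the separator to the query is what keeps a lookup for "cat" from also matching entries for "cats": only entries where the separator directly follows the word are returned.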
-
applyReplacements
public static String applyReplacements(CharSequence word, LinkedHashMap<String,String> replacements)
Apply partial string replacements from a given map. Useful if the word needs to be normalized somehow (e.g., ligatures, apostrophes and such).
Parameters:
word - The word to apply replacements to.
replacements - A map of replacements (from->to).
Returns:
A new string with all replacements applied.
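As a rough illustration of what "partial string replacements from a map" means, the following stand-in applies each from->to pair, in map order, to every occurrence in the word. It is a minimal sketch with a hypothetical class and method name; the library's own implementation may differ in detail, for example in how overlapping replacements interact.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in sketch of map-driven normalization; not the library's code. */
class ReplacementsSketch {
    /** Applies each from->to pair, in insertion order, to every occurrence in word. */
    static String applyAll(CharSequence word, LinkedHashMap<String, String> replacements) {
        String result = word.toString();
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            // String.replace substitutes all literal (non-regex) occurrences.
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

With a map holding a typographic apostrophe (U+2019) mapped to an ASCII apostrophe and the "fi" ligature (U+FB01) mapped to the two letters "fi", a word typed with those characters is normalized to its plain form before lookup. A LinkedHashMap matters here because its iteration order is the insertion order, so earlier replacements are applied first.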
-
iterator
public Iterator<WordData> iterator()
Return an iterator over all WordData entries available in the embedded Dictionary.
-
getDictionary
public Dictionary getDictionary()
Returns:
The Dictionary used by this object.
-
getSeparatorChar
public char getSeparatorChar()
Returns:
The logical separator character splitting the inflected form, lemma correction token and tag. Note that this character is a best-effort conversion from a byte in DictionaryMetadata.separator and may not be valid in the target encoding (although this is highly unlikely).
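The "best-effort conversion from a byte" can be pictured as decoding a single byte with the dictionary's charset. The sketch below shows that idea using only the JDK; it is an assumption about the general approach, not the library's code, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;

/** Sketch of turning a one-byte separator into a char; names are hypothetical. */
class SeparatorCharSketch {
    /** Decodes a single separator byte into one char using the given charset. */
    static char toChar(byte separator, Charset charset) {
        // new String(byte[], Charset) substitutes a replacement character for
        // input it cannot decode, which is why the result is only best-effort.
        String decoded = new String(new byte[] { separator }, charset);
        if (decoded.length() != 1) {
            throw new IllegalArgumentException("Separator does not decode to a single char.");
        }
        return decoded.charAt(0);
    }
}
```

For ASCII separators such as `+` the byte and the character coincide in common encodings, which matches the note that an invalid result is highly unlikely.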
-