Class DictionaryLookup

  • All Implemented Interfaces:
    Iterable<WordData>, IStemmer

    public final class DictionaryLookup
    extends Object
    implements IStemmer, Iterable<WordData>
    This class implements a dictionary lookup of an inflected word over a dictionary previously compiled using the dict_compile tool.
    • Constructor Detail

      • DictionaryLookup

        public DictionaryLookup(Dictionary dictionary)
                         throws IllegalArgumentException
        Creates a new object of this class that uses the FSA of the given dictionary for word lookups and the dictionary's encoding for converting characters to bytes.
        Parameters:
        dictionary - The dictionary to use for lookups.
        Throws:
        IllegalArgumentException - if the FSA's root node cannot be acquired (the dictionary is empty).
    • Method Detail

      • lookup

        public List<WordData> lookup(CharSequence word)
        Searches the automaton for a symbol sequence equal to word, followed by a separator. The result is a stem (decompressed according to the dictionary's specification) and optional tag data.
        Specified by:
        lookup in interface IStemmer
        Parameters:
        word - The word (typically inflected) to look up base forms for.
        Returns:
        A list of WordData entries (possibly empty).
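
        The lookup described above (match the word followed by a separator, then read the stem and the optional tag) can be illustrated with a small self-contained sketch. It is not the library's implementation: a sorted TreeSet of plain strings stands in for the FSA, stems are assumed to be stored uncompressed, and the names LookupSketch and Entry are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class LookupSketch {
    /** Hypothetical stand-in for WordData: a stem plus an optional tag. */
    static final class Entry {
        final String stem;
        final String tag; // null when the entry carries no tag

        Entry(String stem, String tag) {
            this.stem = stem;
            this.tag = tag;
        }
    }

    /** Finds all entries of the form word + separator + stem [+ separator + tag]. */
    static List<Entry> lookup(TreeSet<String> entries, String word, char separator) {
        String prefix = word + separator;
        List<Entry> result = new ArrayList<>();
        // tailSet starts at the first entry >= prefix; entries sharing the prefix are contiguous.
        for (String entry : entries.tailSet(prefix)) {
            if (!entry.startsWith(prefix)) {
                break;
            }
            String rest = entry.substring(prefix.length());
            int sep = rest.indexOf(separator);
            if (sep < 0) {
                result.add(new Entry(rest, null));
            } else {
                result.add(new Entry(rest.substring(0, sep), rest.substring(sep + 1)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> entries = new TreeSet<>();
        entries.add("cat+cat+noun:sg");
        entries.add("cats+cat+noun:pl");
        entries.add("running+run+verb:ger");
        entries.add("running+running+noun");
        for (Entry e : lookup(entries, "running", '+')) {
            System.out.println(e.stem + " / " + e.tag);
        }
    }
}
```

        Appending the separator before matching is what keeps a lookup of "cat" from also returning the entry for "cats"; an unknown word simply yields an empty list.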
      • applyReplacements

        public static String applyReplacements(CharSequence word,
                                               LinkedHashMap<String,String> replacements)
        Applies partial string replacements from the given map. Useful if the word needs to be normalized in some way (e.g., ligatures, apostrophes and such).
        Parameters:
        word - The word to apply replacements to.
        replacements - A map of replacements (from->to).
        Returns:
        A new string with all replacements applied.
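
        As an illustration of what partial replacements from an ordered map could look like, here is a minimal self-contained sketch. It assumes the rules are applied one after another in the map's insertion order, so that the output of an earlier rule is visible to later ones; the library's actual implementation may differ. ReplacementsSketch and applyAll are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ReplacementsSketch {
    /** Applies every from->to rule in insertion order to all occurrences in word. */
    static String applyAll(CharSequence word, LinkedHashMap<String, String> replacements) {
        StringBuilder sb = new StringBuilder(word);
        for (Map.Entry<String, String> rule : replacements.entrySet()) {
            String from = rule.getKey();
            String to = rule.getValue();
            if (from.isEmpty()) {
                continue; // an empty key would match everywhere; skip it
            }
            int index = sb.indexOf(from);
            while (index != -1) {
                sb.replace(index, index + from.length(), to);
                // Resume after the inserted text so a rule never re-matches its own output.
                index = sb.indexOf(from, index + to.length());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> rules = new LinkedHashMap<>();
        rules.put("\u0153", "oe"); // the oe ligature
        rules.put("\u2019", "'");  // typographic apostrophe
        System.out.println(applyAll("c\u0153ur d\u2019or", rules));
    }
}
```

        An insertion-ordered map such as LinkedHashMap makes the result deterministic when rules overlap, which is presumably why the method signature asks for that type.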
      • getDictionary

        public Dictionary getDictionary()
        Returns:
        The Dictionary used by this object.
      • getSeparatorChar

        public char getSeparatorChar()
        Returns:
        The logical separator character that splits the inflected form, lemma correction token and tag. Note that this character is a best-effort conversion from a byte in DictionaryMetadata.separator and may not be valid in the target encoding (although this is highly unlikely).
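
        To make the "best-effort conversion" caveat concrete, the following sketch decodes a single separator byte with a given charset. It is only an illustration under the assumption that the conversion amounts to decoding that one byte; SeparatorSketch and byteToChar are hypothetical names, not part of the library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SeparatorSketch {
    /** Decodes one byte in the given charset and returns the first resulting char. */
    static char byteToChar(byte separator, Charset charset) {
        // new String(byte[], Charset) substitutes a replacement character for malformed input.
        String decoded = new String(new byte[] { separator }, charset);
        if (decoded.isEmpty()) {
            throw new IllegalArgumentException("Byte does not decode to any character in " + charset);
        }
        return decoded.charAt(0);
    }

    public static void main(String[] args) {
        // An ASCII separator such as '+' maps to the same character in common encodings.
        System.out.println(byteToChar((byte) '+', StandardCharsets.UTF_8));
        System.out.println(byteToChar((byte) '+', StandardCharsets.ISO_8859_1));
        // A byte above 0x7F is a full character in ISO-8859-1 but not a complete UTF-8 sequence.
        System.out.println((int) byteToChar((byte) 0xE9, StandardCharsets.ISO_8859_1));
    }
}
```

        Separators are normally plain ASCII characters, for which the byte and the character coincide; the caveat only matters for bytes that do not form a complete character on their own in the dictionary's encoding.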