Class LangProfile

  • All Implemented Interfaces:
    Serializable

    @Deprecated
    public class LangProfile
    extends Object
    implements Serializable
    Deprecated.
    replaced by LanguageProfile
    LangProfile is a language profile class. It is not intended to be used directly.
    TODO: split into a builder and an immutable class.
    TODO: n-grams are currently built with the space before a word included, but not the space after it. Example: "foo" yields the 3-gram " fo" but not "oo ". This is either a bug or, if intended, needs documentation.
    Author:
    Nakatani Shuyo
    See Also:
    Serialized Form
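The n-gram behaviour noted in the class description (leading space included, trailing space omitted) can be illustrated with a minimal sketch. The class LeadingSpaceNGrams and its method of are hypothetical names used only for illustration. They are not part of this library, and the library's actual extraction code may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the library.
final class LeadingSpaceNGrams {

    /**
     * Returns all n-grams of length n taken from the word with a single
     * space prepended. No space is appended, so n-grams that would end
     * with a trailing space are never produced.
     */
    static List<String> of(String word, int n) {
        String padded = " " + word; // leading space only (assumption based on the TODO note)
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= padded.length(); i++) {
            grams.add(padded.substring(i, i + n));
        }
        return grams;
    }
}
```

Under these assumptions, LeadingSpaceNGrams.of("foo", 3) yields " fo" and "foo", and never "oo ".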
    • Constructor Detail

      • LangProfile

        public LangProfile()
        Deprecated.
        Constructor for JSONIC
      • LangProfile

        public LangProfile(String name)
        Deprecated.
        Normal Constructor
        Parameters:
        name - language name
    • Method Detail

      • add

        public void add(@NotNull String gram)
        Deprecated.
        Adds an n-gram to the profile.
        Parameters:
        gram - the n-gram to add
      • omitLessFreq

        public void omitLessFreq()
        Deprecated.
        Removes n-grams that occur fewer than MINIMUM_FREQ times, to get rid of rare n-grams. Also removes ASCII n-grams if the total number of ASCII n-grams is less than one third of the total. This is done because non-Latin text (such as Chinese) often has some Latin noise in between.
        TODO: split the two cleaning steps into separate methods.
        TODO: distinguish ASCII from Latin. The code currently looks for Latin only and should include characters with diacritics, e.g. Vietnamese.
        TODO: the current code counts ASCII but removes any Latin. Is that desired? If so, it needs documentation.
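The two cleaning steps described for omitLessFreq can be sketched roughly as follows. This is a simplified illustration, not the library's implementation: the class ProfilePruningSketch, the value of ASSUMED_MINIMUM_FREQ, the use of a plain Map of n-gram counts, and the rule "contains an ASCII letter" are all assumptions made for the example.

```java
import java.util.Map;

// Hypothetical sketch for illustration only; not the library's code.
final class ProfilePruningSketch {

    // Assumed threshold; the real MINIMUM_FREQ value is not stated in this page.
    static final int ASSUMED_MINIMUM_FREQ = 2;

    /** Prunes the given mutable n-gram count map in place. */
    static void prune(Map<String, Integer> freq) {
        // Step 1: drop n-grams that occur fewer than the minimum number of times.
        freq.values().removeIf(count -> count < ASSUMED_MINIMUM_FREQ);

        // Step 2: measure how much of the remaining profile is ASCII.
        long total = 0;
        long ascii = 0;
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            total += e.getValue();
            if (containsAsciiLetter(e.getKey())) {
                ascii += e.getValue();
            }
        }

        // If ASCII n-grams are less than one third of the total, treat them as noise.
        if (ascii * 3 < total) {
            freq.keySet().removeIf(ProfilePruningSketch::containsAsciiLetter);
        }
    }

    /** True if the string contains at least one letter in a-z or A-Z. */
    static boolean containsAsciiLetter(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                return true;
            }
        }
        return false;
    }
}
```

Keeping the two steps as separate blocks mirrors the first TODO above: each could be moved into its own method without changing the result.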
      • getName

        public String getName()
        Deprecated.
      • setName

        public void setName(String name)
        Deprecated.
      • getNWords

        public int[] getNWords()
        Deprecated.
      • setNWords

        public void setNWords(int[] nWords)
        Deprecated.