Class BaseSynthesizer

    • Constructor Detail

      • BaseSynthesizer

        public BaseSynthesizer(String sorosFileName,
                               String resourceFileName,
                               String tagFileName,
                               Language lang)
        Parameters:
        resourceFileName - The dictionary file name.
        tagFileName - The name of a file containing all possible tags.
      • BaseSynthesizer

        public BaseSynthesizer(String resourceFileName,
                               String tagFileName,
                               Language lang)
    • Method Detail

      • getDictionary

        protected morfologik.stemming.Dictionary getDictionary()
                                                        throws IOException
        Returns the Dictionary used for this synthesizer. The dictionary file can be defined in the constructor.
        Throws:
        IOException - In case the dictionary cannot be loaded.
      • createStemmer

        protected morfologik.stemming.IStemmer createStemmer()
        Creates a new IStemmer based on the configured dictionary. The result must not be shared among threads.
        Since:
        2.3
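
        Because the stemmer returned by createStemmer must not be shared among threads, one common way to honor that constraint is to keep one instance per thread. The following is a minimal sketch of that idea, not the library's own code: FakeStemmer, PerThreadStemmerSketch, and stemmerForCurrentThread are hypothetical stand-ins, and only java.lang.ThreadLocal is a real API.

```java
// Sketch only: FakeStemmer is a hypothetical stand-in for a stateful,
// non-thread-safe stemmer; it is NOT morfologik's IStemmer.
class PerThreadStemmerSketch {

    /** Stand-in stemmer that keeps mutable state between calls. */
    static final class FakeStemmer {
        private final StringBuilder buffer = new StringBuilder(); // shared mutable state

        String stem(String word) {
            buffer.setLength(0);
            buffer.append(word.toLowerCase());
            return buffer.toString();
        }
    }

    /** Stand-in for createStemmer(): returns a fresh instance on every call. */
    static FakeStemmer createStemmer() {
        return new FakeStemmer();
    }

    // Each thread lazily gets its own stemmer the first time it asks for one.
    private static final ThreadLocal<FakeStemmer> STEMMER =
            ThreadLocal.withInitial(PerThreadStemmerSketch::createStemmer);

    /** Returns the stemmer owned by the calling thread. */
    static FakeStemmer stemmerForCurrentThread() {
        return STEMMER.get();
    }
}
```

        Repeated calls on one thread reuse the same instance, while a different thread receives its own. Creating a new stemmer per call is an equally valid alternative when construction is cheap.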
      • lookup

        protected void lookup(String lemma,
                              String posTag,
                              List<String> results)
        Looks up the inflected forms of a lemma for a given part-of-speech tag.
        Parameters:
        lemma - the lemma to be inflected.
        posTag - the desired part-of-speech tag.
        results - the list to collect the inflected forms.
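
        The contract of lookup (no return value, matches appended to a caller-supplied list) can be illustrated with a small stand-in. This sketch assumes an in-memory map in place of the real dictionary; LookupSketch, addEntry, and the "lemma|posTag" key format are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a HashMap stands in for the synthesizer's dictionary.
class LookupSketch {
    // Key is "lemma|posTag" (an assumption of this sketch), value is the list of forms.
    private final Map<String, List<String>> forms = new HashMap<>();

    /** Registers one inflected form for a lemma and POS tag. */
    void addEntry(String lemma, String posTag, String form) {
        forms.computeIfAbsent(lemma + "|" + posTag, k -> new ArrayList<>()).add(form);
    }

    /** Appends every form found for (lemma, posTag) to results; adds nothing if none exist. */
    void lookup(String lemma, String posTag, List<String> results) {
        List<String> found = forms.get(lemma + "|" + posTag);
        if (found != null) {
            results.addAll(found);
        }
    }
}
```

        Passing the same results list to several lookup calls accumulates the forms for several tags, which is convenient when one request covers more than one tag.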
      • synthesize

        public String[] synthesize(AnalyzedToken token,
                                   String posTag)
                            throws IOException
        Gets the inflected forms of a given AnalyzedToken, where the forms are defined by a part-of-speech tag.
        Specified by:
        synthesize in interface Synthesizer
        Parameters:
        token - AnalyzedToken to be inflected.
        posTag - The desired part-of-speech tag.
        Returns:
        inflected words, or an empty array if no forms were found
        Throws:
        IOException
      • synthesize

        public String[] synthesize(AnalyzedToken token,
                                   String posTag,
                                   boolean posTagRegExp)
                            throws IOException
        Description copied from interface: Synthesizer
        Generates a form of the word with a given POS tag for a given lemma. POS tag can be specified using regular expressions.
        Specified by:
        synthesize in interface Synthesizer
        Parameters:
        token - the token to be used for synthesis
        posTag - POS tag of the form to be generated
        posTagRegExp - Specifies whether the posTag string is a regular expression.
        Throws:
        IOException
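
        One plausible way to support posTagRegExp is to match the pattern against the list of all possible tags and look up each matching tag in turn. The sketch below works under that assumption and is not taken from the class's source: RegexSynthesisSketch, its constructor arguments, and the "lemma|posTag" key format are hypothetical, and a plain lemma string replaces AnalyzedToken.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch only: in-memory stand-ins replace the tag file and the dictionary.
class RegexSynthesisSketch {
    private final List<String> possibleTags;        // stand-in for the tag file contents
    private final Map<String, List<String>> forms;  // key "lemma|posTag" (assumed format)

    RegexSynthesisSketch(List<String> possibleTags, Map<String, List<String>> forms) {
        this.possibleTags = possibleTags;
        this.forms = forms;
    }

    /** Returns the forms of lemma for posTag, treating posTag as a regex if requested. */
    String[] synthesize(String lemma, String posTag, boolean posTagRegExp) {
        List<String> results = new ArrayList<>();
        if (posTagRegExp) {
            Pattern pattern = Pattern.compile(posTag);
            for (String tag : possibleTags) {
                // The whole tag must match, not just a substring of it.
                if (pattern.matcher(tag).matches()) {
                    collect(lemma, tag, results);
                }
            }
        } else {
            collect(lemma, posTag, results); // posTag is taken literally
        }
        return results.toArray(new String[0]); // empty array when nothing was found
    }

    private void collect(String lemma, String tag, List<String> results) {
        List<String> found = forms.get(lemma + "|" + tag);
        if (found != null) {
            results.addAll(found);
        }
    }
}
```

        With posTagRegExp set to false, a string such as VB[DG] is compared literally and therefore finds nothing unless a tag with exactly that name exists.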
      • getPosTagCorrection

        public String getPosTagCorrection(String posTag)
        Description copied from interface: Synthesizer
        Gets a corrected version of the POS tag used for synthesis. Useful when the tagset defines special disjunctions that need to be converted into regexp disjunctions.
        Specified by:
        getPosTagCorrection in interface Synthesizer
        Parameters:
        posTag - original POS tag to correct
        Returns:
        converted POS tag
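
        What such a correction might look like depends entirely on the tagset. As a purely illustrative sketch, assume a hypothetical tagset whose fields are separated by colons and whose alternatives within a field are separated by dots, as in subst:sg:nom.acc:m. PosTagCorrectionSketch and correct are invented names, and the conversion rule is an assumption, not the behavior of any particular language module.

```java
// Sketch only: converts dot-separated alternatives in a hypothetical
// colon-delimited tagset into a regular-expression alternation.
class PosTagCorrectionSketch {

    /** Rewrites fields like "nom.acc" as "(?:nom|acc)"; other fields are kept as-is. */
    static String correct(String posTag) {
        if (posTag.indexOf('.') < 0) {
            return posTag; // nothing to convert
        }
        StringBuilder corrected = new StringBuilder();
        String[] fields = posTag.split(":");
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                corrected.append(':');
            }
            String field = fields[i];
            if (field.indexOf('.') >= 0) {
                // Non-capturing group so the alternation stays local to this field.
                corrected.append("(?:").append(field.replace('.', '|')).append(')');
            } else {
                corrected.append(field);
            }
        }
        return corrected.toString();
    }
}
```

        Under these assumptions, correct("subst:sg:nom.acc:m") yields subst:sg:(?:nom|acc):m, which as a regular expression matches subst:sg:nom:m and subst:sg:acc:m but not subst:sg:gen:m.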
      • getStemmer

        public morfologik.stemming.IStemmer getStemmer()
        Returns:
        the stemmer interface to be used.
        Since:
        2.5
      • getSpelledNumber

        public String getSpelledNumber(String arabicNumeral)
        Description copied from interface: Synthesizer
        Spells out a number.
        Specified by:
        getSpelledNumber in interface Synthesizer
        Parameters:
        arabicNumeral - the number to spell out, written in Arabic numerals
        Returns:
        the spelled-out number as a String