Class MultiWordChunker2

  • All Implemented Interfaces:
    Disambiguator

    public class MultiWordChunker2
    extends AbstractDisambiguator
    Multiword tagger-chunker. Note: currently does not support:
    • overlapping tagging (first matching multiword entry wins)
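The note on overlapping tagging can be illustrated with a minimal, self-contained sketch. It does not use LanguageTool classes. `FirstMatchChunkerSketch`, its `chunk` method and the in-memory lexicon are hypothetical. Reading "first matching multiword entry wins" as "at each token position, lexicon entries are tried in insertion order" is also an assumption. Tokens are scanned left to right. After a match, scanning resumes past the matched tokens, so a second multiword that overlaps the first is never tagged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not LanguageTool code: greedy, non-overlapping multiword tagging.
class FirstMatchChunkerSketch {

  // A matched multiword: tokens [start, end) carry the given tag.
  static final class Match {
    final int start;
    final int end;
    final String tag;

    Match(int start, int end, String tag) {
      this.start = start;
      this.end = end;
      this.tag = tag;
    }

    @Override
    public String toString() {
      return tag + "[" + start + "," + end + ")";
    }
  }

  // Lexicon entries are tried in iteration order (use a LinkedHashMap to make that order explicit).
  static List<Match> chunk(List<String> tokens, Map<List<String>, String> lexicon) {
    List<Match> matches = new ArrayList<>();
    int i = 0;
    while (i < tokens.size()) {
      Match found = null;
      for (Map.Entry<List<String>, String> entry : lexicon.entrySet()) {
        List<String> words = entry.getKey();
        int end = i + words.size();
        if (!words.isEmpty() && end <= tokens.size() && tokens.subList(i, end).equals(words)) {
          found = new Match(i, end, entry.getValue());
          break; // first matching entry wins
        }
      }
      if (found != null) {
        matches.add(found);
        i = found.end; // resume after the match: no overlapping multiword can start inside it
      } else {
        i++;
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    Map<List<String>, String> lexicon = new LinkedHashMap<>();
    lexicon.put(Arrays.asList("New", "York"), "NNP");
    lexicon.put(Arrays.asList("York", "City"), "NNP");
    // "York City" overlaps "New York", which is found first, so it is not tagged.
    System.out.println(chunk(Arrays.asList("in", "New", "York", "City"), lexicon));
  }
}
```

With the lexicon in `main`, "New York" is tagged and the overlapping "York City" is skipped.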
    Author:
    Andriy Rysin
    • Constructor Detail

      • MultiWordChunker2

        public MultiWordChunker2(String filename)
        Parameters:
        filename - text file with multiwords and their tags
      • MultiWordChunker2

        public MultiWordChunker2(String filename,
                                 boolean allowFirstCapitalized)
        Parameters:
        filename - text file with multiwords and their tags
        allowFirstCapitalized - if true, the first word of the multiword may also be capitalized
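The constructors only state that `filename` names a text file with multiwords and tags. The sketch below assumes one entry per line in the form `multiword<TAB>tag`, with blank lines and `#` comment lines ignored. That layout, the class `MultiwordFileSketch` and its `load` method are hypothetical. The sketch also shows one plausible way to honour `allowFirstCapitalized`: it registers a second key whose first letter is upper-cased.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not LanguageTool code.
// Assumed line format: "multiword<TAB>tag"; blank lines and lines starting with '#' are ignored.
class MultiwordFileSketch {

  static Map<String, String> load(List<String> lines, boolean allowFirstCapitalized) {
    Map<String, String> entries = new LinkedHashMap<>();
    for (String raw : lines) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#")) {
        continue; // skip blank lines and comments
      }
      String[] parts = line.split("\t");
      if (parts.length != 2) {
        throw new IllegalArgumentException("Malformed line: " + raw);
      }
      String multiword = parts[0];
      String tag = parts[1];
      entries.putIfAbsent(multiword, tag); // keep the first entry for a duplicate multiword
      if (allowFirstCapitalized) {
        // Also register the variant whose first word starts with an upper-case letter.
        String capitalized = Character.toUpperCase(multiword.charAt(0)) + multiword.substring(1);
        entries.putIfAbsent(capitalized, tag);
      }
    }
    return entries;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("# adverbial multiwords", "de facto\tADV");
    System.out.println(load(lines, false));
    System.out.println(load(lines, true));
  }
}
```

Storing the capitalized variant as a separate key keeps lookup a plain map access. The cost is a larger lexicon when the flag is on.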
    • Method Detail

      • setRemoveOtherReadings

        public void setRemoveOtherReadings(boolean removeOtherReadings)
        Parameters:
        removeOtherReadings - If true and a multiword matches, other readings will be removed
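Here is a sketch of what `removeOtherReadings` controls. It simplifies by treating a token's readings as plain POS tag strings, whereas the real class works on LanguageTool's analyzed tokens. `ReadingsSketch.applyMultiwordTag` is hypothetical. With the flag off, the multiword tag is added next to the existing readings. With the flag on, it replaces them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not LanguageTool code: a token's readings are modelled as POS tag strings.
class ReadingsSketch {

  static List<String> applyMultiwordTag(List<String> readings, String multiwordTag,
                                        boolean removeOtherReadings) {
    List<String> result = new ArrayList<>();
    if (!removeOtherReadings) {
      result.addAll(readings); // keep the token's existing readings
    }
    result.add(multiwordTag); // the multiword reading is added in both cases
    return result;
  }

  public static void main(String[] args) {
    List<String> readings = Arrays.asList("NN", "VB");
    System.out.println(applyMultiwordTag(readings, "<ADV>", false));
    System.out.println(applyMultiwordTag(readings, "<ADV>", true));
  }
}
```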
      • setWrapTag

        public void setWrapTag(boolean wrapTag)
        Parameters:
        wrapTag - If true, the tag will be wrapped in < and >
      • formatPosTag

        protected String formatPosTag(String posTag,
                                      int position,
                                      int multiwordLength)
        Override this method if you want to format the POS tag differently.
        Parameters:
        posTag - POS tag for the multiword
        position - Position of the token in the multiword
        multiwordLength - Length of the multiword, in tokens
        Returns:
        The formatted POS tag for the multiword
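`formatPosTag` is the documented extension point. `MultiWordChunker2` itself needs LanguageTool on the classpath, so the sketch below uses a hypothetical stand-in base class, `ChunkerBase`, whose `formatPosTag(String, int, int)` signature mirrors the one above. Its default body (wrapping in `<` and `>` when `wrapTag` is set) is an assumption, not the library's implementation. The subclass `BioChunker`, also hypothetical, uses `position` to emit BIO-style `B-`/`I-` tags.

```java
// Hypothetical stand-in for MultiWordChunker2: only the formatPosTag signature mirrors the documented API.
class ChunkerBase {
  // Default chosen arbitrarily for this sketch; the real default is not documented here.
  protected boolean wrapTag = false;

  public void setWrapTag(boolean wrapTag) {
    this.wrapTag = wrapTag;
  }

  // Assumed default behaviour: ignore the position and optionally wrap the tag in < and >.
  protected String formatPosTag(String posTag, int position, int multiwordLength) {
    return wrapTag ? "<" + posTag + ">" : posTag;
  }
}

// Hypothetical override producing BIO-style tags: "B-" on the first token, "I-" on the others.
class BioChunker extends ChunkerBase {
  @Override
  protected String formatPosTag(String posTag, int position, int multiwordLength) {
    return (position == 0 ? "B-" : "I-") + posTag;
  }
}

class FormatPosTagDemo {
  public static void main(String[] args) {
    ChunkerBase base = new ChunkerBase();
    base.setWrapTag(true);
    ChunkerBase bio = new BioChunker();
    int length = 3; // a three-token multiword
    for (int position = 0; position < length; position++) {
      System.out.println(base.formatPosTag("ADV", position, length) + "  "
          + bio.formatPosTag("ADV", position, length));
    }
  }
}
```

Because the override receives both `position` and `multiwordLength`, it can also single out the last token (`position == multiwordLength - 1`) if an end marker is needed.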
      • disambiguate

        public AnalyzedSentence disambiguate(AnalyzedSentence input)
        Adds multiword POS tags, e.g., <ELLIPSIS> at the start of an ellipsis (...) and </ELLIPSIS> at its end.
        Parameters:
        input - The tokens to be chunked.
        Returns:
        AnalyzedSentence with additional markers.
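The start and end markers described for `disambiguate` can be sketched without LanguageTool types. `EllipsisMarkerSketch.mark` is hypothetical and works on plain token strings. For every non-overlapping occurrence of the multiword, it puts `<TAG>` on the first token and `</TAG>` on the last. It leaves inner tokens unmarked, which is an assumption about how middle tokens are treated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not LanguageTool code: returns one marker per token ("" = no marker).
class EllipsisMarkerSketch {

  static List<String> mark(List<String> tokens, List<String> multiword, String tag) {
    if (multiword.isEmpty()) {
      throw new IllegalArgumentException("multiword must contain at least one token");
    }
    List<String> markers = new ArrayList<>();
    for (int k = 0; k < tokens.size(); k++) {
      markers.add("");
    }
    int n = multiword.size();
    int i = 0;
    while (i + n <= tokens.size()) {
      if (tokens.subList(i, i + n).equals(multiword)) {
        markers.set(i, "<" + tag + ">"); // start marker on the first token
        if (n > 1) {
          markers.set(i + n - 1, "</" + tag + ">"); // end marker on the last token
        }
        i += n; // continue after the match, so occurrences never overlap
      } else {
        i++;
      }
    }
    return markers;
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("Well", ".", ".", ".", "ok");
    System.out.println(mark(tokens, Arrays.asList(".", ".", "."), "ELLIPSIS"));
  }
}
```

For the tokens `Well . . . ok` and the multiword `. . .` with tag `ELLIPSIS`, the first dot gets `<ELLIPSIS>`, the third dot gets `</ELLIPSIS>`, and the other tokens get no marker.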