Class TrimSuffixEncoder

  • All Implemented Interfaces:
    ISequenceEncoder

    public class TrimSuffixEncoder
    extends Object
    implements ISequenceEncoder
    Encodes dst relative to src by trimming the suffix of src that differs from dst. The output code (in bytes) is:
     {K}{suffix}
     
    where (K - 'A') is the number of bytes to trim from the end of src, and suffix is the byte sequence to append to the trimmed result in order to obtain dst.

    Examples:

     src: foo
     dst: foobar
     encoded: Abar
     
     src: foo
     dst: bar
     encoded: Dbar
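
    The format above can be sketched on plain byte arrays as follows. This is a minimal illustration, not the class's actual implementation: the class name TrimSuffixSketch and both helper methods are hypothetical, and the sketch assumes the trim count fits in a single byte (it rejects longer trims rather than handling them).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the {K}{suffix} format on byte arrays.
class TrimSuffixSketch {
    // Encodes dst relative to src as one count byte followed by the suffix to append.
    static byte[] encode(byte[] src, byte[] dst) {
        int max = Math.min(src.length, dst.length);
        int shared = 0;
        while (shared < max && src[shared] == dst[shared]) {
            shared++;
        }
        int trim = src.length - shared; // bytes to drop from the end of src
        if ('A' + trim > 255) {
            // Assumption of this sketch: larger trim counts are simply rejected.
            throw new IllegalArgumentException("trim count does not fit in one byte");
        }
        byte[] out = new byte[1 + dst.length - shared];
        out[0] = (byte) ('A' + trim);
        System.arraycopy(dst, shared, out, 1, dst.length - shared);
        return out;
    }

    // Reverses encode: trims (K - 'A') bytes from src, then appends the suffix.
    static byte[] decode(byte[] src, byte[] code) {
        int trim = (code[0] & 0xFF) - 'A';
        int keep = src.length - trim;
        byte[] out = new byte[keep + code.length - 1];
        System.arraycopy(src, 0, out, 0, keep);
        System.arraycopy(code, 1, out, keep, code.length - 1);
        return out;
    }

    public static void main(String[] args) {
        byte[] src = "foo".getBytes(StandardCharsets.US_ASCII);
        byte[] dst = "bar".getBytes(StandardCharsets.US_ASCII);
        byte[] code = encode(src, dst);
        System.out.println(new String(code, StandardCharsets.US_ASCII));
        System.out.println(new String(decode(src, code), StandardCharsets.US_ASCII));
    }
}
```

    In the second example above, src and dst share no prefix, so all three bytes of src are trimmed and K is 'A' + 3 = 'D'.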
     
    • Constructor Detail

      • TrimSuffixEncoder

        public TrimSuffixEncoder()
    • Method Detail

      • encode

        public ByteBuffer encode(ByteBuffer reuse,
                                 ByteBuffer source,
                                 ByteBuffer target)
        Description copied from interface: ISequenceEncoder
        Encodes target relative to source, optionally reusing the provided ByteBuffer.
        Specified by:
        encode in interface ISequenceEncoder
        Parameters:
        reuse - The ByteBuffer to reuse for the output; a new one is allocated if it does not have enough remaining space.
        source - The source byte sequence.
        target - The target byte sequence to encode relative to source.
        Returns:
        The ByteBuffer containing the encoded target.
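
        The reuse contract described above might look like the following sketch. It is an illustration under stated assumptions, not the library's code: the class ReuseBufferSketch is hypothetical, "enough space" is interpreted here as the buffer's capacity after clearing it, and trim counts that do not fit in one byte are rejected. The source and target positions are left untouched because only absolute reads are used.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of an encode method that reuses the caller's output buffer.
class ReuseBufferSketch {
    static ByteBuffer encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target) {
        // Length of the prefix shared by the remaining bytes of source and target.
        int max = Math.min(source.remaining(), target.remaining());
        int shared = 0;
        while (shared < max
                && source.get(source.position() + shared) == target.get(target.position() + shared)) {
            shared++;
        }
        int trim = source.remaining() - shared;
        if ('A' + trim > 255) {
            // Assumption of this sketch: larger trim counts are simply rejected.
            throw new IllegalArgumentException("trim count does not fit in one byte");
        }

        // Reuse the provided buffer when it is large enough, otherwise allocate.
        int needed = 1 + target.remaining() - shared;
        if (reuse == null || reuse.capacity() < needed) {
            reuse = ByteBuffer.allocate(needed);
        }
        reuse.clear();

        reuse.put((byte) ('A' + trim));
        for (int i = target.position() + shared; i < target.limit(); i++) {
            reuse.put(target.get(i));
        }
        reuse.flip(); // make the encoded bytes readable from position 0
        return reuse;
    }
}
```

        Callers should always continue with the returned buffer rather than the one they passed in, since the two may differ when a new buffer had to be allocated.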
      • prefixBytes

        public int prefixBytes()
        Description copied from interface: ISequenceEncoder
        The number of prefix bytes in the encoded form that should be ignored (needed for separator lookup). This is an ugly workaround for GH-85; it should be fixed by knowing in advance whether the dictionary contains tags, so that the separator can be scanned for right-to-left.
        Specified by:
        prefixBytes in interface ISequenceEncoder
        See Also:
        "https://github.com/morfologik/morfologik-stemming/issues/85"