Class TrimPrefixAndSuffixEncoder

  • All Implemented Interfaces:
    ISequenceEncoder

    public class TrimPrefixAndSuffixEncoder
    extends Object
    implements ISequenceEncoder
    Encodes dst relative to src by trimming the prefix and suffix of src that do not match dst. The encoded output has the following byte layout:
     {P}{K}{suffix}
     
    where (P - 'A') bytes are trimmed from the start of src, (K - 'A') bytes are trimmed from the end of src, and the suffix is then appended to the resulting byte sequence.

    Examples:

     src: abc
     dst: abcd
     encoded: AAd
     
     src: abc
     dst: xyz
     encoded: ADxyz
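
    The two examples above can be checked by applying the code to src by hand. The following is a minimal sketch of that decoding step, written against plain byte arrays; the class name and the decode helper are hypothetical and are not part of this class or of ISequenceEncoder. It assumes single-byte (ASCII) input and treats trim counts larger than src as "remove everything".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class TrimPrefixAndSuffixDecodeSketch {
    /** Hypothetical helper: applies an encoded {P}{K}{suffix} sequence to src. */
    static byte[] decode(byte[] src, byte[] encoded) {
        int trimStart = (encoded[0] & 0xFF) - 'A'; // P - 'A' bytes removed from the start
        int trimEnd = (encoded[1] & 0xFF) - 'A';   // K - 'A' bytes removed from the end
        int suffixLen = encoded.length - 2;

        // Assumption of this sketch: over-long trims simply leave nothing of src.
        int keepFrom = Math.min(trimStart, src.length);
        int keepTo = Math.max(keepFrom, src.length - trimEnd);
        int keptLen = keepTo - keepFrom;

        // Copy the kept middle of src, then append the suffix bytes.
        byte[] out = Arrays.copyOf(Arrays.copyOfRange(src, keepFrom, keepTo), keptLen + suffixLen);
        System.arraycopy(encoded, 2, out, keptLen, suffixLen);
        return out;
    }

    /** Convenience overload for ASCII strings. */
    static String decode(String src, String encoded) {
        byte[] result = decode(src.getBytes(StandardCharsets.US_ASCII),
                               encoded.getBytes(StandardCharsets.US_ASCII));
        return new String(result, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(decode("abc", "AAd"));   // prints "abcd"
        System.out.println(decode("abc", "ADxyz")); // prints "xyz"
    }
}
```

    In the second example K is 'D', so three bytes are trimmed from the end of src and nothing of it survives; the whole of dst is carried in the suffix.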
     
    • Constructor Detail

      • TrimPrefixAndSuffixEncoder

        public TrimPrefixAndSuffixEncoder()
    • Method Detail

      • encode

        public ByteBuffer encode(ByteBuffer reuse,
                                 ByteBuffer source,
                                 ByteBuffer target)
        Description copied from interface: ISequenceEncoder
        Encodes target relative to source, optionally reusing the provided ByteBuffer.
        Specified by:
        encode in interface ISequenceEncoder
        Parameters:
        reuse - The ByteBuffer to reuse for the output; a new one is allocated if it does not have enough remaining space.
        source - The source byte sequence.
        target - The target byte sequence to encode relative to source.
        Returns:
        The ByteBuffer containing the encoded target.
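
        To make the contract above concrete, here is a rough sketch of an encoder that produces the {P}{K}{suffix} layout with the same method shape. It is not the library's implementation: the class name and helpers are hypothetical, the search strategy (try every prefix trim, keep the one sharing the most bytes with target) is an assumption, the reuse check compares capacity after clearing the buffer, and inputs whose trim counts do not fit in one byte are simply rejected.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class TrimPrefixAndSuffixEncodeSketch {
    private static final int MAX_TRIM = 255 - 'A'; // largest count that fits in one byte

    static ByteBuffer encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target) {
        byte[] src = toArray(source);
        byte[] dst = toArray(target);

        // Try each prefix trim p and measure how much of dst then matches src.
        int bestP = 0, bestShared = 0;
        for (int p = 0; p <= Math.min(src.length, MAX_TRIM); p++) {
            int shared = 0;
            while (p + shared < src.length && shared < dst.length
                    && src[p + shared] == dst[shared]) {
                shared++;
            }
            if (shared > bestShared) {
                bestShared = shared;
                bestP = p;
            }
        }
        int k = src.length - bestP - bestShared; // bytes to trim from the end of src
        if (k > MAX_TRIM) {
            throw new IllegalArgumentException("sequence too long for this sketch");
        }

        // Reuse the caller's buffer when it is large enough, otherwise allocate.
        int size = 2 + dst.length - bestShared;
        ByteBuffer out = (reuse != null && reuse.capacity() >= size)
                ? reuse : ByteBuffer.allocate(size);
        out.clear();
        out.put((byte) ('A' + bestP));
        out.put((byte) ('A' + k));
        out.put(dst, bestShared, dst.length - bestShared);
        out.flip();
        return out;
    }

    /** Copies the remaining bytes without moving the buffer's position. */
    private static byte[] toArray(ByteBuffer b) {
        byte[] a = new byte[b.remaining()];
        b.duplicate().get(a);
        return a;
    }

    /** Convenience wrapper for ASCII strings. */
    static String encodeToString(String src, String dst) {
        ByteBuffer out = encode(null,
                ByteBuffer.wrap(src.getBytes(StandardCharsets.US_ASCII)),
                ByteBuffer.wrap(dst.getBytes(StandardCharsets.US_ASCII)));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(encodeToString("abc", "abcd")); // prints "AAd"
        System.out.println(encodeToString("abc", "xyz"));  // prints "ADxyz"
    }
}
```

        Passing a previously returned buffer back in as reuse avoids a fresh allocation per call, which matters when many sequences are encoded in a loop.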
      • prefixBytes

        public int prefixBytes()
        Description copied from interface: ISequenceEncoder
        The number of prefix bytes in the encoded form that should be ignored (needed for separator lookup). This is an ugly workaround for GH-85; it should be fixed by knowing in advance whether the dictionary contains tags, so that the separator can be scanned for right-to-left.
        Specified by:
        prefixBytes in interface ISequenceEncoder
        See Also:
        "https://github.com/morfologik/morfologik-stemming/issues/85"
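
        A small sketch of why skipping prefix bytes matters during separator lookup: the control bytes P and K are arbitrary values of the form 'A' + n, so one of them can coincide with the separator byte. The helper below is hypothetical, the entry layout (encoded form, separator, tag) is simplified for illustration, and the value 2 is an assumption based on the two control bytes of this encoding, not a value stated on this page.

```java
import java.nio.charset.StandardCharsets;

final class SeparatorLookupSketch {
    /** Returns the index of the first separator at or after prefixBytes, or -1 if none. */
    static int indexOfSeparator(byte[] entry, int prefixBytes, byte separator) {
        for (int i = prefixBytes; i < entry.length; i++) {
            if (entry[i] == separator) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical entry: P = 'A', K = '_' (trim 30 bytes), suffix "xyz",
        // then the separator '_' and a tag.
        byte[] entry = "A_xyz_noun".getBytes(StandardCharsets.US_ASCII);

        // Scanning from 0 stops on the K control byte, which is not a separator.
        System.out.println(indexOfSeparator(entry, 0, (byte) '_')); // prints 1
        // Skipping the assumed two control bytes finds the real separator.
        System.out.println(indexOfSeparator(entry, 2, (byte) '_')); // prints 5
    }
}
```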