Package morfologik.stemming
Class TrimInfixAndSuffixEncoder
java.lang.Object
  morfologik.stemming.TrimInfixAndSuffixEncoder
All Implemented Interfaces: ISequenceEncoder
public class TrimInfixAndSuffixEncoder extends Object implements ISequenceEncoder
Encodes dst relative to src by trimming whatever non-equal suffix and infix src and dst have. The output code is (bytes): {X}{L}{K}{suffix}, where src's infix at position (X - 'A') and of length (L - 'A') should be removed, then (K - 'A') bytes should be trimmed from the end, and then the suffix should be appended to the resulting byte sequence.
Examples:
src: ayz dst: abc encoded: AACbc
src: aillent dst: aller encoded: BBCr
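The decoding steps described above can be traced with a small standalone sketch. It is not the library's implementation: the class name InfixSuffixDecodeSketch and its methods are hypothetical, it works on plain byte arrays rather than ByteBuffer, and it assumes the code always carries the three header bytes {X}{L}{K}.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of the {X}{L}{K}{suffix} format; not part of morfologik. */
class InfixSuffixDecodeSketch {
    /** Applies an {X}{L}{K}{suffix} code to src and returns the reconstructed dst bytes. */
    static byte[] decode(byte[] src, byte[] code) {
        int infixIndex = (code[0] & 0xFF) - 'A';   // X - 'A'
        int infixLength = (code[1] & 0xFF) - 'A';  // L - 'A'
        int trimCount = (code[2] & 0xFF) - 'A';    // K - 'A'
        int suffixLength = code.length - 3;

        // Step 1: remove the infix from src.
        byte[] reduced = new byte[src.length - infixLength];
        System.arraycopy(src, 0, reduced, 0, infixIndex);
        System.arraycopy(src, infixIndex + infixLength, reduced, infixIndex,
                src.length - infixIndex - infixLength);

        // Step 2: trim K bytes from the end; step 3: append the suffix.
        int kept = reduced.length - trimCount;
        byte[] dst = Arrays.copyOf(reduced, kept + suffixLength);
        System.arraycopy(code, 3, dst, kept, suffixLength);
        return dst;
    }

    /** Convenience overload for text input (assumes UTF-8). */
    static String decode(String src, String code) {
        byte[] out = decode(src.getBytes(StandardCharsets.UTF_8),
                code.getBytes(StandardCharsets.UTF_8));
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("ayz", "AACbc"));    // prints abc
        System.out.println(decode("aillent", "BBCr")); // prints aller
    }
}
```

For the second example, B B C means: drop one byte at position 1 (the "i"), trim two bytes from the end ("nt"), then append "r".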
Constructor Summary
Constructors
Constructor                    Description
TrimInfixAndSuffixEncoder()
-
Method Summary
Modifier and Type    Method    Description
ByteBuffer    decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded)
ByteBuffer    encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target)
int           prefixBytes()    The number of encoded form's prefix bytes that should be ignored (needed for separator lookup).
String        toString()
Method Detail
-
encode
public ByteBuffer encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target)
Description copied from interface: ISequenceEncoder
- Specified by: encode in interface ISequenceEncoder
- Parameters:
  reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
  source - The source byte sequence.
  target - The target byte sequence to encode relative to source.
- Returns: The ByteBuffer with the encoded target.
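To make the encoding direction concrete, here is a brute-force sketch that produces {X}{L}{K}{suffix} codes. It is only one way to pick the infix and is not claimed to match the library's search strategy: the class InfixSuffixEncodeSketch is hypothetical, it tries every infix and keeps the first one that leaves the longest shared prefix with the target, and it rejects inputs whose offsets do not fit in one byte.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical brute-force encoder for the {X}{L}{K}{suffix} format; not part of morfologik. */
class InfixSuffixEncodeSketch {
    /** Length of the shared prefix of src (with infix [x, x + l) removed) and dst. */
    static int sharedPrefix(byte[] src, int x, int l, byte[] dst) {
        int n = 0;
        while (n < dst.length) {
            int srcIndex = n < x ? n : n + l; // skip over the removed infix
            if (srcIndex >= src.length || src[srcIndex] != dst[n]) break;
            n++;
        }
        return n;
    }

    static byte[] encode(byte[] src, byte[] dst) {
        // Start with "no infix" and only accept strictly better candidates.
        int bestX = 0, bestL = 0;
        int bestShared = sharedPrefix(src, 0, 0, dst);
        for (int x = 0; x <= src.length; x++) {
            for (int l = 1; x + l <= src.length; l++) {
                int shared = sharedPrefix(src, x, l, dst);
                if (shared > bestShared) {
                    bestX = x; bestL = l; bestShared = shared;
                }
            }
        }
        int trim = (src.length - bestL) - bestShared; // K: bytes to drop from the end
        int limit = 255 - 'A';                        // largest value one header byte can hold
        if (bestX > limit || bestL > limit || trim > limit) {
            throw new IllegalArgumentException("offsets do not fit in one byte (sketch limitation)");
        }
        byte[] code = new byte[3 + dst.length - bestShared];
        code[0] = (byte) ('A' + bestX);
        code[1] = (byte) ('A' + bestL);
        code[2] = (byte) ('A' + trim);
        System.arraycopy(dst, bestShared, code, 3, dst.length - bestShared);
        return code;
    }

    /** Convenience overload for text input (assumes UTF-8). */
    static String encode(String src, String dst) {
        byte[] out = encode(src.getBytes(StandardCharsets.UTF_8),
                dst.getBytes(StandardCharsets.UTF_8));
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("ayz", "abc"));       // prints AACbc
        System.out.println(encode("aillent", "aller")); // prints BBCr
    }
}
```

The sketch reproduces both documented examples; for other inputs the real encoder may choose a different but equally valid code.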
-
prefixBytes
public int prefixBytes()
Description copied from interface: ISequenceEncoder
The number of encoded form's prefix bytes that should be ignored (needed for separator lookup). An ugly workaround for GH-85; it should be fixed by prior knowledge of whether the dictionary contains tags, so that the separator can be scanned for right-to-left.
- Specified by: prefixBytes in interface ISequenceEncoder
- See Also: "https://github.com/morfologik/morfologik-stemming/issues/85"
-
decode
public ByteBuffer decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded)
Description copied from interface: ISequenceEncoder
- Specified by: decode in interface ISequenceEncoder
- Parameters:
  reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
  source - The source byte sequence.
  encoded - The previously encoded byte sequence.
- Returns: The ByteBuffer with the decoded target.
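The reuse parameter implies that callers should always continue with the returned buffer, since it may or may not be the one they passed in. The sketch below illustrates that contract in isolation. The class ReuseBufferSketch and its helpers are hypothetical, and treating a null buffer as "allocate a new one" is an assumption, not something this page states.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of a "reuse or allocate" buffer contract; not part of morfologik. */
class ReuseBufferSketch {
    /** Returns a cleared buffer with room for at least `needed` bytes. */
    static ByteBuffer prepare(ByteBuffer reuse, int needed) {
        // Assumption: null is handled the same way as a buffer that is too small.
        if (reuse == null || reuse.capacity() < needed) {
            return ByteBuffer.allocate(needed);
        }
        reuse.clear();
        return reuse;
    }

    /** Writes `result` into the prepared buffer and flips it for reading. */
    static ByteBuffer copyInto(ByteBuffer reuse, byte[] result) {
        ByteBuffer out = prepare(reuse, result.length);
        out.put(result);
        out.flip();
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer scratch = ByteBuffer.allocate(4);
        ByteBuffer a = copyInto(scratch, new byte[] {1, 2, 3});
        ByteBuffer b = copyInto(scratch, new byte[] {1, 2, 3, 4, 5});
        System.out.println(a == scratch); // true: the scratch buffer was large enough
        System.out.println(b == scratch); // false: a new buffer had to be allocated
    }
}
```

Keeping the returned reference and passing it back in on the next call lets a loop over many entries avoid repeated allocation.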