Package morfologik.stemming
Interface ISequenceEncoder
- All Known Implementing Classes:
NoEncoder, TrimInfixAndSuffixEncoder, TrimPrefixAndSuffixEncoder, TrimSuffixEncoder
public interface ISequenceEncoder
The logic of encoding one sequence of bytes relative to another sequence of bytes. The "base" form and the "derived" form are typically the stem of a word and the inflected form of a word. Encoding the derived form makes the data for the automaton smaller and more repetitive (which results in higher compression rates).
See example implementation for details.
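To make the idea of encoding one byte sequence relative to another concrete, here is a minimal sketch of a suffix-trimming scheme. It is an illustration only: the class `SuffixTrimSketch`, its byte layout (one count byte followed by the bytes to append), and the omission of the `reuse` parameter are assumptions made for this example, not the format used by any of the implementing classes listed above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of morfologik.stemming.
final class SuffixTrimSketch {

    /** Encodes target relative to source: [bytes to trim from source end][bytes to append]. */
    static ByteBuffer encode(ByteBuffer source, ByteBuffer target) {
        int sourceLen = source.remaining();
        int targetLen = target.remaining();

        // Length of the prefix shared by both sequences (absolute reads, positions untouched).
        int shared = 0;
        int max = Math.min(sourceLen, targetLen);
        while (shared < max
                && source.get(source.position() + shared) == target.get(target.position() + shared)) {
            shared++;
        }

        int trim = sourceLen - shared;
        if (trim > 255) {
            throw new IllegalArgumentException("Cannot store trim count in a single byte: " + trim);
        }

        ByteBuffer out = ByteBuffer.allocate(1 + targetLen - shared);
        out.put((byte) trim);
        for (int i = shared; i < targetLen; i++) {
            out.put(target.get(target.position() + i));
        }
        out.flip();
        return out;
    }

    /** Rebuilds target from source and the encoded form produced by encode(). */
    static ByteBuffer decode(ByteBuffer source, ByteBuffer encoded) {
        int trim = encoded.get(encoded.position()) & 0xFF;
        int keep = source.remaining() - trim;
        if (keep < 0) {
            throw new IllegalArgumentException("Encoded form trims more bytes than source has.");
        }
        int suffixLen = encoded.remaining() - 1;

        ByteBuffer out = ByteBuffer.allocate(keep + suffixLen);
        for (int i = 0; i < keep; i++) {
            out.put(source.get(source.position() + i));
        }
        for (int i = 1; i <= suffixLen; i++) {
            out.put(encoded.get(encoded.position() + i));
        }
        out.flip();
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer source = ByteBuffer.wrap("mice".getBytes(StandardCharsets.UTF_8));
        ByteBuffer target = ByteBuffer.wrap("mouse".getBytes(StandardCharsets.UTF_8));
        ByteBuffer encoded = encode(source, target);   // 5 bytes: count 3, then "ouse"
        ByteBuffer decoded = decode(source, encoded);
        System.out.println(StandardCharsets.UTF_8.decode(decoded.duplicate())); // prints "mouse"
    }
}
```

Because many word pairs share the same trim count and appended suffix, the encoded forms repeat far more often than the raw target forms would, which is the property the description above relies on for compression.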
Method Summary
Modifier and Type    Method                                                             Description
ByteBuffer           decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded)
ByteBuffer           encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target)
int                  prefixBytes()                                                      Deprecated.
Method Detail
-
encode
ByteBuffer encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target)
- Parameters:
reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
source - The source byte sequence.
target - The target byte sequence to encode relative to source.
- Returns:
Returns the ByteBuffer with encoded target.
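The `reuse` contract implies that callers should always continue with the returned buffer, because it may or may not be the instance they passed in. A minimal sketch of that pattern follows; `ReuseBufferSketch`, `ensureCapacity`, and `copyInto` are hypothetical names for this example, and treating "enough remaining space" as a capacity check on a cleared buffer is an assumption rather than documented behavior.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the reuse-or-allocate contract; not library code.
final class ReuseBufferSketch {

    /** Returns reuse (cleared) if it can hold `needed` bytes, otherwise a freshly allocated buffer. */
    static ByteBuffer ensureCapacity(ByteBuffer reuse, int needed) {
        if (reuse != null && reuse.capacity() >= needed) {
            reuse.clear(); // position = 0, limit = capacity; old content is overwritten
            return reuse;
        }
        return ByteBuffer.allocate(needed);
    }

    /** A stand-in for an encoder: copies target into the (possibly reused) output buffer. */
    static ByteBuffer copyInto(ByteBuffer reuse, ByteBuffer target) {
        ByteBuffer out = ensureCapacity(reuse, target.remaining());
        out.put(target.duplicate()); // duplicate() leaves the caller's position untouched
        out.flip();                  // ready for reading: position = 0, limit = bytes written
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer scratch = ByteBuffer.allocate(2);
        ByteBuffer data = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        // Always reassign: the result may be a new, larger buffer.
        scratch = copyInto(scratch, data);
        System.out.println(scratch.remaining());
    }
}
```

Reassigning the result to the scratch variable lets a loop over many entries settle on one sufficiently large buffer instead of allocating per call.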
-
decode
ByteBuffer decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded)
- Parameters:
reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
source - The source byte sequence.
encoded - The previously encoded byte sequence.
- Returns:
Returns the ByteBuffer with decoded target.
-
prefixBytes
@Deprecated int prefixBytes()
Deprecated. The number of prefix bytes in the encoded form that should be ignored (needed for separator lookup). An ugly workaround for GH-85; the proper fix is to know in advance whether the dictionary contains tags, so that the separator can be scanned for right-to-left.
- See Also:
"https://github.com/morfologik/morfologik-stemming/issues/85"
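The following sketch illustrates why a left-to-right separator scan might need to skip prefix bytes. It assumes, purely for illustration, an entry laid out as encoded form, separator byte, tag, where the leading bytes of the encoded form are control bytes that can coincide with the separator value. The class `SeparatorScanSketch` and the method `indexOfSeparator` are hypothetical and are not the library's lookup code.

```java
// Hypothetical illustration; the entry layout is an assumption made for this example.
final class SeparatorScanSketch {

    /** Finds the first separator at or after index prefixBytes, or -1 if there is none. */
    static int indexOfSeparator(byte[] entry, int prefixBytes, byte separator) {
        for (int i = prefixBytes; i < entry.length; i++) {
            if (entry[i] == separator) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte separator = (byte) '+';
        // The first byte is a control byte whose value happens to equal the separator.
        byte[] entry = {(byte) '+', 'a', 'b', (byte) '+', 't', 'a', 'g'};

        System.out.println(indexOfSeparator(entry, 0, separator)); // naive scan stops at the control byte
        System.out.println(indexOfSeparator(entry, 1, separator)); // skipping one prefix byte finds the real separator
    }
}
```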