Package morfologik.fsa
Class CFSA2
- java.lang.Object
-
- morfologik.fsa.FSA
-
- morfologik.fsa.CFSA2
-
- All Implemented Interfaces:
Iterable<ByteBuffer>
public final class CFSA2 extends FSA
CFSA (Compact Finite State Automaton) binary format implementation, version 2:BIT_TARGET_NEXT
applicable on all arcs, not necessarily the last one.- v-coded goto field
- v-coded perfect hashing numbers, if any
- 31 most frequent labels integrated with flags byte
The encoding of automaton body is as follows.
---- CFSA header Byte Description +-+-+-+-+-+-+-+-+\ 0 | | | | | | | | | +------ '\' +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 1 | | | | | | | | | +------ 'f' +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 2 | | | | | | | | | +------ 's' +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 3 | | | | | | | | | +------ 'a' +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 4 | | | | | | | | | +------ version (fixed 0xc6) +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 5 | | | | | | | | | +----\ +-+-+-+-+-+-+-+-+/ \ flags [MSB first] +-+-+-+-+-+-+-+-+\ / 6 | | | | | | | | | +----/ +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 7 | | | | | | | | | +------ label lookup table size +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 8-32 | | | | | | | | | +------ label value lookup table : : : : : : : : : | +-+-+-+-+-+-+-+-+/ ---- Start of a node; only if automaton was compiled with NUMBERS option. Byte +-+-+-+-+-+-+-+-+\ 0 | | | | | | | | | \ +-+-+-+-+-+-+-+-+ + 1 | | | | | | | | | | number of strings recognized +-+-+-+-+-+-+-+-+ +----- by the automaton starting : : : : : : : : : | from this node. v-coding +-+-+-+-+-+-+-+-+ + | | | | | | | | | / +-+-+-+-+-+-+-+-+/ ---- A vector of this node's arcs. An arc's layout depends on the combination of flags. 1) NEXT bit set, mapped arc label. +----------------------- node pointed to is next | +--------------------- the last arc of the node | | +------------------- this arc leads to a final state (acceptor) | | | _______+--------- arc's label; indexed if M > 0, otherwise explicit label follows | | | / | | | | +-+-+-+-+-+-+-+-+\ 0 |N|L|F|M|M|M|M|M| +------ flags + (M) index of the mapped label. +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 1 | | | | | | | | | +------ optional label if M == 0 +-+-+-+-+-+-+-+-+/ : : : : : : : : : +-+-+-+-+-+-+-+-+\ |A|A|A|A|A|A|A|A| +------ v-coded goto address +-+-+-+-+-+-+-+-+/
-
-
Field Summary
Fields Modifier and Type Field Description byte[]
arcs
An array of bytes with the internal representation of the automaton.static int
BIT_FINAL_ARC
The arc corresponds to the last character of a sequence available when building the automaton (acceptor transition).static int
BIT_LAST_ARC
The arc is the last one from the current node's arcs list.static int
BIT_TARGET_NEXT
The target node of this arc follows the last arc of the current state (no goto field).static int
LABEL_INDEX_SIZE
Maximum size of the labels index.byte[]
labelMapping
Label mapping for M-indexed labels.static byte
VERSION
Automaton header version value.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description int
getArc(int node, byte label)
byte
getArcLabel(int arc)
int
getEndNode(int arc)
int
getFirstArc(int node)
Set<FSAFlags>
getFlags()
int
getNextArc(int arc)
int
getRightLanguageCount(int node)
int
getRootNode()
boolean
isArcFinal(int arc)
boolean
isArcLast(int arc)
Returnstrue
if this arc hasNEXT
bit set.boolean
isArcTerminal(int arc)
boolean
isNextSet(int arc)
-
Methods inherited from class morfologik.fsa.FSA
getArcCount, getSequences, getSequences, iterator, read, read, readRemaining, visitAllStates, visitInPostOrder, visitInPostOrder, visitInPreOrder, visitInPreOrder
-
-
-
-
Field Detail
-
VERSION
public static final byte VERSION
Automaton header version value.- See Also:
- Constant Field Values
-
BIT_TARGET_NEXT
public static final int BIT_TARGET_NEXT
The target node of this arc follows the last arc of the current state (no goto field).- See Also:
- Constant Field Values
-
BIT_LAST_ARC
public static final int BIT_LAST_ARC
The arc is the last one from the current node's arcs list.- See Also:
- Constant Field Values
-
BIT_FINAL_ARC
public static final int BIT_FINAL_ARC
The arc corresponds to the last character of a sequence available when building the automaton (acceptor transition).- See Also:
- Constant Field Values
-
LABEL_INDEX_SIZE
public static final int LABEL_INDEX_SIZE
Maximum size of the labels index.- See Also:
- Constant Field Values
-
arcs
public byte[] arcs
An array of bytes with the internal representation of the automaton. Please see the documentation of this class for more information on how this structure is organized.
-
labelMapping
public final byte[] labelMapping
Label mapping for M-indexed labels.
-
-
Method Detail
-
getRootNode
public int getRootNode()
- Specified by:
getRootNode
in classFSA
- Returns:
- Returns the identifier of the root node of this automaton. Returns 0 if the start node is also the end node (the automaton is empty).
-
getFirstArc
public final int getFirstArc(int node)
- Specified by:
getFirstArc
in classFSA
- Parameters:
node
- Identifier of the node.- Returns:
- Returns the identifier of the first arc leaving
node
or 0 if the node has no outgoing arcs.
-
getNextArc
public final int getNextArc(int arc)
- Specified by:
getNextArc
in classFSA
- Parameters:
arc
- The arc's identifier.- Returns:
- Returns the identifier of the next arc after
arc
and leavingnode
. Zero is returned if no more arcs are available for the node.
-
getArc
public int getArc(int node, byte label)
-
getEndNode
public int getEndNode(int arc)
- Specified by:
getEndNode
in classFSA
- Parameters:
arc
- The arc's identifier.- Returns:
- Return the end node pointed to by a given
arc
. Terminal arcs (those that point to a terminal state) have no end node representation and throw a runtime exception.
-
getArcLabel
public byte getArcLabel(int arc)
- Specified by:
getArcLabel
in classFSA
- Parameters:
arc
- The arc's identifier.- Returns:
- Return the label associated with a given
arc
.
-
getRightLanguageCount
public int getRightLanguageCount(int node)
- Overrides:
getRightLanguageCount
in classFSA
- Parameters:
node
- Identifier of the node.- Returns:
- Returns the number of sequences reachable from the given state if
the automaton was compiled with
FSAFlags.NUMBERS
. The size of the right language of the state, in other words.
-
isArcFinal
public boolean isArcFinal(int arc)
- Specified by:
isArcFinal
in classFSA
- Parameters:
arc
- The arc's identifier.- Returns:
- Returns
true
if the destination node at the end of thisarc
corresponds to an input sequence created when building this automaton.
-
isArcTerminal
public boolean isArcTerminal(int arc)
- Specified by:
isArcTerminal
in classFSA
- Parameters:
arc
- The arc's identifier.- Returns:
- Returns
true
if thisarc
does not have a terminating node (@linkFSA.getEndNode(int)
will throw an exception). ImpliesFSA.isArcFinal(int)
.
-
isArcLast
public boolean isArcLast(int arc)
Returnstrue
if this arc hasNEXT
bit set.- Parameters:
arc
- The node's arc identifier.- Returns:
- Returns true if the argument is the last arc of a node.
- See Also:
BIT_LAST_ARC
-
isNextSet
public boolean isNextSet(int arc)
- Parameters:
arc
- The node's arc identifier.- Returns:
- Returns true if
BIT_TARGET_NEXT
is set for this arc. - See Also:
BIT_TARGET_NEXT
-
-