Package morfologik.fsa
Class FSA
- java.lang.Object
-
- morfologik.fsa.FSA
-
- All Implemented Interfaces:
Iterable<ByteBuffer>
public abstract class FSA extends Object implements Iterable<ByteBuffer>
This is a top abstract class for handling finite state automata. These automata are arc-based, a design described in Jan Daciuk's Incremental Construction of Finite-State Automata and Transducers, and Their Use in the Natural Language Processing (PhD thesis, Technical University of Gdansk).
-
-
Constructor Summary
Constructors Constructor Description FSA()
-
Method Summary
All Methods Static Methods Instance Methods Abstract Methods Concrete Methods Modifier and Type Method Description abstract int
getArc(int node, byte label)
int
getArcCount(int node)
abstract byte
getArcLabel(int arc)
abstract int
getEndNode(int arc)
abstract int
getFirstArc(int node)
abstract Set<FSAFlags>
getFlags()
abstract int
getNextArc(int arc)
int
getRightLanguageCount(int node)
abstract int
getRootNode()
Iterable<ByteBuffer>
getSequences()
Iterable<ByteBuffer>
getSequences(int node)
Returns an iterator over all binary sequences starting at the given FSA state (node) and ending in final nodes.abstract boolean
isArcFinal(int arc)
abstract boolean
isArcTerminal(int arc)
Iterator<ByteBuffer>
iterator()
Returns an iterator over all binary sequences starting from the initial FSA state (node) and ending in final nodes.static FSA
read(InputStream stream)
A factory for reading automata in any of the supported versions.static <T extends FSA>
Tread(InputStream stream, Class<? extends T> clazz)
A factory for reading a specific FSA subclass, including proper casting.protected static byte[]
readRemaining(InputStream in)
<T extends StateVisitor>
TvisitAllStates(T v)
Visit all states.<T extends StateVisitor>
TvisitInPostOrder(T v)
Same asvisitInPostOrder(StateVisitor, int)
, starting from root automaton node.<T extends StateVisitor>
TvisitInPostOrder(T v, int node)
Visits all states reachable fromnode
in postorder.<T extends StateVisitor>
TvisitInPreOrder(T v)
Same asvisitInPreOrder(StateVisitor, int)
, starting from root automaton node.<T extends StateVisitor>
TvisitInPreOrder(T v, int node)
Visits all states in preorder.
-
-
-
Method Detail
-
getRootNode
public abstract int getRootNode()
- Returns:
- Returns the identifier of the root node of this automaton. Returns 0 if the start node is also the end node (the automaton is empty).
-
getFirstArc
public abstract int getFirstArc(int node)
- Parameters:
node
- Identifier of the node.- Returns:
- Returns the identifier of the first arc leaving
node
or 0 if the node has no outgoing arcs.
-
getNextArc
public abstract int getNextArc(int arc)
- Parameters:
arc
- The arc's identifier.- Returns:
- Returns the identifier of the next arc after
arc
and leavingnode
. Zero is returned if no more arcs are available for the node.
-
getArc
public abstract int getArc(int node, byte label)
- Parameters:
node
- Identifier of the node.label
- The arc's label.- Returns:
- Returns the identifier of an arc leaving
node
and labeled withlabel
. An identifier equal to 0 means the node has no outgoing arc labeledlabel
.
-
getArcLabel
public abstract byte getArcLabel(int arc)
- Parameters:
arc
- The arc's identifier.- Returns:
- Return the label associated with a given
arc
.
-
isArcFinal
public abstract boolean isArcFinal(int arc)
- Parameters:
arc
- The arc's identifier.- Returns:
- Returns
true
if the destination node at the end of thisarc
corresponds to an input sequence created when building this automaton.
-
isArcTerminal
public abstract boolean isArcTerminal(int arc)
- Parameters:
arc
- The arc's identifier.- Returns:
- Returns
true
if thisarc
does not have a terminating node (@linkgetEndNode(int)
will throw an exception). ImpliesisArcFinal(int)
.
-
getEndNode
public abstract int getEndNode(int arc)
- Parameters:
arc
- The arc's identifier.- Returns:
- Return the end node pointed to by a given
arc
. Terminal arcs (those that point to a terminal state) have no end node representation and throw a runtime exception.
-
getFlags
public abstract Set<FSAFlags> getFlags()
- Returns:
- Returns a set of flags for this FSA instance.
-
getArcCount
public int getArcCount(int node)
- Parameters:
node
- Identifier of the node.- Returns:
- Calculates and returns the number of arcs of a given node.
-
getRightLanguageCount
public int getRightLanguageCount(int node)
- Parameters:
node
- Identifier of the node.- Returns:
- Returns the number of sequences reachable from the given state if
the automaton was compiled with
FSAFlags.NUMBERS
. The size of the right language of the state, in other words. - Throws:
UnsupportedOperationException
- If the automaton was not compiled withFSAFlags.NUMBERS
. The value can then be computed by manual count ofgetSequences(int)
.
-
getSequences
public Iterable<ByteBuffer> getSequences(int node)
Returns an iterator over all binary sequences starting at the given FSA state (node) and ending in final nodes. This corresponds to a set of suffixes of a given prefix from all sequences stored in the automaton.The returned iterator is a
ByteBuffer
whose contents changes on each call toIterator.next()
. The keep the contents between calls toIterator.next()
, one must copy the buffer to some other location.Important. It is guaranteed that the returned byte buffer is backed by a byte array and that the content of the byte buffer starts at the array's index 0.
- Parameters:
node
- Identifier of the starting node from which to return subsequences.- Returns:
- An iterable over all sequences encoded starting at the given node.
-
getSequences
public final Iterable<ByteBuffer> getSequences()
- Returns:
- Returns all sequences encoded in the automaton.
-
iterator
public final Iterator<ByteBuffer> iterator()
Returns an iterator over all binary sequences starting from the initial FSA state (node) and ending in final nodes. The returned iterator is aByteBuffer
whose contents changes on each call toIterator.next()
. The keep the contents between calls toIterator.next()
, one must copy the buffer to some other location.Important. It is guaranteed that the returned byte buffer is backed by a byte array and that the content of the byte buffer starts at the array's index 0.
- Specified by:
iterator
in interfaceIterable<ByteBuffer>
-
visitAllStates
public <T extends StateVisitor> T visitAllStates(T v)
Visit all states. The order of visiting is undefined. This method may be faster than traversing the automaton in post or preorder since it can scan states linearly. Returning false fromStateVisitor.accept(int)
immediately terminates the traversal.- Type Parameters:
T
- A subclass ofStateVisitor
.- Parameters:
v
- Visitor to receive traversal calls.- Returns:
- Returns the argument (for access to anonymous class fields).
-
visitInPostOrder
public <T extends StateVisitor> T visitInPostOrder(T v)
Same asvisitInPostOrder(StateVisitor, int)
, starting from root automaton node.- Type Parameters:
T
- A subclass ofStateVisitor
.- Parameters:
v
- Visitor to receive traversal calls.- Returns:
- Returns the argument (for access to anonymous class fields).
-
visitInPostOrder
public <T extends StateVisitor> T visitInPostOrder(T v, int node)
Visits all states reachable fromnode
in postorder. Returning false fromStateVisitor.accept(int)
immediately terminates the traversal.- Type Parameters:
T
- A subclass ofStateVisitor
.- Parameters:
v
- Visitor to receive traversal calls.node
- Identifier of the node.- Returns:
- Returns the argument (for access to anonymous class fields).
-
visitInPreOrder
public <T extends StateVisitor> T visitInPreOrder(T v)
Same asvisitInPreOrder(StateVisitor, int)
, starting from root automaton node.- Type Parameters:
T
- A subclass ofStateVisitor
.- Parameters:
v
- Visitor to receive traversal calls.- Returns:
- Returns the argument (for access to anonymous class fields).
-
visitInPreOrder
public <T extends StateVisitor> T visitInPreOrder(T v, int node)
Visits all states in preorder. Returning false fromStateVisitor.accept(int)
skips traversal of all sub-states of a given state.- Type Parameters:
T
- A subclass ofStateVisitor
.- Parameters:
v
- Visitor to receive traversal calls.node
- Identifier of the node.- Returns:
- Returns the argument (for access to anonymous class fields).
-
readRemaining
protected static final byte[] readRemaining(InputStream in) throws IOException
- Parameters:
in
- The input stream.- Returns:
- Reads all remaining bytes from an input stream and returns them as a byte array.
- Throws:
IOException
- Rethrown if an I/O exception occurs.
-
read
public static FSA read(InputStream stream) throws IOException
A factory for reading automata in any of the supported versions.- Parameters:
stream
- The input stream to read automaton data from. The stream is not closed.- Returns:
- Returns an instantiated automaton. Never null.
- Throws:
IOException
- If the input stream does not represent an automaton or is otherwise invalid.
-
read
public static <T extends FSA> T read(InputStream stream, Class<? extends T> clazz) throws IOException
A factory for reading a specific FSA subclass, including proper casting.- Type Parameters:
T
- A subclass ofFSA
to cast the read automaton to.- Parameters:
stream
- The input stream to read automaton data from. The stream is not closed.clazz
- A subclass ofFSA
to cast the read automaton to.- Returns:
- Returns an instantiated automaton. Never null.
- Throws:
IOException
- If the input stream does not represent an automaton, is otherwise invalid or the class of the automaton read from the input stream is not assignable toclazz
.
-
-