Package com.hankcs.algorithm
Class AhoCorasickDoubleArrayTrie<V>
- java.lang.Object
-
- com.hankcs.algorithm.AhoCorasickDoubleArrayTrie<V>
-
- All Implemented Interfaces:
Serializable
public class AhoCorasickDoubleArrayTrie<V> extends Object implements Serializable
An implementation of Aho Corasick algorithm based on Double Array Trie- Author:
- hankcs
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
AhoCorasickDoubleArrayTrie.Hit<V>
A result outputstatic interface
AhoCorasickDoubleArrayTrie.IHit<V>
Processor handles the output when hit a keywordstatic interface
AhoCorasickDoubleArrayTrie.IHitCancellable<V>
Callback that allows to cancel the search process.static interface
AhoCorasickDoubleArrayTrie.IHitFull<V>
Processor handles the output when hit a keyword, with more detail
-
Field Summary
Fields Modifier and Type Field Description protected int[]
base
base array of the Double Array Trie structureprotected int[]
check
check array of the Double Array Trie structureprotected int[]
fail
fail table of the Aho Corasick automataprotected int[]
l
the length of every keyprotected int[][]
output
output table of the Aho Corasick automataprotected int
size
the size of base and check arrayprotected V[]
v
outer value array
-
Constructor Summary
Constructors Constructor Description AhoCorasickDoubleArrayTrie()
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description void
build(Map<String,V> map)
Build a AhoCorasickDoubleArrayTrie from a mapint
exactMatchSearch(String key)
match exactly by a keyAhoCorasickDoubleArrayTrie.Hit<V>
findFirst(String text)
Search first match in stringV
get(int index)
Pick the value by index in value array
Notice that to be more efficiently, this method DO NOT check the parameterV
get(String key)
Get value by a String key, just like a map.get() methodvoid
load(ObjectInputStream in)
Loadboolean
matches(String text)
Checks that string contains at least one substringvoid
parseText(char[] text, AhoCorasickDoubleArrayTrie.IHit<V> processor)
Parse textvoid
parseText(char[] text, AhoCorasickDoubleArrayTrie.IHitFull<V> processor)
Parse textList<AhoCorasickDoubleArrayTrie.Hit<V>>
parseText(CharSequence text)
Parse textvoid
parseText(CharSequence text, AhoCorasickDoubleArrayTrie.IHit<V> processor)
Parse textvoid
parseText(CharSequence text, AhoCorasickDoubleArrayTrie.IHitCancellable<V> processor)
Parse textvoid
save(ObjectOutputStream out)
Saveboolean
set(String key, V value)
Update a value corresponding to a keyint
size()
Get the size of the keywordsprotected int
transition(int current, char c)
transition of a stateprotected int
transitionWithRoot(int nodePos, char c)
transition of a state, if the state is root and it failed, then returns the root
-
-
-
Field Detail
-
check
protected int[] check
check array of the Double Array Trie structure
-
base
protected int[] base
base array of the Double Array Trie structure
-
fail
protected int[] fail
fail table of the Aho Corasick automata
-
output
protected int[][] output
output table of the Aho Corasick automata
-
v
protected V[] v
outer value array
-
l
protected int[] l
the length of every key
-
size
protected int size
the size of base and check array
-
-
Method Detail
-
parseText
public List<AhoCorasickDoubleArrayTrie.Hit<V>> parseText(CharSequence text)
Parse text- Parameters:
text
- The text- Returns:
- a list of outputs
-
parseText
public void parseText(CharSequence text, AhoCorasickDoubleArrayTrie.IHit<V> processor)
Parse text- Parameters:
text
- The textprocessor
- A processor which handles the output
-
parseText
public void parseText(CharSequence text, AhoCorasickDoubleArrayTrie.IHitCancellable<V> processor)
Parse text- Parameters:
text
- The textprocessor
- A processor which handles the output
-
parseText
public void parseText(char[] text, AhoCorasickDoubleArrayTrie.IHit<V> processor)
Parse text- Parameters:
text
- The textprocessor
- A processor which handles the output
-
parseText
public void parseText(char[] text, AhoCorasickDoubleArrayTrie.IHitFull<V> processor)
Parse text- Parameters:
text
- The textprocessor
- A processor which handles the output
-
matches
public boolean matches(String text)
Checks that string contains at least one substring- Parameters:
text
- source text to check- Returns:
true
if string contains at least one substring
-
findFirst
public AhoCorasickDoubleArrayTrie.Hit<V> findFirst(String text)
Search first match in string- Parameters:
text
- source text to check- Returns:
- first match or
null
if there are no matches
-
save
public void save(ObjectOutputStream out) throws IOException
Save- Parameters:
out
- An ObjectOutputStream object- Throws:
IOException
- Some IOException
-
load
public void load(ObjectInputStream in) throws IOException, ClassNotFoundException
Load- Parameters:
in
- An ObjectInputStream object- Throws:
IOException
ClassNotFoundException
-
get
public V get(String key)
Get value by a String key, just like a map.get() method- Parameters:
key
- The key- Returns:
-
set
public boolean set(String key, V value)
Update a value corresponding to a key- Parameters:
key
- the keyvalue
- the value- Returns:
- successful or not(failure if there is no key)
-
get
public V get(int index)
Pick the value by index in value array
Notice that to be more efficiently, this method DO NOT check the parameter- Parameters:
index
- The index- Returns:
- The value
-
transition
protected int transition(int current, char c)
transition of a state- Parameters:
current
-c
-- Returns:
-
transitionWithRoot
protected int transitionWithRoot(int nodePos, char c)
transition of a state, if the state is root and it failed, then returns the root- Parameters:
nodePos
-c
-- Returns:
-
build
public void build(Map<String,V> map)
Build a AhoCorasickDoubleArrayTrie from a map- Parameters:
map
- a map containing key-value pairs
-
exactMatchSearch
public int exactMatchSearch(String key)
match exactly by a key- Parameters:
key
- the key- Returns:
- the index of the key, you can use it as a perfect hash function
-
size
public int size()
Get the size of the keywords- Returns:
-
-