Package org.languagetool.languagemodel
Class LuceneSingleIndexLanguageModel
- java.lang.Object
  - org.languagetool.languagemodel.BaseLanguageModel
    - org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
- All Implemented Interfaces: AutoCloseable, LanguageModel
public class LuceneSingleIndexLanguageModel extends BaseLanguageModel
Information about ngram occurrences, taken from Lucene indexes (one index per ngram level). This is not a real language model: it only returns occurrence counts and performs no probability calculation, in particular not for ngrams with 0 occurrences.
- Since: 3.2
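The difference between returning occurrence counts and calculating probabilities can be illustrated with a small self-contained sketch. It uses neither LanguageTool nor Lucene: the class `CountOnlyModelSketch`, its in-memory map, the toy counts, and the add-one smoothing in `smoothedRatio` are all assumptions made for illustration, not a description of how this class or BaseLanguageModel works.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a count-only ngram source; not part of LanguageTool. */
class CountOnlyModelSketch {

  // Maps a space-joined ngram to its occurrence count (toy data, not real corpus counts).
  private final Map<String, Long> counts = new HashMap<>();

  void add(List<String> tokens, long count) {
    counts.put(String.join(" ", tokens), count);
  }

  /** Returns the stored count, or 0 if the ngram was never seen. */
  long getCount(List<String> tokens) {
    return counts.getOrDefault(String.join(" ", tokens), 0L);
  }

  /** Naive relative frequency: collapses to 0 for unseen ngrams. */
  double naiveRatio(List<String> ngram, List<String> context) {
    long contextCount = getCount(context);
    return contextCount == 0 ? 0.0 : (double) getCount(ngram) / contextCount;
  }

  /** Add-one smoothing (an illustrative choice only): never returns 0. */
  double smoothedRatio(List<String> ngram, List<String> context, long vocabularySize) {
    return (getCount(ngram) + 1.0) / (getCount(context) + vocabularySize);
  }

  public static void main(String[] args) {
    CountOnlyModelSketch model = new CountOnlyModelSketch();
    model.add(Arrays.asList("the"), 100);
    model.add(Arrays.asList("the", "cat"), 4);

    List<String> seen = Arrays.asList("the", "cat");
    List<String> unseen = Arrays.asList("the", "zebra");
    List<String> context = Arrays.asList("the");

    System.out.println(model.getCount(seen));             // prints 4
    System.out.println(model.getCount(unseen));           // prints 0
    System.out.println(model.naiveRatio(unseen, context)); // prints 0.0
    System.out.println(model.smoothedRatio(unseen, context, 1000)); // small but non-zero
  }
}
```

A raw count of 0 carries no information about how unlikely an ngram is, which is why a caller that needs probabilities has to add some form of smoothing or backoff on top of the counts.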
-
-
Nested Class Summary
protected static class LuceneSingleIndexLanguageModel.LuceneSearcher
-
Field Summary
-
Fields inherited from interface org.languagetool.languagemodel.LanguageModel
GOOGLE_SENTENCE_END, GOOGLE_SENTENCE_START
-
-
Constructor Summary
LuceneSingleIndexLanguageModel(int maxNgram)
LuceneSingleIndexLanguageModel(File topIndexDir)
-
Method Summary
static void clearCaches()
Only used internally.
void close()
protected void doValidateDirectory(File topIndexDir)
long getCount(String token1)
Get the occurrence count for token.
long getCount(List<String> tokens)
Get the occurrence count for the given token sequence.
protected LuceneSingleIndexLanguageModel.LuceneSearcher getLuceneSearcher(int ngramSize)
long getTotalTokenCount()
String toString()
static void validateDirectory(File topIndexDir)
Throws RuntimeException if the given directory does not seem to be a valid ngram top directory with sub directories 1grams etc.
-
Methods inherited from class org.languagetool.languagemodel.BaseLanguageModel
getPseudoProbability, getPseudoProbabilityStupidBackoff
-
-
-
-
Constructor Detail
-
LuceneSingleIndexLanguageModel
public LuceneSingleIndexLanguageModel(File topIndexDir)
- Parameters:
topIndexDir - a directory that contains at least a sub directory called 3grams, which is a Lucene index with ngram occurrences as created by org.languagetool.dev.FrequencyIndexCreator.
-
LuceneSingleIndexLanguageModel
@Experimental public LuceneSingleIndexLanguageModel(int maxNgram)
-
-
Method Detail
-
validateDirectory
public static void validateDirectory(File topIndexDir)
Throws RuntimeException if the given directory does not seem to be a valid ngram top directory with sub directories 1grams etc.
- Since: 3.0
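As a rough illustration of what such a layout check can look like, the following self-contained sketch rejects a directory that has no 1grams sub directory. The class `NgramDirCheckSketch` and the method `checkLayout` are hypothetical names, and checking only for 1grams is an assumption; the actual checks performed by validateDirectory may differ.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

/** Hypothetical illustration of an ngram directory layout check; not LanguageTool's code. */
class NgramDirCheckSketch {

  /**
   * Throws RuntimeException unless topIndexDir is a directory with a "1grams" sub directory.
   * Checking only "1grams" is an assumption; the real method may check more or differently.
   */
  static void checkLayout(File topIndexDir) {
    if (!topIndexDir.isDirectory()) {
      throw new RuntimeException("Not a directory: " + topIndexDir);
    }
    File oneGrams = new File(topIndexDir, "1grams");
    if (!oneGrams.isDirectory()) {
      throw new RuntimeException("Expected sub directory '1grams' in " + topIndexDir);
    }
  }

  public static void main(String[] args) throws IOException {
    File top = Files.createTempDirectory("ngram-sketch").toFile();
    try {
      checkLayout(top); // fails: the temporary directory is still empty
    } catch (RuntimeException e) {
      System.out.println("rejected: " + e.getMessage());
    }

    File oneGrams = new File(top, "1grams");
    oneGrams.mkdir();
    checkLayout(top); // passes now that the sub directory exists
    System.out.println("accepted");

    // Clean up the temporary directories created for this demonstration.
    oneGrams.delete();
    top.delete();
  }
}
```

Failing early with an unchecked exception keeps a misconfigured ngram path from surfacing later as a less obvious error when an index is first opened.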
-
clearCaches
@Experimental public static void clearCaches()
Only used internally.
- Since: 3.2
-
doValidateDirectory
protected void doValidateDirectory(File topIndexDir)
-
getCount
public long getCount(List<String> tokens)
Description copied from class: BaseLanguageModel
Get the occurrence count for the given token sequence.
- Specified by: getCount in class BaseLanguageModel
-
getCount
public long getCount(String token1)
Description copied from class: BaseLanguageModel
Get the occurrence count for token.
- Specified by: getCount in class BaseLanguageModel
-
getTotalTokenCount
public long getTotalTokenCount()
- Specified by: getTotalTokenCount in class BaseLanguageModel
-
getLuceneSearcher
protected LuceneSingleIndexLanguageModel.LuceneSearcher getLuceneSearcher(int ngramSize)
-
close
public void close()
-
-