Package org.languagetool.tagging
Class BaseTagger
- java.lang.Object
-
- org.languagetool.tagging.BaseTagger
-
-
Field Summary
protected Locale conversionLocale
protected WordTagger wordTagger
-
Constructor Summary
BaseTagger(String filename)
BaseTagger(String filename, Locale conversionLocale)
BaseTagger(String filename, Locale locale, boolean tagLowercaseWithUppercase)
-
Method Summary
protected @Nullable List<AnalyzedToken> additionalTags(String word, WordTagger wordTagger)
Allows additional tagging in some language-dependent circumstances.
protected AnalyzedToken asAnalyzedToken(String word, morfologik.stemming.WordData wd)
protected List<AnalyzedToken> asAnalyzedTokenList(String word, List<morfologik.stemming.WordData> wdList)
protected List<AnalyzedToken> asAnalyzedTokenListForTaggedWords(String word, List<TaggedWord> taggedWords)
AnalyzedTokenReadings createNullToken(String token, int startPos)
Create the AnalyzedToken used for whitespace and other non-words.
AnalyzedToken createToken(String token, String posTag)
Create a token specific to the language of the implementing class.
protected List<AnalyzedToken> getAnalyzedTokens(String word)
protected morfologik.stemming.Dictionary getDictionary()
String getDictionaryPath()
abstract @Nullable String getManualAdditionsFileName()
Get the filename for manual additions, e.g., /en/added.txt, or null.
@Nullable String getManualRemovalsFileName()
Get the filename for manual removals, e.g., /en/removed.txt, or null.
protected WordTagger getWordTagger()
boolean overwriteWithManualTagger()
If true, tags from the binary dictionary (*.dict) will be overwritten by manual tags from the plain text dictionary.
List<AnalyzedTokenReadings> tag(List<String> sentenceTokens)
Returns a list of AnalyzedTokens that assigns each term in the sentence some kind of part-of-speech information (not necessarily just one tag).
-
-
-
Field Detail
-
wordTagger
protected final WordTagger wordTagger
-
conversionLocale
protected final Locale conversionLocale
-
-
Method Detail
-
getManualAdditionsFileName
@Nullable public abstract String getManualAdditionsFileName()
Get the filename for manual additions, e.g., /en/added.txt, or null.
- Since:
- 2.8
-
getManualRemovalsFileName
@Nullable public String getManualRemovalsFileName()
Get the filename for manual removals, e.g., /en/removed.txt, or null.
- Since:
- 3.2
-
getDictionaryPath
public String getDictionaryPath()
- Since:
- 2.9
-
overwriteWithManualTagger
public boolean overwriteWithManualTagger()
If true, tags from the binary dictionary (*.dict) will be overwritten by manual tags from the plain text dictionary.
- Since:
- 2.9
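The effect of this flag can be illustrated with a minimal, self-contained sketch. It is not LanguageTool code: the class ManualOverrideSketch and its combine method are hypothetical, tags are plain strings, and two behaviours are assumptions rather than statements from this page: that a false flag merges both tag sources, and that binary tags are kept when the manual dictionary has no entry for the word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the override behaviour; not part of LanguageTool. */
final class ManualOverrideSketch {

  /**
   * Combines the tags found in the binary dictionary with the tags found in
   * the manual plain text dictionary for one word.
   */
  static List<String> combine(List<String> binaryTags, List<String> manualTags, boolean overwrite) {
    // Assumption: overwriting only applies when the manual dictionary knows the word.
    if (overwrite && !manualTags.isEmpty()) {
      return new ArrayList<>(manualTags);
    }
    // Assumption: without overwriting, both sources are merged without duplicates.
    List<String> merged = new ArrayList<>(binaryTags);
    for (String tag : manualTags) {
      if (!merged.contains(tag)) {
        merged.add(tag);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<String> binary = Arrays.asList("NN");
    List<String> manual = Arrays.asList("VB");
    System.out.println(combine(binary, manual, true));  // prints [VB]
    System.out.println(combine(binary, manual, false)); // prints [NN, VB]
  }
}
```

A subclass would return true from overwriteWithManualTagger() when the plain text dictionary is meant to correct, rather than extend, the binary dictionary.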
-
getWordTagger
protected WordTagger getWordTagger()
-
getDictionary
protected morfologik.stemming.Dictionary getDictionary()
-
tag
public List<AnalyzedTokenReadings> tag(List<String> sentenceTokens) throws IOException
Description copied from interface: Tagger
Returns a list of AnalyzedTokens that assigns each term in the sentence some kind of part-of-speech information (not necessarily just one tag).
Note that this method takes exactly one sentence. Its implementation may implement special cases for the first word of a sentence, which is usually written with an uppercase letter.
- Specified by:
tag in interface Tagger
- Parameters:
sentenceTokens - the text as returned by a WordTokenizer
- Throws:
IOException
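The remark about the first word of a sentence can be sketched as follows. This is a simplified illustration under stated assumptions, not the BaseTagger implementation: SentenceStartSketch, its map-based lexicon, and the use of plain string tags are hypothetical, and an empty list stands in for a token without a POS tag (such as whitespace).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sentence-level tagger; not part of LanguageTool. */
final class SentenceStartSketch {
  private final Map<String, List<String>> lexicon;
  private final Locale conversionLocale;

  SentenceStartSketch(Map<String, List<String>> lexicon, Locale conversionLocale) {
    this.lexicon = lexicon;
    this.conversionLocale = conversionLocale;
  }

  /** Tags exactly one sentence; returns one tag list per input token. */
  List<List<String>> tag(List<String> sentenceTokens) {
    List<List<String>> result = new ArrayList<>();
    boolean firstWord = true;
    for (String token : sentenceTokens) {
      if (token.trim().isEmpty()) {
        result.add(new ArrayList<>()); // whitespace carries no POS tag
        continue;
      }
      List<String> tags = lexicon.get(token);
      // Special case: the first word is usually capitalized, so retry in lowercase.
      if (tags == null && firstWord) {
        tags = lexicon.get(token.toLowerCase(conversionLocale));
      }
      result.add(tags == null ? new ArrayList<>() : new ArrayList<>(tags));
      firstWord = false;
    }
    return result;
  }
}
```

With a lexicon that contains "the" and "Paris", the token "The" is tagged through its lowercase form only when it opens the sentence. The explicit Locale matters because lowercasing is locale-sensitive (for example, the Turkish dotted and dotless i).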
-
getAnalyzedTokens
protected List<AnalyzedToken> getAnalyzedTokens(String word)
-
asAnalyzedTokenList
protected List<AnalyzedToken> asAnalyzedTokenList(String word, List<morfologik.stemming.WordData> wdList)
-
asAnalyzedTokenListForTaggedWords
protected List<AnalyzedToken> asAnalyzedTokenListForTaggedWords(String word, List<TaggedWord> taggedWords)
-
asAnalyzedToken
protected AnalyzedToken asAnalyzedToken(String word, morfologik.stemming.WordData wd)
-
createNullToken
public final AnalyzedTokenReadings createNullToken(String token, int startPos)
Description copied from interface: Tagger
Create the AnalyzedToken used for whitespace and other non-words. Use null as the POS tag for this token.
- Specified by:
createNullToken in interface Tagger
-
createToken
public AnalyzedToken createToken(String token, String posTag)
Description copied from interface: Tagger
Create a token specific to the language of the implementing class.
- Specified by:
createToken in interface Tagger
-
additionalTags
@Nullable protected List<AnalyzedToken> additionalTags(String word, WordTagger wordTagger)
Allows additional tagging in some language-dependent circumstances.
- Parameters:
word - The word to tag
- Returns:
a list of analyzed tokens with additional tags, or null
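Because the return value is nullable, callers need a null check before using it. The following self-contained sketch shows one way such a hook could be consumed. Everything in it is hypothetical: AdditionalTagsSketch, the hyphen rule, and the map-based lexicon are illustrations, and the assumption that the hook is consulted only when the normal lookup finds nothing is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical example of a nullable tagging hook; not part of LanguageTool. */
final class AdditionalTagsSketch {
  private final Map<String, List<String>> lexicon = new HashMap<>();

  AdditionalTagsSketch() {
    lexicon.put("free", Arrays.asList("JJ"));
  }

  /**
   * Language-dependent hook: tags a hyphenated word by the part after its last
   * hyphen. Returns null when it has nothing to add.
   */
  List<String> additionalTags(String word) {
    int hyphen = word.lastIndexOf('-');
    if (hyphen < 0 || hyphen == word.length() - 1) {
      return null;
    }
    return lexicon.get(word.substring(hyphen + 1)); // may also be null
  }

  /** Regular lookup first; the hook is only a fallback (assumption). */
  List<String> tagWord(String word) {
    List<String> tags = lexicon.get(word);
    if (tags != null) {
      return new ArrayList<>(tags);
    }
    List<String> extra = additionalTags(word);
    return extra == null ? new ArrayList<>() : new ArrayList<>(extra);
  }
}
```

Here tagWord("sugar-free") yields the tags of "free", while an unknown word without a hyphen yields an empty list because the hook returns null.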
-
-