Class MultiWordChunker2
- java.lang.Object
  - org.languagetool.tagging.disambiguation.AbstractDisambiguator
    - org.languagetool.tagging.disambiguation.MultiWordChunker2
- All Implemented Interfaces:
Disambiguator
public class MultiWordChunker2 extends AbstractDisambiguator
Multiword tagger-chunker.
Note: overlapping tagging is currently not supported; the first matching multiword entry wins.
- Author:
Andriy Rysin
Constructor Summary
Constructors:
- MultiWordChunker2(String filename)
- MultiWordChunker2(String filename, boolean allowFirstCapitalized)
-
Method Summary
Modifier and Type, Method, Description:
- AnalyzedSentence disambiguate(AnalyzedSentence input)
  Implements multiword POS tags, e.g., <ELLIPSIS> for ellipsis (...) start, and </ELLIPSIS> for ellipsis end.
- protected String formatPosTag(String posTag, int position, int multiwordLength)
  Override this method if you want to format the POS tag differently.
- protected boolean matches(String matchText, AnalyzedTokenReadings inputTokens)
- protected AnalyzedTokenReadings prepareNewReading(String tokens, String tok, AnalyzedTokenReadings token, String tag)
- void setRemoveOtherReadings(boolean removeOtherReadings)
- void setWrapTag(boolean wrapTag)
-
Methods inherited from class org.languagetool.tagging.disambiguation.AbstractDisambiguator
preDisambiguate
Constructor Detail
-
MultiWordChunker2
public MultiWordChunker2(String filename)
- Parameters:
filename - text file with multiwords and tags
-
MultiWordChunker2
public MultiWordChunker2(String filename, boolean allowFirstCapitalized)
- Parameters:
filename - text file with multiwords and tags
allowFirstCapitalized - if set to true, the first word of the multiword can be capitalized
-
-
Method Detail
-
setRemoveOtherReadings
public void setRemoveOtherReadings(boolean removeOtherReadings)
- Parameters:
removeOtherReadings - if true and a multiword matches, other readings will be removed
-
setWrapTag
public void setWrapTag(boolean wrapTag)
- Parameters:
wrapTag - if true, the tag will be wrapped with < and >
-
formatPosTag
protected String formatPosTag(String posTag, int position, int multiwordLength)
Override this method if you want to format the POS tag differently.
- Parameters:
posTag - POS tag for the multiword
position - position of the token in the multiword
- Returns:
the formatted POS tag for the multiword
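To make the role of the position and multiwordLength parameters concrete, here is a minimal standalone sketch of two formatting strategies. It does not use or reproduce LanguageTool code: the class PosTagFormatSketch and its methods wrapped and startEnd are hypothetical, the wrapping behavior is only inferred from the setWrapTag description, and a zero-based position is an assumption that this page does not confirm.

```java
// Illustrative sketch only: these helpers are hypothetical and are not LanguageTool code.
final class PosTagFormatSketch {

    // Assumed plain format: wrap the tag in < and > when wrapTag is set,
    // as described for setWrapTag; otherwise return the tag unchanged.
    static String wrapped(String posTag, boolean wrapTag) {
        return wrapTag ? "<" + posTag + ">" : posTag;
    }

    // One possible custom format: mark the first and last token of the multiword.
    // Assumes position is zero-based, which the documentation does not state.
    static String startEnd(String posTag, int position, int multiwordLength) {
        if (position == 0) {
            return "<" + posTag + ">";
        }
        if (position == multiwordLength - 1) {
            return "</" + posTag + ">";
        }
        return posTag;
    }

    public static void main(String[] args) {
        // Print the tag each token of a three-token multiword would receive.
        for (int i = 0; i < 3; i++) {
            System.out.println(startEnd("ELLIPSIS", i, 3));
        }
    }
}
```

A subclass that overrides formatPosTag could apply logic like startEnd to emit distinct start and end markers, but whether that suits a given language module depends on how its rules consume the tags.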
-
disambiguate
public AnalyzedSentence disambiguate(AnalyzedSentence input)
Implements multiword POS tags, e.g., <ELLIPSIS> for ellipsis (...) start, and </ELLIPSIS> for ellipsis end.
- Parameters:
input - the tokens to be chunked
- Returns:
AnalyzedSentence with additional markers
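The class description notes that overlapping tagging is not supported and that the first matching multiword entry wins. The following is a rough standalone sketch of what such a policy can look like over plain string tokens. It is not the actual disambiguate implementation: the class FirstMatchWinsSketch, its methods, the map-based entry list, and the choice to resume matching after a tagged span are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: hypothetical names, not LanguageTool code.
final class FirstMatchWinsSketch {

    // Returns one tag per token, or null where no multiword entry applied.
    // Entries are tried in the map's iteration order; the first match is used
    // and matching resumes after the tagged span, so spans never overlap.
    static List<String> tag(List<String> tokens, Map<String, String> multiwords) {
        List<String> tags = new ArrayList<>();
        for (int k = 0; k < tokens.size(); k++) {
            tags.add(null);
        }
        int i = 0;
        while (i < tokens.size()) {
            int matchedLength = 0;
            for (Map.Entry<String, String> entry : multiwords.entrySet()) {
                String[] words = entry.getKey().split(" ");
                if (matchesAt(tokens, i, words)) {
                    for (int j = 0; j < words.length; j++) {
                        tags.set(i + j, entry.getValue());
                    }
                    matchedLength = words.length;
                    break; // first matching entry wins
                }
            }
            // Skip past the tagged span, or advance by one token if nothing matched.
            i += Math.max(matchedLength, 1);
        }
        return tags;
    }

    // True if the given words occur in tokens starting at index start.
    static boolean matchesAt(List<String> tokens, int start, String[] words) {
        if (start + words.length > tokens.size()) {
            return false;
        }
        for (int j = 0; j < words.length; j++) {
            if (!tokens.get(start + j).equals(words[j])) {
                return false;
            }
        }
        return true;
    }
}
```

With the entries "New York" and "York City Hall" (in that order) and the tokens New, York, City, Hall, only the first two tokens are tagged: once "New York" has matched, the overlapping "York City Hall" entry can no longer apply.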
-
matches
protected boolean matches(String matchText, AnalyzedTokenReadings inputTokens)
-
prepareNewReading
protected AnalyzedTokenReadings prepareNewReading(String tokens, String tok, AnalyzedTokenReadings token, String tag)
-
-