tesseract 4.1.1
tesseract::UnicodeSpanSkipper Class Reference

Public Member Functions

UnicodeSpanSkipper(const UNICHARSET *unicharset, const WERD_CHOICE *word)
int SkipPunc(int pos)
int SkipDigits(int pos)
int SkipRomans(int pos)
int SkipAlpha(int pos)

Detailed Description

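Helper that walks a WERD_CHOICE from a given position and skips over a run of characters of one class (punctuation, digits, Roman-numeral letters, or alphabetic characters), using the supplied UNICHARSET to interpret the word's unichar ids. Each Skip* method returns the first position at or after pos whose character is not in that class.

Definition at line 312 of file paragraphs.cpp.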

Constructor & Destructor Documentation

◆ UnicodeSpanSkipper()

tesseract::UnicodeSpanSkipper::UnicodeSpanSkipper(const UNICHARSET *unicharset,
                                                  const WERD_CHOICE *word)
inline

Definition at line 314 of file paragraphs.cpp.
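
The member listings on this page work through the members u_, word_ and wordlen_: each skip advances pos while the character at pos belongs to a class, then returns pos. The sketch below mirrors that shape as a standalone class so it compiles without tesseract; it is not tesseract's code. Assumptions, all hypothetical: the class name `SpanSkipperSketch`, a word held as a copied `std::u32string` of code points instead of a `const WERD_CHOICE *`, and plain ASCII rules standing in for the UNICHARSET lookups. SkipRomans is left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for tesseract::UnicodeSpanSkipper. The word is a
// UTF-32 string (copied for simplicity; the original keeps a WERD_CHOICE
// pointer) and characters are classified with ASCII-only rules instead of
// a UNICHARSET.
class SpanSkipperSketch {
 public:
  explicit SpanSkipperSketch(const std::u32string &word)
      : word_(word), wordlen_(static_cast<int>(word.size())) {}

  // Each method returns the first position >= pos whose character is not in
  // the class, or wordlen_ if the run reaches the end of the word.
  int SkipPunc(int pos) const { return SkipWhile(pos, IsPunc); }
  int SkipDigits(int pos) const { return SkipWhile(pos, IsDigit); }
  int SkipAlpha(int pos) const { return SkipWhile(pos, IsAlpha); }

 private:
  static bool IsDigit(char32_t c) { return c >= U'0' && c <= U'9'; }
  static bool IsAlpha(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z');
  }
  // ASCII punctuation: printable, non-space, neither letter nor digit.
  static bool IsPunc(char32_t c) {
    return c >= U'!' && c <= U'~' && !IsDigit(c) && !IsAlpha(c);
  }

  // The shared skip-while loop: advance while pred holds, stop at the end.
  template <typename Pred>
  int SkipWhile(int pos, Pred pred) const {
    assert(pos >= 0);
    while (pos < wordlen_ && pred(word_[pos])) pos++;
    return pos;
  }

  std::u32string word_;
  int wordlen_;  // cached length, like the original's wordlen_ member
};
```

Because every skip returns a position, calls chain naturally: for `U"(12)abc"`, SkipPunc(0) yields 1, SkipDigits(1) yields 3, SkipPunc(3) yields 4 and SkipAlpha(4) yields 7, the end of the word.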

Member Function Documentation

◆ SkipAlpha()

int tesseract::UnicodeSpanSkipper::SkipAlpha(int pos)

Definition at line 353 of file paragraphs.cpp.

{
  // Advance past characters the UNICHARSET classifies as alphabetic.
  while (pos < wordlen_ && u_->get_isalpha(word_->unichar_id(pos))) pos++;
  return pos;
}

◆ SkipDigits()

int tesseract::UnicodeSpanSkipper::SkipDigits(int pos)

Definition at line 337 of file paragraphs.cpp.

{
  // Advance past characters the UNICHARSET classifies as digits.
  while (pos < wordlen_ && u_->get_isdigit(word_->unichar_id(pos))) pos++;
  return pos;
}
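
Because each skip returns the position where its run ends, the methods compose into small recognizers. Since the class lives in paragraphs.cpp, one plausible use is testing whether a word looks like a numbered list mark such as "3." or "(12)". The following is a hypothetical sketch of that pattern, not tesseract's actual check: `LooksLikeNumberedMark`, `SkipWhile`, `IsListPunc` and the punctuation set `()[].:` are assumptions, and digits are ASCII only.

```cpp
#include <cassert>
#include <string>

// Generic skip-while loop in the style of UnicodeSpanSkipper.
template <typename Pred>
static int SkipWhile(const std::u32string &w, int pos, Pred pred) {
  assert(pos >= 0);
  const int len = static_cast<int>(w.size());
  while (pos < len && pred(w[pos])) pos++;
  return pos;
}

static bool IsAsciiDigit(char32_t c) { return c >= U'0' && c <= U'9'; }

// Assumed set of punctuation that may surround a list number.
static bool IsListPunc(char32_t c) {
  return c == U'(' || c == U')' || c == U'[' || c == U']' || c == U'.' ||
         c == U':';
}

// Hypothetical check: optional punctuation, at least one digit, optional
// punctuation, and nothing else left over.
bool LooksLikeNumberedMark(const std::u32string &w) {
  int pos = SkipWhile(w, 0, IsListPunc);
  const int digits_start = pos;
  pos = SkipWhile(w, pos, IsAsciiDigit);
  if (pos == digits_start) return false;      // no digits at all
  pos = SkipWhile(w, pos, IsListPunc);
  return pos == static_cast<int>(w.size());  // whole word consumed
}
```

The final comparison against the word length is what rejects words like "12a": the skips stop at the letter, so the word is not fully consumed.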

◆ SkipPunc()

int tesseract::UnicodeSpanSkipper::SkipPunc(int pos)

Definition at line 332 of file paragraphs.cpp.

◆ SkipRomans()

int tesseract::UnicodeSpanSkipper::SkipRomans(int pos)

Definition at line 343 of file paragraphs.cpp.

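
By analogy with the other Skip* methods, SkipRomans presumably advances over a run of Roman-numeral letters. A minimal sketch, assuming "Roman" means the ASCII letters i, v, x, l, c, d, m in either case (the exact set tesseract uses is not shown on this page), might look like this. It checks set membership only, so a malformed numeral such as "iiiii" or "vx" is skipped just like "xiv". `SkipRomansSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical SkipRomans-style helper: returns the first position >= pos
// whose character is not one of the assumed Roman-numeral letters.
// It does not validate numeral well-formedness.
int SkipRomansSketch(const std::u32string &word, int pos) {
  assert(pos >= 0);
  static const std::u32string kRomanLetters = U"ivxlcdmIVXLCDM";
  const int len = static_cast<int>(word.size());
  while (pos < len && kRomanLetters.find(word[pos]) != std::u32string::npos) {
    pos++;
  }
  return pos;
}
```

Combined with a punctuation skip, this is enough to accept marks such as "(iv)" or "xiv.", where the numeral is followed or wrapped by punctuation.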

The documentation for this class was generated from the following file:
paragraphs.cpp