Abstract

Delayed strokes, such as i-dots and t-crosses, pose a challenge in online handwriting recognition by introducing an extra source of variation in the sequence order of the handwritten input. The problem is especially relevant for languages where delayed strokes are abundant and training data are limited. Studies on handling delayed strokes have mainly focused on Arabic and Farsi scripts, where the problem is most severe, with less attention devoted to scripts based on the Latin alphabet. This study investigates the effectiveness of the delayed stroke handling methods proposed in the literature. The evaluated methods include removing delayed strokes and embedding delayed strokes in the correct writing order, together with variations of both. Starting with new definitions of a delayed stroke, we tested each method with hidden Markov model classifiers, applied separately to English and Turkish, and with bidirectional long short-term memory networks for English. For both the UNIPEN and Turkish datasets, the best results with hidden Markov model recognizers are obtained by removing all delayed strokes, with accuracy increases of up to 2.13 and 2.03 percentage points over the respective baselines. In the case of the bidirectional long short-term memory networks, stroke order correction of the delayed strokes by embedding performs best, with improvements of 1.81 (raw) and 1.72 (post-processed) percentage points over the baseline.

