Abstract

Text line segmentation is an essential stage in off-line optical character recognition (OCR) systems. It is key because inaccurately segmented text lines lead to OCR failure. Text line segmentation of handwritten documents is a complex and diverse problem, complicated by the nature of handwriting, and it is therefore a leading challenge in handwritten document image processing. Because the quality of text segmentation algorithms is measured and evaluated inconsistently, a basic set of measurement methods is required. Currently, there is no commonly accepted set, and every algorithm evaluation is custom-made. In this paper, a basic test framework for the evaluation of text feature extraction algorithms is proposed. The framework consists of a few experiments primarily linked to text line segmentation, skew rate and reference text line evaluation. Although the experiments are mutually independent, the results obtained are strongly cross-linked. The main advantages of the framework are its suitability for different types of letters and languages and its adaptability. Thus, the paper presents an efficient evaluation method for text analysis algorithms.

Highlights

  • Printed text is defined by strong shape regularity

  • The evaluation test framework for the text parameter extraction algorithm consists of a few text experiments. They are divided into two distinct groups: 1. Text line segmentation experiments, 2

  • The paper describes the proposal of a basic test framework for the evaluation of text feature extraction algorithms

Summary

Introduction

Printed text is defined by strong shape regularity. Its text lines have a similar orientation and a similar or equal skew, and text orientation does not vary within a page. Most text line segmentation methods are based on the assumptions that the distance between neighboring text lines is significant and that text lines are reasonably straight. These assumptions are not always valid for handwritten documents. Text line segmentation is a leading challenge in document image analysis [3]. Upon completion of this process, the primary goal of OCR is the extraction of text parameters from optically scanned documents, so reference text line and skew rate identification is mandatory. The establishment of a test framework for the evaluation of document image processing algorithms is of great importance. This is precisely the task of this paper, which proposes a basic framework of methods for the evaluation of text line segmentation and text parameter extraction.
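
The two assumptions mentioned above (significant inter-line distance and reasonably straight lines) can be illustrated with a minimal sketch of a horizontal projection profile segmenter. This is not the method evaluated in the paper; it is a generic textbook technique used here only as an illustration. The function name `segment_lines` and the parameter `min_gap` are hypothetical, and the image is assumed to be a binary grid where 1 marks ink and 0 marks background.

```python
from typing import List, Tuple


def segment_lines(image: List[List[int]], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive (top, bottom) row ranges of text lines in a binary image.

    image:   rows of pixels, 1 = ink, 0 = background (assumed encoding).
    min_gap: hypothetical parameter; a run of blank rows shorter than this
             does not split a text line.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]

    lines: List[Tuple[int, int]] = []
    start = None  # first row of the line currently being tracked
    gap = 0       # number of consecutive blank rows seen inside that line

    for y, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = y
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The blank run is long enough: close the line at its last ink row.
                lines.append((start, y - gap))
                start = None
                gap = 0

    if start is not None:
        # Close a line that reaches the bottom of the image.
        lines.append((start, len(profile) - 1 - gap))
    return lines


# Straight, well-separated lines: the profile has a blank row between them.
straight = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

# Two skewed lines: every row contains ink, so the profile has no blank row.
skewed = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

straight_result = segment_lines(straight)
skewed_result = segment_lines(skewed)
```

On the straight sample the sketch finds two separate lines, while on the skewed sample the two lines merge into one segment. This is the kind of failure that arises when the straightness assumption does not hold, as is often the case for handwritten documents.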

Evaluation Test Framework
Document text image
Multi-line text segmentation experiment
Multi-line waved text segmentation experiment
Multi-line fractured text segmentation experiment
Skew rate text experiment
Handwritten curved text experiment
Handwritten fractured text experiment
Decision Making
Combined test results
Test Example and Discussion
Conclusions
