Abstract

This paper proposes a novel approach to extracting text lines from curved document images captured from an opened, thick bound book or a curled document sheet. We first extract the connected components (CCs) from a binary image and remove the non-textual CCs. We then estimate the orientation of each CC through local projections and define a feature vector to describe each CC. Next, we design a hybrid metric based on the distances between CCs and construct the corresponding minimum spanning tree, which exploits the overall structure of the curved text lines. Finally, a tree pruning strategy clusters the CCs into separate text lines. Experimental results on a wide variety of curved document images demonstrate the effectiveness and efficiency of the proposed method.
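The clustering stage described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the feature vector, the hybrid metric, or the pruning rule, so everything below is an assumption made for illustration. Each CC is reduced to a hypothetical `(x, y, theta)` triple (centroid plus orientation in radians), `hybrid_distance` is an assumed metric that penalizes offsets perpendicular to the local text direction, and the pruning rule (cutting tree edges longer than `prune_factor` times the median edge weight) is a simple stand-in for the proposed strategy.

```python
import math
from itertools import combinations
from statistics import median


def hybrid_distance(a, b, perp_weight=3.0):
    """Assumed metric (not from the paper): Euclidean distance in which the
    offset perpendicular to the local text direction is weighted more heavily."""
    (xa, ya, ta), (xb, yb, tb) = a, b
    # Plain average of the two orientations; assumes both angles are close
    # to each other, so wrap-around is not handled.
    theta = (ta + tb) / 2.0
    dx, dy = xb - xa, yb - ya
    along = dx * math.cos(theta) + dy * math.sin(theta)
    perp = -dx * math.sin(theta) + dy * math.cos(theta)
    return math.hypot(along, perp_weight * perp)


class DisjointSet:
    """Union-find with path halving, used for Kruskal's algorithm and for
    collecting the components left after pruning."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return False
        self.parent[rj] = ri
        return True


def minimum_spanning_tree(ccs):
    """Kruskal's algorithm on the complete graph over all CCs.
    Returns the tree as a list of (weight, i, j) edges."""
    edges = sorted(
        (hybrid_distance(ccs[i], ccs[j]), i, j)
        for i, j in combinations(range(len(ccs)), 2)
    )
    ds = DisjointSet(len(ccs))
    return [(w, i, j) for w, i, j in edges if ds.union(i, j)]


def cluster_text_lines(ccs, prune_factor=2.5):
    """Prune long tree edges (assumed rule) and return the remaining
    connected groups of CC indices, one group per candidate text line."""
    mst = minimum_spanning_tree(ccs)
    if not mst:
        return [[i] for i in range(len(ccs))]
    cutoff = prune_factor * median(w for w, _, _ in mst)
    ds = DisjointSet(len(ccs))
    for w, i, j in mst:
        if w <= cutoff:  # keep only edges short enough to stay within a line
            ds.union(i, j)
    groups = {}
    for i in range(len(ccs)):
        groups.setdefault(ds.find(i), []).append(i)
    return sorted(groups.values())
```

Calling `cluster_text_lines(ccs)` on a list of such triples returns lists of CC indices, one per detected line. In this sketch the orientation matters because it makes neighbours along a curved line cheap to connect and neighbours on adjacent lines expensive, so the edges that bridge two lines tend to be the ones removed by pruning.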

