Abstract

Extracting textual information from natural scene images is challenging because of varying imaging conditions and the diversity of text properties. Segmentation of scene text is an important step in the pipeline that significantly affects final recognition performance. In this paper I propose a new scene text segmentation method. First, a novel approach to character candidate generation based on extremal regions (ERs) is introduced. Subpaths with low area variation are extracted from the ER tree. Instead of selecting character candidates by the minimum-variation criterion, the method uses the position of an ER within its extracted subpath as the selection criterion. Each subpath is represented by one ER, which is passed to an SVM-based classification step. Next, a novel character candidate grouping method discards non-character objects that were wrongly classified as characters. The proposed approach estimates the vertical positions of text lines by sorting the y coordinates of region centroids and checks the spatial relations of adjacent regions within each line. This step significantly improves precision and has lower computational complexity than hierarchical clustering methods. Finally, character ERs erroneously eliminated by the SVM classifier are restored by exploiting text layout properties to correct false negative classifications. Experimental results on the ICDAR 2013 dataset show that the proposed character candidate generation method efficiently prunes repeating regions and achieves a character recall rate superior to that of a recently published ER-based method. The proposed segmentation algorithm achieves performance competitive with state-of-the-art methods.
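
The grouping step described above (sort centroid y coordinates to estimate lines, then check adjacent regions within each line) might look roughly like the following Python sketch. It is an illustration only, not the paper's implementation: the `Region` type, the function names, and the thresholds `y_tol`, `max_gap`, and `max_height_ratio` are all assumptions, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """Hypothetical character candidate: centroid (x, y), width w, height h (h > 0)."""
    x: float
    y: float
    w: float
    h: float


def group_into_lines(regions: List[Region], y_tol: float = 0.5) -> List[List[Region]]:
    """Sort by centroid y; open a new line when a region lies too far from the
    current line's mean y (tolerance is an assumed fraction of mean height)."""
    lines: List[List[Region]] = []
    for r in sorted(regions, key=lambda q: q.y):
        if lines:
            cur = lines[-1]
            mean_y = sum(q.y for q in cur) / len(cur)
            mean_h = sum(q.h for q in cur) / len(cur)
            if abs(r.y - mean_y) <= y_tol * mean_h:
                cur.append(r)
                continue
        lines.append([r])
    return lines


def compatible(a: Region, b: Region, max_gap: float, max_height_ratio: float) -> bool:
    """Assumed spatial check: small horizontal gap and similar heights."""
    gap = abs(b.x - a.x) - (a.w + b.w) / 2
    ratio = max(a.h, b.h) / min(a.h, b.h)
    return gap <= max_gap * max(a.h, b.h) and ratio <= max_height_ratio


def filter_line(line: List[Region], max_gap: float = 2.0,
                max_height_ratio: float = 2.0) -> List[Region]:
    """Keep only regions with at least one compatible left or right neighbour."""
    line = sorted(line, key=lambda q: q.x)
    kept = []
    for i, r in enumerate(line):
        left_ok = i > 0 and compatible(line[i - 1], r, max_gap, max_height_ratio)
        right_ok = i + 1 < len(line) and compatible(r, line[i + 1], max_gap, max_height_ratio)
        if left_ok or right_ok:
            kept.append(r)
    return kept


def group_candidates(regions: List[Region]) -> List[List[Region]]:
    """Full sketch: estimate lines, then drop isolated regions in each line."""
    result = []
    for line in group_into_lines(regions):
        kept = filter_line(line)
        if kept:
            result.append(kept)
    return result
```

Because the regions are sorted once and each is compared only with its immediate neighbours, the cost is dominated by the sort, which is consistent with the claim of lower complexity than hierarchical clustering of all region pairs.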
