Abstract

Text segmentation is important for text image analysis and recognition; however, it is challenging because of noise and complex backgrounds in natural scenes. Superpixel-based image representation can enhance robustness to noise and local disturbances, but it is difficult for conventional superpixel algorithms to obtain complete stroke regions and accurate boundaries in text images. In this study, a text segmentation method based on superpixel clustering is proposed. First, to generate accurate superpixels for text images, a text superpixel generation algorithm based on adaptive simple linear iterative clustering (SLIC) is proposed, in which the superpixel size and compactness are computed adaptively to enhance boundary adherence. Second, to improve the coverage of complete strokes, superpixel clustering merges homogeneous superpixels into larger regions for both the strokes and the background; for this step, a modified density-based spatial clustering of applications with noise (DBSCAN) algorithm is proposed. Finally, stroke superpixel verification assigns each region to either a stroke or the background, yielding the text segmentation result. The proposed method shows promising robustness to noise and complex background textures. Experimental results on the Korea Advanced Institute of Science and Technology (KAIST) scene text dataset, the International Conference on Document Analysis and Recognition (ICDAR) 2003 natural scene text image dataset, and the Street View Text dataset verify that the method is effective and significantly outperforms existing methods.
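The superpixel clustering step can be illustrated with a minimal sketch. The abstract does not say how the authors modified DBSCAN, so the code below implements only standard DBSCAN, in plain Python. It assumes each superpixel has already been reduced to a feature tuple (here, hypothetically, its mean CIELAB color); the function name `dbscan` and the sample values are illustrative and are not taken from the paper.

```python
from math import dist

NOISE = -1  # label for points that belong to no dense region


def dbscan(points, eps, min_pts):
    """Standard DBSCAN over a list of feature tuples.

    Returns one label per point: 0, 1, ... for clusters, -1 for noise.
    """
    labels = [None] * len(points)  # None means "not yet visited"

    def neighbors(i):
        # All points (including i itself) within distance eps of point i.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is also core: keep expanding
    return labels


# Hypothetical superpixel features: mean CIELAB color of each superpixel.
superpixels = [
    (20, 0, 0), (22, 1, 0), (25, 0, 1),       # dark, stroke-like
    (200, 0, 0), (205, 2, 1), (198, 1, 0),    # bright, background-like
    (110, 40, 40),                            # isolated outlier
]
labels = dbscan(superpixels, eps=10.0, min_pts=2)
print(labels)  # → [0, 0, 0, 1, 1, 1, -1]
```

With `eps=10` and `min_pts=2`, the three dark superpixels and the three bright ones form two clusters, while the isolated superpixel is labeled as noise. A practical variant would likely also restrict neighborhoods to spatially adjacent superpixels so that merged regions stay connected; that restriction is an assumption here, not something the abstract states.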
