Abstract

Extracting candidate text connected components (CCs) is critical for CC-based text localization. Based on the observation that text strokes in born-digital images mostly have complete contours and the text pixels have high contrast with the adjacent non-text pixels, we propose a method to extract candidate text CCs by combining text contours and stroke interior regions. After segmenting the image into non-smooth and smooth regions based on local contrast, text contour pixels in non-smooth regions are detached from adjacent non-text pixels by local binarization. Then, obvious non-text contours can be removed according to the spatial relationship of text and non-text contours. While smooth regions include stroke interior regions and non-text smooth regions, some non-text smooth regions can be easily removed because they are not surrounded by candidate text contours. At last, candidate text contours and stroke interior regions are combined to generate candidate text CCs. The CCs undergo CC filtering, text line grouping and line classification to give the text localization result. Experimental results on the born-digital dataset of ICDAR2013 robust reading competition demonstrate the efficiency and superiority of the proposed method.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call