Three-dimensional (3D) text is increasingly common in natural scene images, owing to 3D cameras and to text being captured from different angles, and it poses new problems for text detection because such images contain depth information, shadows, and decorative characters. In this work, we consider images in which 3D text appears with both depth and shadow information. We propose a novel method based on the local resultant gradient vector difference (LRGVD), inpainting, and a deep learning model for detecting 3D as well as two-dimensional (2D) text in natural scene images. LRGVD uses gradient magnitude and direction in a novel way to detect component boundaries that are invariant to the above challenges. We further propose a new inpainting method that uses these boundaries to restore the character background information. Given a region and the input image, the inpainting method divides the whole image into planes and then propagates the values in the planes into the missing region based on posterior probabilities and neighboring information. This step yields candidate text regions that still include false positives. A differentiable binarization network (DB-Net) is then used to detect text irrespective of orientation, background, and whether the text is 3D or 2D. Experiments on our 3D text images and on standard natural scene text datasets, namely ICDAR 2019 MLT, ICDAR 2019 ArT, DAST1500, Total-Text, and SCUT-CTW1500, show that the proposed method is effective in detecting both 3D and 2D text.
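As a minimal illustrative sketch, the two quantities that LRGVD builds on, per-pixel gradient magnitude and direction, can be computed for a grayscale image as follows. This is not the LRGVD formulation itself, whose details are not given in this abstract; the function name `gradient_magnitude_direction` and the toy step-edge image are assumptions made only for illustration.

```python
import numpy as np


def gradient_magnitude_direction(image):
    """Return per-pixel gradient magnitude and direction (radians).

    Hypothetical helper for illustration only; it computes plain
    finite-difference gradients, not the LRGVD descriptor.
    """
    image = np.asarray(image, dtype=float)
    # np.gradient returns derivatives along axis 0 (rows, y) first,
    # then along axis 1 (columns, x).
    gy, gx = np.gradient(image)
    magnitude = np.hypot(gx, gy)      # sqrt(gx**2 + gy**2)
    direction = np.arctan2(gy, gx)    # angle of the gradient vector
    return magnitude, direction


# Toy image (an assumption): dark left half, bright right half,
# i.e. a single vertical step edge such as a character stroke boundary.
toy = np.zeros((5, 6))
toy[:, 3:] = 1.0

mag, ang = gradient_magnitude_direction(toy)
```

On this toy image the magnitude is nonzero only in the two columns adjacent to the step, and the direction there points along the positive x axis, which is the kind of boundary evidence a gradient-based detector starts from.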