Scene Text In Video Research Articles

Recently, video scene text detection has received increasing attention due to its wide range of applications. However, the lack of annotated scene text video datasets has become one of the most important obstacles to the development of video scene text detection. Existing scene text video datasets are not large-scale because manual labeling is expensive. In addition, the text instances in these datasets are too clear to pose a challenge. To address these issues, we propose a tracking-based semi-automatic labeling strategy for scene text videos. We obtain semi-automatic scene text annotations by manually labeling the first frame and automatically tracking the text through the subsequent frames, which avoids the huge cost of manual labeling. Moreover, we propose a paired low-quality scene text video dataset named Text-RBL, consisting of raw videos, blurry videos, and low-resolution videos, labeled with the proposed semi-automatic labeling strategy. Through an averaging operation and a bicubic down-sampling operation over the raw videos, we efficiently obtain blurry videos and low-resolution videos, respectively, each paired with the raw videos. To verify the effectiveness of Text-RBL, we propose a baseline model that combines a text detector and a tracker for video scene text detection. In addition, a failure detection scheme is designed to alleviate the drift of the baseline model caused by complex scenes. Extensive experiments demonstrate that Text-RBL, with paired low-quality videos labeled by the semi-automatic method, can significantly improve the performance of the text detector in low-quality scenes.
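
The "averaging operation" used to derive blurry videos from raw ones can be illustrated with a minimal sketch. The abstract does not say how the averaging is done, so the version below assumes a sliding window over consecutive frames; the function name `temporal_average_blur`, the `window` parameter, and the list-of-lists frame representation are all hypothetical. The bicubic down-sampling step is not shown.

```python
from typing import List

# A grayscale frame as rows of pixel intensities (illustrative representation).
Frame = List[List[float]]


def temporal_average_blur(frames: List[Frame], window: int = 3) -> List[Frame]:
    """Blur each frame by averaging it with its temporal neighbours.

    Assumption: averaging over a sliding window of consecutive frames is
    only one plausible reading of "an averaging operation" in the abstract.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    blurred = []
    for t in range(len(frames)):
        # Clamp the window at the start and end of the clip.
        lo, hi = max(0, t - half), min(len(frames), t + half + 1)
        neighbours = frames[lo:hi]
        height, width = len(frames[t]), len(frames[t][0])
        blurred.append([
            [sum(f[y][x] for f in neighbours) / len(neighbours)
             for x in range(width)]
            for y in range(height)
        ])
    return blurred


# Three tiny 1x2 frames; the output has one blurred frame per raw frame.
raw = [[[0.0, 0.0]], [[3.0, 6.0]], [[6.0, 12.0]]]
blurry = temporal_average_blur(raw, window=3)
```

Under this assumption the frame count and frame geometry are unchanged, so an annotation made on raw frame `t` would apply directly to blurred frame `t`, which is one way a raw/blurry pair could share labels.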

Video scene text contains valuable information for scene understanding, as scene text in video provides important semantic clues that help people make sense of their environment. Text detection in natural scenes is challenging due to low resolution, low contrast, cluttered backgrounds, and varying illumination. In this paper, we therefore propose a new approach to video scene text detection based on a saliency edge map, which combines a saliency map with edge features. The saliency map helps detect text against cluttered backgrounds, whereas the edge map is suited to detecting scene text under low resolution and varying illumination. First, we compute the saliency map and the edge map of the video frame or image separately. The saliency map keeps most of the salient regions in the frame, which removes some of the complicated background. The edge map captures edge features, which are insensitive to illumination changes and to low-resolution or low-contrast regions. We then integrate the edge map and the saliency map into a saliency edge map (SEM), which preserves the advantages of both. Finally, using a Gaussian mixture model (GMM), the SEM is divided into three kinds of components (bright characters, dark characters, and background), and we perform connected component analysis on these three components to obtain the text regions. Experimental evaluations on public datasets, such as ICDAR 2003 and 2013, MSRA-TD500, and SVT, and on a news video dataset demonstrate that our method significantly outperforms four other text detection algorithms in terms of recall, precision, F-score, and detection speed, especially under challenges such as text with different alignments, character sizes, languages, and appearances, and uneven illumination.
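
The last step of the pipeline, connected component analysis, can be sketched as follows. This is a generic illustration, not the authors' implementation: it assumes that one GMM component has already been turned into a binary mask, that pixels are grouped with 4-connectivity, and that each text region is reported as a bounding box. The name `component_boxes` is hypothetical.

```python
from collections import deque
from typing import List, Tuple

# (x_min, y_min, x_max, y_max), all inclusive.
Box = Tuple[int, int, int, int]


def component_boxes(mask: List[List[int]]) -> List[Box]:
    """Return one bounding box per 4-connected region of non-zero pixels."""
    if not mask:
        return []
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                x, y = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# A mask with two separate foreground regions.
mask = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
regions = component_boxes(mask)  # [(0, 0, 1, 1), (3, 1, 3, 2)]
```

In a full detector the resulting boxes would typically be filtered by size or aspect ratio before being accepted as text; the abstract does not describe such a step, so none is shown here.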

Articles published on Scene Text In Video

Tracking Based Semi-Automatic Annotation for Scene Text Videos

Automatic video scene text detection based on saliency edge map

Bayesian super-resolution of text in video with a text-specific bimodal prior

MPEG-7 Videotext description scheme for superimposed text in images and video
