A new video text extraction using local Laplacian filters and mean shift

Xiaodong Huang

doi:10.1007/s11042-018-6451-1

Abstract

Video text constitutes the semantic context of the video. For that reason, robust extraction of text is essential for successful video understanding, search and retrieval. Extracting text from the background is an important phase before the text can be recognized correctly. It is a challenging task because text is difficult to segment from varied and complicated backgrounds. This paper therefore proposes a novel text extraction method to tackle this issue. First, we perform background complexity determination to distinguish text lines with a clear and simple background from those with a complex background, which increases extraction speed. Then, for text lines with a complicated background and low contrast, we use Local Laplacian Filters (Commun ACM 58(3):81–91) [18] to enhance the details of text regions and obtain the Integrated Enhanced Map (IEM). Finally, we apply Mean Shift (IEEE Trans Pattern Anal Mach Intell 24(5):603–619) [4] to segment the IEM and obtain the text extraction results. Experimental evaluations on a varied video dataset that we collected demonstrate that our method significantly outperforms three other video text extraction algorithms in terms of recall, precision and F-score, especially under challenges such as video text with different font sizes, font styles, languages, and background complexities.
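To make the final segmentation step more concrete, the following is a minimal sketch of mean shift clustering applied to the pixel intensities of a small grayscale patch. It is not the implementation used in the paper: it works in one dimension on raw intensities rather than on the IEM, uses a flat kernel, and the function names (`mean_shift_1d`, `label_modes`, `segment_text_line`) and the default `bandwidth` of 30 are assumptions chosen for illustration.

```python
from typing import List, Tuple


def mean_shift_1d(values: List[float], bandwidth: float,
                  max_iter: int = 100, tol: float = 1e-3) -> List[float]:
    """Move each value to the mean of all values within `bandwidth` (flat kernel)
    and repeat until the shift is smaller than `tol`. Returns one mode per value."""
    modes = []
    for v in values:
        x = float(v)
        for _ in range(max_iter):
            window = [u for u in values if abs(u - x) <= bandwidth]
            if not window:  # defensive guard against rounding edge cases
                break
            new_x = sum(window) / len(window)
            converged = abs(new_x - x) < tol
            x = new_x
            if converged:
                break
        modes.append(x)
    return modes


def label_modes(modes: List[float], merge_dist: float) -> Tuple[List[int], List[float]]:
    """Give the same label to modes that lie within `merge_dist` of a known center."""
    centers: List[float] = []
    labels: List[int] = []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) <= merge_dist:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


def segment_text_line(pixels: List[List[float]],
                      bandwidth: float = 30.0) -> Tuple[List[List[int]], List[float]]:
    """Cluster the intensities of a 2-D patch and return a label map plus cluster centers."""
    flat = [p for row in pixels for p in row]
    modes = mean_shift_1d(flat, bandwidth)
    labels, centers = label_modes(modes, bandwidth / 2)
    width = len(pixels[0])
    label_map = [labels[i:i + width] for i in range(0, len(labels), width)]
    return label_map, centers


if __name__ == "__main__":
    # Hypothetical patch: dark background pixels with a few bright text pixels.
    patch = [[10, 12, 11, 200],
             [9, 205, 198, 13]]
    label_map, centers = segment_text_line(patch)
    print(label_map, centers)
```

In this toy patch the dark and bright pixels converge to two separate modes, so the label map separates candidate text pixels from the background. A practical segmentation would typically also use spatial coordinates and color channels as features.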

Full Text