An efficient hybrid scheme for key frame extraction and text localization in video

Monika Singh, Amanpreet Kaur

doi:10.1109/icacci.2015.7275784


Publication Date: Aug 1, 2015

Citations: 3

Affiliation: The NorthCap University

Abstract

Efficient algorithms for detecting caption text and scene text in video sequences are in high demand in multimedia indexing and data retrieval. Challenges such as low resolution, low contrast, complex backgrounds, and text with varying orientation, style, color, and alignment make scene text extraction from video frames a particularly challenging task. This paper proposes a method that extracts key frames from a video based on color moments and then performs text localization only on those key frames. Because the text content does not change with every frame, restricting text extraction to key frames reduces the processing time of the algorithm. The paper further proposes a robust hybrid method for localizing scene and graphic text in video frames using the 2-D Haar discrete wavelet transform (DWT), a Laplacian of Gaussian filter, and the maximum gradient difference method. The DWT provides a fast decomposition of an image into one approximation component and three detail components. The three detail components carry information about the vertical, horizontal, and diagonal edges of the image, which is used to differentiate text from the rest of the image. The maximum gradient difference method further refines the text localization, and the gradient difference magnitude is used in the thresholding step. A dynamic thresholding technique converts the images into binary form. Because this technique computes a different threshold value for each image, it can be used for automatic text localization in video sequences. Two mask operators are employed to obtain an equation that, when applied to each pixel, yields the intended threshold value. False positives are eliminated using morphological operations, and connected component analysis is then performed to localize the text. The comparison metrics in the results show that the proposed method performs well in terms of detection rate, false alarm rate, and misdetection rate.
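
The two front-end steps named in the abstract, key frame selection from color moments and a single-level 2-D Haar DWT, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of a Euclidean distance between moment vectors, the fixed `threshold` value, and the band names `ll`, `lh`, `hl`, `hh` are all assumptions made for illustration, and the image is assumed to have even height and width.

```python
import numpy as np


def color_moments(frame):
    """Return mean, standard deviation and skewness of each color channel.

    `frame` is an (H, W, C) array; the result has length 3 * C.
    """
    pixels = frame.reshape(-1, frame.shape[-1]).astype(np.float64)
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    std = np.sqrt((centered ** 2).mean(axis=0))
    # Signed cube root of the third central moment.
    skew = np.cbrt((centered ** 3).mean(axis=0))
    return np.concatenate([mean, std, skew])


def select_key_frames(frames, threshold=10.0):
    """Keep a frame when its moments differ enough from the last key frame.

    The Euclidean distance and the fixed threshold are assumptions; the
    abstract only states that key frames are chosen from color moments.
    """
    key_indices = []
    last = None
    for index, frame in enumerate(frames):
        moments = color_moments(frame)
        if last is None or np.linalg.norm(moments - last) > threshold:
            key_indices.append(index)
            last = moments
    return key_indices


def haar_dwt2(image):
    """Single-level 2-D Haar DWT of a grayscale image with even dimensions.

    Returns the approximation band and three detail bands.
    """
    image = np.asarray(image, dtype=np.float64)
    # Sums and differences of horizontally adjacent pixel pairs.
    low = (image[:, 0::2] + image[:, 1::2]) / np.sqrt(2)
    high = (image[:, 0::2] - image[:, 1::2]) / np.sqrt(2)
    # Sums and differences of vertically adjacent pairs of those results.
    ll = (low[0::2, :] + low[1::2, :]) / np.sqrt(2)    # approximation
    lh = (low[0::2, :] - low[1::2, :]) / np.sqrt(2)    # responds to horizontal edges
    hl = (high[0::2, :] + high[1::2, :]) / np.sqrt(2)  # responds to vertical edges
    hh = (high[0::2, :] - high[1::2, :]) / np.sqrt(2)  # responds to diagonal detail
    return ll, lh, hl, hh
```

In such a sketch, `select_key_frames` would be run over the decoded frames first, and `haar_dwt2` would then be applied to a grayscale version of each selected frame so that the three detail bands can feed the later edge-based text localization steps.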

Full Text