Robustly Extracting Captions in Videos Based on Stroke-Like Edges and Spatio-Temporal Analysis

Xiaoqian Liu, Weiqiang Wang

doi:10.1109/tmm.2011.2177646

Abstract

This paper presents an effective and efficient approach to extracting captions from videos. The robustness of our system comes from two contributions. First, we propose a novel contour-based method for detecting stroke-like edges, which effectively removes the interference of non-stroke edges in complex backgrounds and thereby makes caption detection and localization much more accurate. Second, our approach highlights the importance of temporal (i.e., inter-frame) features in the task of caption extraction (detection, localization, and segmentation). Instead of treating each video frame as an independent image, we fully exploit the temporal features of video together with spatial analysis in caption localization, segmentation, and post-processing, and demonstrate that inter-frame information can effectively improve the accuracy of both caption localization and caption segmentation. In comprehensive evaluation experiments, the results on two representative datasets show the robustness and efficiency of our approach.
