Abstract

Scene text in video carries semantic information and can therefore contribute significantly to video retrieval and understanding. However, most existing methods detect scene text in still images or in single frames, whereas videos differ from images in their temporal redundancy. In this paper we present a novel approach to detecting video scene text that exploits this temporal redundancy. Scene text in consecutive frames exhibits arbitrary motion due to camera or object movement. Therefore, we first perform motion detection over 30 consecutive frames to synthesize a motion image. Second, we apply scene text detection to a single frame to retrieve candidate text regions. Finally, the synthesized motion image is used to filter the candidate text regions, keeping only those that exhibit motion as the final scene text. Experimental results show that our algorithm performs well in detecting scene text of various colors, font sizes, and text alignments.
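
The three-stage pipeline described above (motion-image synthesis over a 30-frame window, single-frame candidate detection, and motion-based filtering) could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes OpenCV frame differencing for motion detection, swaps in MSER regions as a stand-in for whatever single-frame text detector the authors use, and the overlap threshold `min_motion_ratio` is an arbitrary illustrative parameter.

```python
import cv2
import numpy as np

def synthesize_motion_image(frames, diff_thresh=25):
    """Accumulate inter-frame differences over a window of consecutive frames."""
    motion = np.zeros(frames[0].shape[:2], dtype=np.uint8)
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev)
        _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
        motion = cv2.bitwise_or(motion, mask)
        prev = gray
    return motion

def detect_text_candidates(frame):
    """Stand-in single-frame detector using MSER regions (not the paper's detector)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mser = cv2.MSER_create()
    _, bboxes = mser.detectRegions(gray)
    return [tuple(box) for box in bboxes]  # each box is (x, y, w, h)

def filter_by_motion(candidates, motion_image, min_motion_ratio=0.1):
    """Keep only candidate regions that overlap the synthesized motion image."""
    kept = []
    for (x, y, w, h) in candidates:
        roi = motion_image[y:y + h, x:x + w]
        if roi.size and (roi > 0).mean() >= min_motion_ratio:
            kept.append((x, y, w, h))
    return kept

def detect_video_scene_text(video_path, window=30):
    cap = cv2.VideoCapture(video_path)
    frames = []
    while len(frames) < window:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    if not frames:
        return []
    motion_image = synthesize_motion_image(frames)      # step 1: motion image
    candidates = detect_text_candidates(frames[-1])     # step 2: single-frame detection
    return filter_by_motion(candidates, motion_image)   # step 3: motion-based filtering
```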
