Abstract

With the development of video editing technology, there are growing uses of overlay text inserted into video contents to provide viewers with better visual understanding. Since the content of the scene or the editor`s intention can be well represented by using inserted text, it is useful for video information retrieval and indexing. Most of the previous approaches are based on low-level features, such as edge, color, and texture information. However, existing methods experience difficulties in handling texts with various contrasts or inserted in a complex background. In this paper, we propose a novel framework to localize the overlay text in a video scene. Based on our observation that there exist transient colors between inserted text and its adjacent background a transition map is generated. Then candidate regions are extracted by using the transition map and overlay text is finally determined based on the density of state in each candidate. The proposed method is robust to color, size, position, style, and contrast of overlay text. It is also language free. Text region update between frames is also exploited to reduce the processing time. Experiments are performed on diverse videos to confirm the efficiency of the proposed method.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call