Abstract

Most artificial captions in videos carry semantic information related to the video content. Most previous studies on caption detection, which aim to extract such semantic information, relied either on spatial information in still images or on temporal information in videos. By contrast, this study proposes a method for detecting artificial caption regions in videos that uses temporal and spatial information simultaneously. The method proceeds in two broad stages. First, an improved text appearance map is generated to detect caption candidate regions, and continuous candidate regions are identified through candidate region matching. Second, a disappearance test is applied to each detected continuous candidate region to determine whether the caption has disappeared; if it has, the caption candidate region is determined through a merging process based on temporal and spatial information. Experiments are conducted to demonstrate the efficiency of the proposed method for region detection in videos that include captions in a variety of sizes, formats and positions.
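
The two-stage flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the text appearance map, the matching criterion, or the disappearance test are computed. The sketch assumes that per-frame candidate boxes are already available, matches them across frames by intersection-over-union (an assumed criterion), treats a missing match in the next frame as disappearance, and merges matched boxes by taking their spatial union. The names `iou`, `union_box`, `Track`, `detect_caption_regions`, `match_thresh` and `min_frames` are all hypothetical.

```python
from dataclasses import dataclass

Box = tuple[int, int, int, int]  # (x1, y1, x2, y2), assumed pixel coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def union_box(a: Box, b: Box) -> Box:
    """Smallest box that contains both inputs (spatial merge)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


@dataclass
class Track:
    box: Box    # merged spatial extent so far
    start: int  # first frame index in which the candidate appears
    end: int    # last frame index in which the candidate was matched


def detect_caption_regions(frames_boxes, match_thresh=0.5, min_frames=2):
    """Stage 1: match candidates frame to frame. Stage 2: on disappearance,
    keep the merged region if it persisted for at least min_frames frames."""
    active, finished = [], []
    for t, boxes in enumerate(frames_boxes):
        unmatched = list(boxes)
        still_active = []
        for tr in active:
            best = max(unmatched, key=lambda b: iou(tr.box, b), default=None)
            if best is not None and iou(tr.box, best) >= match_thresh:
                tr.box = union_box(tr.box, best)  # spatial merge
                tr.end = t                        # temporal extension
                unmatched.remove(best)
                still_active.append(tr)
            elif tr.end - tr.start + 1 >= min_frames:
                finished.append(tr)               # candidate disappeared
        still_active.extend(Track(b, t, t) for b in unmatched)
        active = still_active
    # Candidates still visible at the end of the video.
    finished.extend(tr for tr in active if tr.end - tr.start + 1 >= min_frames)
    return finished


# Hypothetical candidate boxes for five frames: one caption visible in
# frames 0-2, nothing in frame 3, and a one-frame false candidate in frame 4.
frames = [
    [(10, 100, 200, 130)],
    [(12, 100, 202, 130)],
    [(11, 101, 201, 131)],
    [],
    [(50, 20, 150, 40)],
]
regions = detect_caption_regions(frames)
print(regions)
```

In this sketch the `min_frames` threshold plays the role of a simple temporal filter: a candidate that appears in only one frame is discarded, while a candidate that persists is reported with its merged box and frame range.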
