Abstract

Text within video frames carries important information for visual content understanding, retrieval, and browsing. In this paper, we propose an approach for video-text region extraction and classification that proceeds in two main steps: text region extraction followed by text region classification. In the first step, we use a split-and-merge process that first detects the appearance of text regions and then localizes and extracts them. A filtering process then validates the effective text regions. In the second step, we propose a convolutional neural network (CNN) to classify the extracted text regions into semantic classes. A visual table of contents is then generated from the extracted and classified text regions occurring within the video sequence, enriched with semantic descriptors (e.g., place name, player name, event). These text regions serve as visual indices that enable nonlinear browsing of video content. Experiments conducted on a variety of video sequences demonstrate the effectiveness of our approach.
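As a rough illustration of the classification step, the sketch below shows a small CNN that maps a cropped text region to one of the semantic classes named in the abstract. It is a minimal PyTorch sketch under stated assumptions: the layer configuration, the 32x128 input size, and the class list are illustrative choices, since the abstract does not specify the paper's exact architecture or label set.

```python
import torch
import torch.nn as nn

# Hypothetical label set based on the abstract's examples;
# the paper's actual semantic classes are not given here.
CLASSES = ["place_name", "player_name", "event", "other"]

class TextRegionCNN(nn.Module):
    """Minimal CNN mapping a cropped text-region image
    (resized to 3x32x128) to one of the semantic classes.
    Illustrative architecture, not the paper's exact network."""
    def __init__(self, num_classes: int = len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # -> 16 x 16 x 64
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # -> 32 x 8 x 32
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 32, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Usage: classify one extracted text region (a random tensor
# stands in here for a cropped, resized frame region).
model = TextRegionCNN().eval()
region = torch.rand(1, 3, 32, 128)
with torch.no_grad():
    label = CLASSES[model(region).argmax(dim=1).item()]
print(label)
```

In practice, each region validated by the filtering step would be cropped from the frame, resized to the network's fixed input size, and classified; the predicted labels then index the video's visual table of contents.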
