Abstract

Text within video frames carries important information for visual content understanding, retrieval, and browsing. In this paper, we propose an approach for video-text region extraction and classification that proceeds in two main steps: text region extraction followed by text region classification. In the first step, we use a split-and-merge process that first detects the appearance of text regions and then localizes and extracts them. A filtering process then validates the effective text regions. In the second step, we propose a convolutional neural network (CNN) to classify the extracted text regions into semantic classes. A visual table of contents is then generated from the extracted and classified text regions occurring within the video sequence, enriched with semantic descriptors (e.g., place name, player name, event). These text regions serve as visual indices that enable nonlinear browsing of video content. Experiments conducted on a variety of video sequences demonstrate the effectiveness of our approach.
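As a rough illustration of the classification step, the sketch below shows a small CNN that maps a cropped text region to one of the semantic classes named in the abstract. It is a minimal PyTorch sketch under stated assumptions: the layer configuration, the 32x128 input size, and the class list are illustrative choices, since the abstract does not specify the paper's exact architecture or label set.

```python
import torch
import torch.nn as nn

# Hypothetical label set based on the abstract's examples;
# the paper's actual semantic classes are not given here.
CLASSES = ["place_name", "player_name", "event", "other"]

class TextRegionCNN(nn.Module):
    """Minimal CNN mapping a cropped text-region image
    (resized to 3x32x128) to one of the semantic classes.
    Illustrative architecture, not the paper's exact network."""
    def __init__(self, num_classes: int = len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # -> 16 x 16 x 64
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # -> 32 x 8 x 32
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 32, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Usage: classify one extracted text region (a random tensor
# stands in here for a cropped, resized frame region).
model = TextRegionCNN().eval()
region = torch.rand(1, 3, 32, 128)
with torch.no_grad():
    label = CLASSES[model(region).argmax(dim=1).item()]
print(label)
```

In practice, each region validated by the filtering step would be cropped from the frame, resized to the network's fixed input size, and classified; the predicted labels then index the video's visual table of contents.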
