Abstract

Text embedded in video frames is a valuable source of high-level information, especially for semantic indexing and retrieval applications. In this paper, we propose a novel approach to Arabic text detection in news videos based on the deep learning detector RetinaNet. The approach proceeds in two steps. The first step extracts features using a residual network (ResNet) and a feature pyramid network (FPN). The second step applies two fully convolutional networks (FCNs), one for classification and the other for bounding box regression. Experiments show that the proposed method detects text regions effectively and provides more satisfactory results than other existing methods.
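
To make the two-step structure concrete, the following minimal sketch computes the output shapes of the classification and box regression subnets at each pyramid level. It assumes the defaults of the original RetinaNet design (pyramid levels P3 to P7 with stride 2^l, 9 anchors per location) and a single "text" class; these values, the ceiling-based feature map sizes, and the function names `head_output_shapes` and `total_anchors` are illustrative assumptions, not settings reported in this paper.

```python
import math


def head_output_shapes(image_h, image_w, num_classes=1,
                       anchors_per_cell=9, levels=(3, 4, 5, 6, 7)):
    """Return (channels, height, width) of both subnet outputs per FPN level.

    Assumed defaults: pyramid level l has stride 2**l, there are 9 anchors
    per spatial location, and there is one object class ("text").
    """
    shapes = {}
    for level in levels:
        stride = 2 ** level
        # Feature map size at this level (rounded up; real backbones may differ).
        h = math.ceil(image_h / stride)
        w = math.ceil(image_w / stride)
        shapes[f"P{level}"] = {
            # Classification subnet: one score per class per anchor.
            "classification": (num_classes * anchors_per_cell, h, w),
            # Regression subnet: four box offsets per anchor.
            "box_regression": (4 * anchors_per_cell, h, w),
        }
    return shapes


def total_anchors(shapes, anchors_per_cell=9):
    """Count the anchor boxes scored across all pyramid levels."""
    return sum(
        s["classification"][1] * s["classification"][2] * anchors_per_cell
        for s in shapes.values()
    )


if __name__ == "__main__":
    shapes = head_output_shapes(512, 512)
    for name, s in shapes.items():
        print(name, s)
    print("total anchors:", total_anchors(shapes))
```

Under these assumptions, both subnets are applied to every pyramid level, so small and large text regions are scored at different resolutions by the same pair of heads.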
