Abstract

Text appearing in videos carries important information for content analysis, indexing, and retrieval. The key technique for extracting this information is to find, verify, and recognize video text in various languages and fonts against complex backgrounds. In this paper, we propose a novel method that combines a corner response feature map and transferred deep convolutional neural networks for detecting and recognizing video text. First, we use a corner response feature map to detect candidate text regions with high recall. Next, we partition the candidate text regions into candidate text lines by projection analysis, using two alternative methods. We then construct classification networks transferred from VGG16, ResNet50, and InceptionV3 to eliminate false positives. Finally, we develop a novel fuzzy c-means clustering-based separation algorithm to extract a clean text layer from complex backgrounds, so that the text can be correctly recognized by commercial optical character recognition software. The proposed method is robust and performs well on video text detection and recognition, as evaluated on three publicly available test data sets and on a high-resolution test data set that we constructed.
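The abstract does not specify how the corner response feature map is computed; a minimal sketch using the classic Harris corner response (an assumption — the paper may use a different corner detector or window size) illustrates the idea that text regions tend to produce dense clusters of strong corner responses:

```python
import numpy as np

def box_sum(a, r=1):
    """Sum each pixel's (2r+1)x(2r+1) neighborhood (wraps at borders)."""
    out = np.zeros_like(a)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
    return out

def corner_response(img, k=0.04, r=2):
    """Harris corner response map: R = det(M) - k * trace(M)^2,
    where M is the structure tensor summed over a local window."""
    Iy, Ix = np.gradient(img.astype(float))
    Sxx = box_sum(Ix * Ix, r)
    Syy = box_sum(Iy * Iy, r)
    Sxy = box_sum(Ix * Iy, r)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

# Toy example: a bright square on a dark background. Responses are
# strongly positive at the square's corners, negative along its edges,
# and zero in flat regions.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = corner_response(img)
candidate_mask = R > 0.5 * R.max()  # dense positives mark candidate regions
```

In a full pipeline, thresholding the response map and merging nearby responses (e.g., by dilation) would yield the candidate text regions that the subsequent projection analysis and CNN verification stages refine.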
