Chinese text-line detection from web videos with fully convolutional networks

Chun Yang, Xu-Cheng Yin, Long-Huang Wu, Wei-Yi Pei

doi:10.1186/s41044-017-0028-2

Abstract

Background: In recent years, video has become the dominant source of information on the Web, and the text within video usually carries significant semantic information. Video text extraction and recognition play an essential role in web multimedia understanding and retrieval for big visual data analytics and applications. To deal with challenging backgrounds and embedding noise, most conventional approaches design sophisticated pre-processing and post-processing steps before and after text detection. In this paper, we present a simple yet powerful pipeline that directly and uniformly detects Chinese text lines for embedded captions in web videos.

Results: In this Chinese text-line detection system, a fully convolutional network with local context is adopted to localize captions through end-to-end learning. The caption predictions are produced at the word level and can be fed directly into the character classifier. Text-line construction is then performed by heuristic strategies. A variety of experiments conducted on several real-world web video datasets demonstrate the effectiveness and efficiency of the proposed method.

Conclusion: The proposed system directly detects English words and Chinese characters in caption text lines, without word or character segmentation, and achieves high performance on real-world web video datasets.

Highlights

In recent years, video has become the dominant source of information on the Web, and the text within video usually carries significant semantic information
We focus on text detection from web videos
The caption predictions are produced at the word level and can be fed directly into the character classifier

Summary

Introduction

Video has become the dominant source of information on the Web, and the text within video usually carries significant semantic information. Video text extraction and recognition play an essential role in web multimedia understanding and retrieval for big visual data analytics and applications. To deal with challenging backgrounds and embedding noise, most conventional approaches design sophisticated pre-processing and post-processing steps before and after text detection. We present a simple yet powerful pipeline that directly and uniformly detects Chinese text lines for embedded captions in web videos. A variety of research efforts have been made toward extracting video captions in various big visual data analytics and applications. The major related techniques for video caption extraction involve three aspects: caption detection, caption segmentation and optical character recognition (OCR). Techniques from standard OCR, which focus on high-resolution scans of printed (text) documents, are applicable to video images.
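
The abstract states that word-level caption predictions are grouped into text lines by heuristic strategies, but this summary does not specify them. As an illustration only, the sketch below shows one common heuristic of that kind: boxes are merged left to right when they overlap vertically and sit close together horizontally. The function names, the box format (x_min, y_min, x_max, y_max) and both thresholds are assumptions made for this example, not the authors' actual procedure.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def vertical_overlap(a: Box, b: Box) -> float:
    """Overlap of the two vertical extents, relative to the shorter box."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(inter, 0.0) / shorter if shorter > 0 else 0.0


def merge(a: Box, b: Box) -> Box:
    """Smallest box enclosing both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def build_text_lines(
    boxes: List[Box],
    min_overlap: float = 0.6,    # assumed threshold, not from the paper
    max_gap_ratio: float = 1.0,  # assumed: allowed gap as a fraction of line height
) -> List[Box]:
    """Greedily group word-level boxes into horizontal text lines."""
    lines: List[Box] = []
    # Process boxes left to right so each line only grows rightwards.
    for box in sorted(boxes, key=lambda b: b[0]):
        for i, line in enumerate(lines):
            gap = box[0] - line[2]
            height = line[3] - line[1]
            if (vertical_overlap(line, box) >= min_overlap
                    and gap <= max_gap_ratio * height):
                lines[i] = merge(line, box)
                break
        else:
            # No existing line accepted the box: start a new line.
            lines.append(box)
    return lines


if __name__ == "__main__":
    words = [(10, 100, 40, 130), (45, 102, 80, 131),
             (85, 99, 120, 129), (10, 200, 50, 230)]
    for line in build_text_lines(words):
        print(line)
```

In this example the three boxes near y = 100 are merged into one line and the box near y = 200 forms a second line. A real caption detector would likely tune both thresholds on validation data and might also use box confidence scores.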

Methods

Results

Conclusion