Abstract

Text in videos usually acts as an important semantic cue and is helpful for video analysis. Video text detection is considered one of the most difficult tasks in document analysis due to two challenges: 1) difficulties caused by video scenes, i.e., motion blur, illumination changes, and occlusion; 2) properties of the text itself, including variations in fonts, languages, orientations, and shapes. Most existing methods try to improve video text detection through video text tracking but treat the two tasks separately, which significantly increases the computational cost and fails to exploit the supervisory information of both tasks. In this work, we introduce an explainable descriptor that combines appearance, geometry, and PHOC features to establish a bridge between detection and tracking, and we build an end-to-end video text detection model with online tracking to address these challenges jointly. By integrating the two branches into one trainable framework, they promote each other and the computational cost is significantly reduced. Moreover, the introduced explainable descriptor gives our end-to-end model inherent interpretability. Experiments on existing video text benchmarks, including ICDAR 2013 Video, DOST, Minetto, and YVT, verify the role of explainable descriptors in improving the model's expressive ability and show that the proposed method significantly outperforms state-of-the-art methods. Our method improves the F-score by more than 2% on all datasets and achieves 81.52% MOTA on the Minetto dataset.
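To make the descriptor idea concrete, the following is a minimal sketch, not the authors' implementation, of how a per-detection descriptor could concatenate appearance, geometry, and PHOC features and how such descriptors could link detections across frames for online tracking. All function names, feature dimensions, and the greedy cosine-similarity matching used here are illustrative assumptions.

```python
# Illustrative sketch only: the feature dimensions, helper names, and the
# greedy matching strategy are assumptions, not the paper's actual method.
import numpy as np

PHOC_DIM = 604  # a common PHOC size for the Latin alphabet (assumption)


def build_descriptor(appearance_emb: np.ndarray,
                     box: np.ndarray,
                     phoc: np.ndarray) -> np.ndarray:
    """Concatenate L2-normalized appearance, geometry, and PHOC features."""
    def l2norm(v):
        n = np.linalg.norm(v)
        return v / n if n > 0 else v

    # Geometry: center, width, and height of the text box given as (x1, y1, x2, y2).
    x1, y1, x2, y2 = box
    geom = np.array([(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1],
                    dtype=np.float32)
    return np.concatenate([l2norm(appearance_emb), l2norm(geom), l2norm(phoc)])


def match_across_frames(prev_desc: np.ndarray,
                        curr_desc: np.ndarray,
                        thr: float = 0.7):
    """Greedily match descriptor sets of two frames by cosine similarity."""
    sims = prev_desc @ curr_desc.T
    norms = np.outer(np.linalg.norm(prev_desc, axis=1),
                     np.linalg.norm(curr_desc, axis=1)) + 1e-8
    sims = sims / norms
    matches, used = [], set()
    # Process previous-frame detections in order of their best similarity.
    for i in np.argsort(-sims.max(axis=1)):
        j = int(np.argmax(sims[i]))
        if sims[i, j] >= thr and j not in used:
            matches.append((int(i), j))
            used.add(j)
    return matches
```

In the actual end-to-end model, the detection and tracking branches are trained jointly so that both kinds of supervision shape the shared descriptor; the sketch above only illustrates how the three feature types could be fused and used for frame-to-frame association.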
