Scene Text Detection Based on Multi-Dimensional Feature Fusion with Instance-Wise Loss

Qin Wu, Peiwen Zhu, Guodong Guo

doi:10.1142/s0218001423530014

Abstract

Over the past few years, scene text detection has progressed rapidly thanks to the development of deep neural networks. However, segmentation-based methods may fail to classify pixels near text boundaries accurately, and scale variation among text instances may cause small text to be missed. To tackle these problems, we propose a novel segmentation-based scene text detector that improves the quality of the detected text. Specifically, a Multi-dimensional Feature Fusion module extracts structural and spatial text features along the height, width and channel dimensions, which improves the representation ability of the network. To obtain more accurate boundaries for the detected text instances, a Boundary Refinement Branch is introduced to strengthen the supervision of pixels adjacent to text boundaries. In addition, we propose an Instance-wise Loss to handle text instances of different scales. Extensive ablation studies validate the effectiveness of the proposed modules. Experiments on several benchmark datasets show that our method achieves better results than state-of-the-art methods.
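To illustrate why a per-instance loss can help with small text, the sketch below contrasts a plain pixel-averaged binary cross-entropy with a variant that averages the loss inside each region first and then across regions. It is only an assumed, simplified reading of the idea: the abstract does not give the formula of the Instance-wise Loss, and the function name `instance_wise_bce`, the instance-map encoding (0 for background, k > 0 for text instance k) and the equal weighting of regions are all hypothetical choices made for illustration.

```python
import numpy as np


def instance_wise_bce(prob, instance_map, eps=1e-7):
    """Hypothetical per-instance binary cross-entropy (not the paper's formula).

    prob: (H, W) predicted text probabilities in [0, 1].
    instance_map: (H, W) ints, 0 = background, k > 0 = pixels of text instance k.
    """
    prob = np.clip(prob, eps, 1.0 - eps)
    target = (instance_map > 0).astype(np.float64)
    pixel_loss = -(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
    # Average inside each region (background and every text instance) first,
    # so a small instance weighs as much as a large one in the final mean.
    region_ids = np.unique(instance_map)
    per_region = [pixel_loss[instance_map == r].mean() for r in region_ids]
    return float(np.mean(per_region))


# Toy example: one large instance detected well, one small instance missed.
instance_map = np.zeros((10, 10), dtype=np.int64)
instance_map[0:5, 0:8] = 1      # large instance, 40 pixels
instance_map[8:10, 8:10] = 2    # small instance, 4 pixels

prob = np.full((10, 10), 0.1)   # background and the missed small instance
prob[0:5, 0:8] = 0.9            # confident prediction on the large instance

target = (instance_map > 0).astype(np.float64)
loss_plain = float(np.mean(-(target * np.log(prob) + (1 - target) * np.log(1 - prob))))
loss_instance = instance_wise_bce(prob, instance_map)
print(loss_plain, loss_instance)
```

In this toy case the missed small instance covers only 4 of 100 pixels, so it barely moves the pixel-averaged loss, whereas the per-region average penalizes it as heavily as the large instance.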

Full Text