Enhanced EAST: Improving Network's Feature Extraction Ability and Text Complete Shape Perception

Liu Yang, Yonghong Song, Yuanlin Zhang

doi:10.1109/icdar.2019.00128

Abstract

EAST [1] is a popular end-to-end text detector, but it performs poorly on long or large text instances because of the network's limited feature extraction capacity and narrow receptive field. Building on EAST, we propose an approach named Enhanced EAST. First, we give low-level feature layers more semantic information by passing information from high levels to low levels, which reduces the information gap between layers. Second, we use a two-stream large kernel convolution to enlarge the receptive field at a reasonable computational cost, improving the network's feature detection and fusion ability. In addition, we optimize the label generation for the training data and design a weighted mask for each text instance; the mask guides training so that the network better perceives the complete shape of each text, and the predicted text boxes are therefore localized more accurately. Finally, we apply data equalization and augmentation in our experiments. Results on the ICDAR 2015, MSRA-TD500, and ICDAR 2017 MLT datasets show that the proposed algorithm achieves state-of-the-art performance in multi-oriented scene text detection.
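To illustrate why a large kernel can be had at "reasonable computational cost", the sketch below compares weight counts for a dense k×k convolution and a separable two-stream alternative. The abstract does not specify the layout of the two-stream module, so the structure here (one stream of 1×k followed by k×1, a second stream of k×1 followed by 1×k, outputs summed) is an assumption, as are the function names and the example sizes. Biases are ignored.

```python
def dense_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a single dense k x k convolution."""
    return k * k * c_in * c_out


def two_stream_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in an ASSUMED two-stream separable layout.

    Stream A: 1 x k conv (c_in -> c_out), then k x 1 conv (c_out -> c_out).
    Stream B: k x 1 conv (c_in -> c_out), then 1 x k conv (c_out -> c_out).
    Each stream covers a k x k receptive field; outputs are summed.
    """
    per_stream = k * c_in * c_out + k * c_out * c_out
    return 2 * per_stream


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    for k in (3, 7, 11):
        print(k, dense_params(k, 32, 32), two_stream_params(k, 32, 32))
```

Under this assumed layout the dense cost grows with k squared while the two-stream cost grows linearly in k, so the saving appears only once k exceeds 4 (for equal input and output channels) and widens as the kernel grows.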

Full Text