Abstract

Scene text detection is an important step in scene text reading systems and has developed rapidly with the adoption of convolutional neural networks. The Feature Pyramid Network (FPN) [1] is a key component of modern scene text detection frameworks: detectors rely on its pyramid feature representation to handle the scale variation of scene text. However, inconsistency across feature scales is the primary limitation of detectors based on feature pyramids. To address this issue, we propose the attention feature pyramid network (AFPN), which introduces a feature enhancement module (FEM) and a dual attention module (DAM). AFPN first performs feature enhancement on the backbone output and then adaptively adjusts feature fusion through an attention mechanism. The FEM learns to spatially filter conflicting information and thereby suppress the inconsistency. The DAM applies two types of attention to the output of the FEM, modeling semantic interdependencies in the spatial and channel dimensions respectively. AFPN thus refines the network through self-attention, which we expect to improve detection results. When FPN is replaced with AFPN in DBNet [2], a real-time scene text detector with differentiable binarization, our models achieve an F-measure (F1) that is 0.2 percentage points higher with a ResNet-50 backbone and 1.4 percentage points higher with a ResNet-18 backbone. These experiments indicate that AFPN is most beneficial for lightweight backbone networks.
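To make the idea of attention over spatial and channel dimensions concrete, the following is a minimal NumPy sketch of a dual attention step on a single feature map. It is an illustration under stated assumptions, not the DAM described in the paper: it omits the learned query/key/value projections that such a module would normally use, it fuses the two branches by a simple residual sum, and the names `spatial_attention`, `channel_attention`, `dual_attention`, `alpha`, and `beta` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_attention(feat):
    """Each spatial position attends over all positions.

    feat has shape (C, H, W). No learned projections are used here
    (an assumption made to keep the sketch short).
    """
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)        # (C, N) with N = H * W
    energy = flat.T @ flat               # (N, N) position-to-position similarity
    attn = softmax(energy, axis=-1)      # each row sums to 1
    out = flat @ attn.T                  # out[:, i] = sum_j attn[i, j] * flat[:, j]
    return out.reshape(c, h, w)


def channel_attention(feat):
    """Each channel attends over all channels."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)        # (C, N)
    energy = flat @ flat.T               # (C, C) channel-to-channel similarity
    attn = softmax(energy, axis=-1)      # each row sums to 1
    out = attn @ flat                    # weighted mix of channel maps
    return out.reshape(c, h, w)


def dual_attention(feat, alpha=1.0, beta=1.0):
    """Residual sum of both branches; alpha and beta are hypothetical
    scalars standing in for learnable fusion weights."""
    return feat + alpha * spatial_attention(feat) + beta * channel_attention(feat)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 5, 6))       # toy feature map: 4 channels, 5x6 grid
    print(dual_attention(x).shape)
```

The output keeps the input shape, so a block of this kind could in principle be placed between a feature enhancement stage and the detection head without changing the rest of the pipeline.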
