Abstract

Recent object detection studies attempt to implement multi-scale feature fusion through complicated hierarchical structures. However, existing feature fusion methods focus only on interactions between features at the same local positions and fail to capture long-distance dependencies among features. In this study, a novel non-local feature-fused transformer convolutional network is proposed for object detection. The model attends to global semantic information by computing attention between different positions, thereby capturing long-distance dependencies. In addition, a dynamic data augmentation method called configurable mix-splicing is introduced to address class imbalance in the training data. Experimental results show that, owing to the feature fusion and data augmentation methods, the model outperforms state-of-the-art models on two widely used public datasets.
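To make the attention mechanism concrete, below is a minimal PyTorch sketch of a standard non-local attention block, in which every spatial position of a feature map attends to every other position. This is an illustration of the general non-local idea only, not the paper's actual architecture; the class name, channel reduction, and tensor shapes are all assumptions.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Minimal non-local attention block (illustrative sketch):
    every spatial position attends to every other position,
    capturing long-distance dependencies in the feature map."""
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 2  # assumed channel reduction
        self.query = nn.Conv2d(channels, reduced, kernel_size=1)
        self.key = nn.Conv2d(channels, reduced, kernel_size=1)
        self.value = nn.Conv2d(channels, reduced, kernel_size=1)
        self.out = nn.Conv2d(reduced, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, h*w, c')
        k = self.key(x).flatten(2)                    # (b, c', h*w)
        v = self.value(x).flatten(2).transpose(1, 2)  # (b, h*w, c')
        # attention between all pairs of positions, scaled softmax
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)  # residual keeps the local features intact
```

The residual connection means the block adds global context on top of the existing local features rather than replacing them, which is the usual way such attention is fused into a convolutional backbone.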
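The abstract does not describe how configurable mix-splicing works. As a loose, hypothetical sketch of a splice-style augmentation in the same spirit, one could paste crops of under-represented classes into training images to rebalance class frequencies; the function name, parameters, and data layout below are invented for illustration and should not be read as the paper's procedure.

```python
import random
import numpy as np

def splice_rare_objects(image, boxes, labels, rare_pool, max_paste=3):
    """Hypothetical splice-style augmentation: paste crops of
    under-represented classes into a training image to mitigate
    class imbalance. `rare_pool` is a list of (crop, label) pairs
    cut from images containing rare classes; boxes are [x1, y1, x2, y2]."""
    h, w = image.shape[:2]
    boxes, labels = list(boxes), list(labels)
    for crop, label in random.sample(rare_pool, min(max_paste, len(rare_pool))):
        ch, cw = crop.shape[:2]
        if ch >= h or cw >= w:
            continue  # skip crops larger than the target image
        y = random.randint(0, h - ch)
        x = random.randint(0, w - cw)
        image[y:y + ch, x:x + cw] = crop      # splice the crop in place
        boxes.append([x, y, x + cw, y + ch])  # record its bounding box
        labels.append(label)
    return image, np.array(boxes), np.array(labels)
```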
