Abstract

Recent object detection studies attempt to implement multi-scale feature fusion through complicated hierarchical structures. However, existing feature fusion methods focus only on interactions between features at the same local positions and fail to capture long-distance dependencies among features. In this study, a novel non-local feature-fused transformer convolutional network is proposed for object detection. The model attends to global semantic information by computing attention between different positions, thereby capturing long-distance dependencies. In addition, a dynamic data augmentation method called configurable mix-splicing is introduced to address class imbalance in the training data. Experimental results show that, owing to the feature fusion and data augmentation methods, the model outperforms state-of-the-art models on two widely used public datasets.
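To make the attention mechanism concrete, below is a minimal PyTorch sketch of a standard non-local attention block, in which every spatial position of a feature map attends to every other position. This is an illustration of the general non-local idea only, not the paper's actual architecture; the class name, channel reduction, and tensor shapes are all assumptions.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Minimal non-local attention block (illustrative sketch):
    every spatial position attends to every other position,
    capturing long-distance dependencies in the feature map."""
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 2  # assumed channel reduction
        self.query = nn.Conv2d(channels, reduced, kernel_size=1)
        self.key = nn.Conv2d(channels, reduced, kernel_size=1)
        self.value = nn.Conv2d(channels, reduced, kernel_size=1)
        self.out = nn.Conv2d(reduced, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, h*w, c')
        k = self.key(x).flatten(2)                    # (b, c', h*w)
        v = self.value(x).flatten(2).transpose(1, 2)  # (b, h*w, c')
        # attention between all pairs of positions, scaled softmax
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)  # residual keeps the local features intact
```

The residual connection means the block adds global context on top of the existing local features rather than replacing them, which is the usual way such attention is fused into a convolutional backbone.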
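The abstract does not describe how configurable mix-splicing works. As a loose, hypothetical sketch of a splice-style augmentation in the same spirit, one could paste crops of under-represented classes into training images to rebalance class frequencies; the function name, parameters, and data layout below are invented for illustration and should not be read as the paper's procedure.

```python
import random
import numpy as np

def splice_rare_objects(image, boxes, labels, rare_pool, max_paste=3):
    """Hypothetical splice-style augmentation: paste crops of
    under-represented classes into a training image to mitigate
    class imbalance. `rare_pool` is a list of (crop, label) pairs
    cut from images containing rare classes; boxes are [x1, y1, x2, y2]."""
    h, w = image.shape[:2]
    boxes, labels = list(boxes), list(labels)
    for crop, label in random.sample(rare_pool, min(max_paste, len(rare_pool))):
        ch, cw = crop.shape[:2]
        if ch >= h or cw >= w:
            continue  # skip crops larger than the target image
        y = random.randint(0, h - ch)
        x = random.randint(0, w - cw)
        image[y:y + ch, x:x + cw] = crop      # splice the crop in place
        boxes.append([x, y, x + cw, y + ch])  # record its bounding box
        labels.append(label)
    return image, np.array(boxes), np.array(labels)
```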
