Abstract

Feature pyramids of convolutional neural networks (ConvNets)—from bottom to top—are used by most recent researchers for the improvement of object detection accuracy, but they seldom aim to address the correlation of each feature channel and the fusion of low-level features and high-level features. In this paper, an Attention Pyramid Network (APN) is proposed, which mainly contains the adaptive transformation module and feature attention block. The adaptive transformation module utilizes the multiscale feature fusion, and makes full use of the accurate target location information of low-level features and the semantic information of high-level features. Then, the feature attention block strengthens the features of important channels and weakens the features of unimportant channels through learning. By implementing the APN in a basic Mask R-CNN system, our method achieves state-of-the-art results on the MS COCO dataset and 2018 WAD database without bells and whistles. In addition, the structure of the APN makes the network parameters lighter, and runs at 4 ms on average, which is ignorable when compared to the inference time of the backbone of ConvNet.

Highlights

  • Along with the popularization of the artificial intelligence systems [1,2], IOT [3] and the accumulation of image data [4], automatic object detection is increasingly being widely used in video surveillance and robot vision

  • The performance of the proposed Attention Pyramid Network (APN) was evaluated via extensive simulations using the MS COCO and the WAD datasets; the results show the effectiveness of our approach

  • We comprehensively evaluated the APN using the MS COCO dataset and the 2018 WAD dataset, and our results outperform the baseline, i.e., the original feature pyramid networks (FPN)

Read more

Summary

Introduction

Along with the popularization of the artificial intelligence systems [1,2], IOT [3] and the accumulation of image data [4], automatic object detection is increasingly being widely used in video surveillance and robot vision. Object detection is a fundamental computer-vision task [5], and the existence of multiple scales and ratios is the most challenging problem in object detection. More and more attention is being paid to this problem, and various detection methods have emerged. In the image pyramid methods [6,7], as shown, images are generally resized to multiple scales and resized to the same ratio for training and inference. Because of the large number of images, the methods are computationally expensive

Objectives
Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.