Abstract

This study proposes a multiheaded object detection algorithm referred to as MANet. Its main purpose is to integrate feature layers of different scales through an attention mechanism and thereby strengthen contextual connections. To achieve this, we first replaced the feed-forward base network of the Single-Shot Detector (SSD) with ResNet–101 (inspired by the Deconvolutional Single-Shot Detector) and then applied linear interpolation and the attention mechanism, so that the information of the feature layers at different scales was fused to improve the accuracy of target detection. The primary contributions of this study are (a) a fusion attention mechanism and (b) a multiheaded attention fusion method. The final MANet detector effectively unifies the feature information across feature layers at different scales, enabling it to detect objects of different sizes with higher precision. Using MANet with a 512 × 512 input and a ResNet–101 backbone, we obtained a mean accuracy of 82.7% on the PASCAL Visual Object Classes (VOC) 2007 test set. These results show that the proposed method is more accurate than the conventional SSD and other advanced detectors.
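The fusion idea described above (resize a coarser feature layer by linear interpolation, then combine it with a finer layer using attention weights) can be illustrated with a minimal sketch. This is not the authors' MANet implementation: it works on 1-D rows in pure Python rather than on 2-D ResNet–101 feature maps, and the function names (`upsample_linear`, `fuse`) and the choice to use the activations themselves as attention scores are assumptions made only for illustration.

```python
import math


def upsample_linear(row, out_len):
    """Resize a 1-D feature row to out_len samples by linear interpolation."""
    if out_len == 1 or len(row) == 1:
        return [row[0]] * out_len
    # Map output index i to a fractional source position, aligning both endpoints.
    scale = (len(row) - 1) / (out_len - 1)
    out = []
    for i in range(out_len):
        pos = i * scale
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(row) - 1)
        frac = pos - lo
        out.append(row[lo] * (1 - frac) + row[hi] * frac)
    return out


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(fine, coarse):
    """Fuse a fine row with an upsampled coarse row, position by position.

    Assumption: the attention score of each scale is simply its activation;
    a real detector would learn these scores.
    """
    up = upsample_linear(coarse, len(fine))
    fused = []
    for f, c in zip(fine, up):
        w_f, w_c = softmax([f, c])
        fused.append(w_f * f + w_c * c)
    return fused


fine = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical high-resolution features
coarse = [0.0, 4.0, 8.0]           # hypothetical low-resolution features
upsampled = upsample_linear(coarse, len(fine))
fused = fuse(fine, coarse)
```

Because the softmax weights at each position sum to one, every fused value lies between the fine and the upsampled coarse activation, so the sketch shows only how the two scales are blended, not how a trained detector would weight them.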

Highlights

  • We propose in this study a new single-stage detection architecture, referred to as MANet, which aggregates feature information at different scales.


Summary

Introduction

Target detection is a fundamental, challenging, and long-standing problem that has been a focus of computer vision research for decades [1,2,3]. Its purpose is to determine whether any instances of a specified category exist in a given image. As one of the cornerstones of image understanding and computer vision, target detection forms the basis for more complex or higher-level visual tasks, such as object tracking, image captioning, and instance segmentation. Deep learning methods, which learn feature representations automatically from data, have effectively improved target detection performance, and the design of better neural networks has become a key issue in improving it further. Target detectors based on convolutional neural networks (CNNs) fall into two types: the first is the two-stage detector, such as Region-Based CNNs (R–CNNs) [4], Region-Based Full

