Abstract

In order to make the classification and regression of single-stage detectors more accurate, an object detection algorithm named Global Context You-Only-Look-Once v3 (GC-YOLOv3) is proposed in this paper, based on the You-Only-Look-Once (YOLO) framework. Firstly, a better cascading model with learnable semantic fusion between the feature extraction network and the feature pyramid network is designed, using a global context block to improve detection accuracy. Secondly, the information to be retained is screened by combining feature maps at three different scales. Finally, a global self-attention mechanism is used to highlight the useful information in the feature maps while suppressing irrelevant information. Experiments show that GC-YOLOv3 reaches a maximum of 55.5 mean Average Precision (mAP)@0.5 on the Common Objects in Context (COCO) 2017 test-dev set, and that its mAP is 5.1% higher than that of YOLOv3 on the Pascal Visual Object Classes (PASCAL VOC) 2007 test set. These experiments indicate that the proposed GC-YOLOv3 model performs strongly on both the PASCAL VOC and COCO datasets.
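The abstract describes the global context block only at a high level. Below is a minimal sketch of a GCNet-style global context block of the kind the abstract refers to; the paper does not publish this exact code, and the reduction ratio, layer names, and channel counts here are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalContextBlock(nn.Module):
    """GCNet-style global context block: global attention pooling over all
    spatial positions, a bottleneck transform, and a broadcast residual add."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # 1x1 conv producing a single attention map over spatial positions
        self.context_mask = nn.Conv2d(channels, 1, kernel_size=1)
        # Bottleneck transform applied to the pooled global context vector
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Global attention pooling: softmax weights over all H*W positions
        mask = F.softmax(self.context_mask(x).view(b, 1, h * w), dim=-1)
        context = torch.bmm(x.view(b, c, h * w), mask.transpose(1, 2))  # (B, C, 1)
        context = context.view(b, c, 1, 1)
        # Transform the global context and add it back to every position
        return x + self.transform(context)

# Example: refine a 256-channel backbone feature map (shapes are illustrative)
feat = torch.randn(2, 256, 52, 52)
print(GlobalContextBlock(256)(feat).shape)  # torch.Size([2, 256, 52, 52])
```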

Highlights

  • In recent years, deep learning [1] has been widely adopted and exhibits better robustness and higher accuracy than traditional methods

  • Inspired by the above-mentioned works on YOLO, we propose a GC-YOLOv3 network that keeps the network depth while allowing the detector to achieve better object detection performance


Summary

Introduction

Deep learning [1] has been widely adopted and exhibits better robustness and higher accuracy than traditional methods. Traditional object detection can be divided into three steps. (1) Region selection: regions of interest are selected with a sliding window at different scales; despite its redundancy, this exhaustive scanning can mark all possible object locations. (2) Feature extraction: handcrafted descriptors such as Scale Invariant Feature Transform (SIFT) [3], Histogram of Oriented Gradient (HOG) [4] and Haar-like features [5] are commonly used to extract features from anchor boxes. These traditional methods cannot extract all of the relevant features because of their poor robustness and inability to adapt to changes in shape and lighting conditions. In the detector output, one channel is used to determine the category of objects, and four channels are used to predict the coordinates of the diagonal points of the object boxes.
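As a rough illustration of that output-channel layout (the tensor shape and channel ordering here are assumptions for clarity, not the paper's exact detection head), a per-cell prediction could be split like this:

```python
import torch

def split_detection_outputs(pred: torch.Tensor):
    """Split a prediction tensor of shape (batch, 5, H, W) into the channels
    described above: four box-corner coordinates plus one category channel.
    The layout is a hypothetical example, not the paper's exact head."""
    box_xyxy = pred[:, 0:4]   # (x1, y1, x2, y2): diagonal corner points of the box
    cls_score = pred[:, 4:5]  # category / objectness channel
    return box_xyxy, cls_score

# Example with a dummy 13x13 prediction grid
boxes, scores = split_detection_outputs(torch.randn(1, 5, 13, 13))
print(boxes.shape, scores.shape)  # torch.Size([1, 4, 13, 13]) torch.Size([1, 1, 13, 13])
```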


