Abstract

Most object detection methods represent an object with a rectangular bounding box, whereas the representative points network (RepPoints) describes it with a point set. RepPoints provides finer-grained localization and facilitates classification, but it ignores the difference between the localization and classification tasks. This paper therefore proposes a lightweight RepPoints with decoupling of the sampling point set (LRP-DS). First, the lightweight MobileNetV2 and Feature Pyramid Networks (FPN) are employed as the backbone network, instead of ResNet, to make the network lightweight. Second, considering the difference between the classification and localization tasks, the sampling points of classification and localization are decoupled by introducing the classification free sampling method. Finally, because the classification free sampling method accentuates the mismatch between localization accuracy and classification confidence, a localization score is employed to describe localization accuracy independently. The final network achieves 73.3% mean average precision (mAP) on the VOC07 test dataset, which is 1.9% higher than the original RepPoints with the same MobileNetV2 and FPN backbone. LRP-DS runs at 20 FPS for an input image of (1000, 600) on an RTX 2060 GPU, nearly twice as fast as the network with a ResNet50 and FPN backbone. Experimental results show the effectiveness of the method.
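To make the point-set representation concrete, the following minimal sketch converts a set of representative points into an axis-aligned box by taking the minimum and maximum coordinate along each axis. This min-max conversion is one simple option; whether LRP-DS uses it is not stated in this summary, and the function name `points_to_box` and the sample coordinates are hypothetical.

```python
def points_to_box(points):
    """Convert a set of (x, y) points into an axis-aligned box.

    Returns (x1, y1, x2, y2), the tightest box enclosing all points.
    This is an illustrative min-max conversion, assumed here and not
    taken from the paper's implementation.
    """
    if not points:
        raise ValueError("point set must not be empty")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical point set with nine points, made up for illustration.
sample_points = [
    (12.0, 30.0), (18.5, 22.0), (25.0, 28.0),
    (14.0, 40.0), (20.0, 41.5), (27.0, 39.0),
    (13.0, 55.0), (19.0, 58.0), (26.5, 54.0),
]
box = points_to_box(sample_points)
print(box)
```

The point set carries more information than the resulting box: the individual points can sit on semantically meaningful parts of the object, while the box keeps only their extent.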

Highlights

  • Object detection is one of the most widely used tasks in computer vision

  • Anchor-based object detection applies a large number of prior anchors to fit the bounding box of a real object; examples include the well-known single-stage detectors YOLO [1] and SSD [2], and the two-stage detector Faster R-CNN [3]

  • + Feature Pyramid Networks (FPN), this paper reduces the number of stacked convolutions from four to two, after sampling the classification branches and localization branches of RepPoints and our LRP-DS

Summary

Introduction

Object detection is one of the most widely used tasks in computer vision. Current object detection methods can be roughly divided into two categories according to whether prior anchors are needed: anchor-based and anchor-free object detection. In anchor-based methods, during inference the predicted object box is obtained from the prior anchor and the offset calculated by the network. This paper proposes to decouple the feature sampling positions of object classification and localization, giving the feature sampling positions in the classification branch a certain degree of freedom. This allows the classification sampling points to actively find the semantic key positions of the object, which improves the recognition accuracy of the classification task. The sample sets of classification and localization are decoupled because of the difference between the two tasks. The localization score is employed to describe localization accuracy independently, to address the mismatch between localization accuracy and category probability, which becomes more serious after the classification free sampling method is introduced.
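The anchor-based step mentioned above, in which a predicted box is obtained from a prior anchor and a network-predicted offset, can be sketched as follows. The sketch assumes the center-and-size offset parameterization commonly used by Faster R-CNN-style detectors; the summary does not say which parameterization is meant, and the function name `decode_anchor` and the example values are hypothetical.

```python
import math


def decode_anchor(anchor, offsets):
    """Apply predicted offsets to a prior anchor.

    anchor:  (x1, y1, x2, y2) corner coordinates of the prior anchor.
    offsets: (dx, dy, dw, dh), where dx and dy shift the center in units
             of the anchor width and height, and dw and dh scale the
             width and height on a log scale (assumed parameterization).
    Returns the decoded box as (x1, y1, x2, y2).
    """
    ax1, ay1, ax2, ay2 = anchor
    anchor_w = ax2 - ax1
    anchor_h = ay2 - ay1
    anchor_cx = ax1 + 0.5 * anchor_w
    anchor_cy = ay1 + 0.5 * anchor_h

    dx, dy, dw, dh = offsets
    cx = anchor_cx + dx * anchor_w      # shifted center x
    cy = anchor_cy + dy * anchor_h      # shifted center y
    w = anchor_w * math.exp(dw)         # rescaled width
    h = anchor_h * math.exp(dh)         # rescaled height

    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


# Hypothetical anchor and offsets: shift right by 10% of the anchor width.
print(decode_anchor((0.0, 0.0, 10.0, 20.0), (0.1, 0.0, 0.0, 0.0)))
```

Anchor-free methods such as RepPoints skip this step: they have no prior anchor to decode against and predict the object's extent directly from feature locations.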

Anchor-Based Object Detection
Anchor-Free Object Detection
The Mismatch between Classification and Localization Tasks
Rethinking RepPoints
Illustration of the sampling locations
Build the Backbone Network Based on MobileNetV2 and FPN
Overview of the proposed
Experimental Details
Ablation Study
Comparison with Other Methods
Findings
Conclusions
