Abstract
Most object detection methods use rectangular bounding boxes to represent an object, while the representative points network (RepPoints) employs a point set to describe it. RepPoints provides more fine-grained localization and facilitates classification. However, it ignores the difference between the localization and classification tasks. Therefore, a lightweight RepPoints with decoupling of the sampling point set (LRP-DS) is proposed in this paper. Firstly, the lightweight MobileNetV2 and Feature Pyramid Networks (FPN) are employed as the backbone network, rather than ResNet, to make the network lightweight. Secondly, considering the difference between the classification and localization tasks, the sampling points of classification and localization are decoupled by introducing a classification free sampling method. Finally, because the classification free sampling method highlights the mismatch between localization accuracy and classification confidence, a localization score is employed to describe the localization accuracy independently. The final network structure achieves 73.3% mean average precision (mAP) on the VOC07 test dataset, which is 1.9% higher than the original RepPoints with the same MobileNetV2 and FPN backbone network. Our LRP-DS has a detection speed of 20 FPS for an input image of (1000, 600) on an RTX2060 GPU, which is nearly twice as fast as with the ResNet50 and FPN backbone network. Experimental results show the effectiveness of our method.
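To make the idea of describing an object with a point set more concrete, the following minimal sketch converts a set of points into an enclosing rectangular box by taking the minimum and maximum coordinates. This min-max conversion is only one possible choice and is an assumption here; the abstract does not state which conversion LRP-DS uses. The function name `points_to_box` and the sample coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def points_to_box(points: List[Point]) -> Box:
    """Return the axis-aligned box (x_min, y_min, x_max, y_max) enclosing all points.

    Assumption: a simple min-max conversion, used here purely for illustration.
    """
    if not points:
        raise ValueError("at least one point is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical point set for a single object.
sample_points = [(12.0, 30.0), (40.0, 18.0), (25.0, 55.0)]
box = points_to_box(sample_points)
```

In a sketch like this, the individual points can sit on semantically meaningful parts of the object, while the box derived from them remains comparable with ordinary bounding-box annotations.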
Highlights
Object detection is one of the most widely used tasks in computer vision
Anchor-based object detection applies a large number of prior anchors to fit the bounding box of a real object; examples include the well-known single-stage methods YOLO [1] and SSD [2], and the two-stage algorithms represented by Faster R-CNN [3]
+ Feature Pyramid Networks (FPN), this paper reduces the number of stacked convolutions after sampling from four to two in the classification and localization branches of both RepPoints and our LRP-DS
Summary
Object detection is one of the most widely used tasks in computer vision. Current object detection methods can be roughly divided into two categories according to whether a prior anchor is needed: anchor-based object detection and anchor-free object detection. In anchor-based methods, the predicted object box is obtained during inference from the prior anchor and the offset calculated by the network. This paper proposes to decouple the feature sampling positions of object classification and localization, so as to give the feature sampling positions in the classification branch a certain degree of freedom. This allows the feature sampling points of classification to actively find the semantic key positions of the object, which improves the recognition accuracy of the classification task. In short, the sample sets of classification and localization are decoupled because of the difference between the two tasks, and a localization score is employed to describe localization accuracy independently. The localization score addresses the mismatch between localization accuracy and category probability, which becomes more serious after the classification free sampling method is introduced.
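The anchor-based step mentioned above, obtaining a predicted box from a prior anchor and a network offset, can be sketched as follows. The sketch assumes the center/size offset parameterization popularized by Faster R-CNN; individual detectors differ in details such as scaling factors, and the name `decode_anchor` is hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]      # (x1, y1, x2, y2)
Offset = Tuple[float, float, float, float]   # (dx, dy, dw, dh)


def decode_anchor(anchor: Box, offsets: Offset) -> Box:
    """Apply predicted offsets to a prior anchor and return the decoded box.

    Assumption: dx and dy shift the anchor center in units of the anchor
    width and height; dw and dh rescale width and height in log space.
    """
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1
    acx, acy = ax1 + 0.5 * aw, ay1 + 0.5 * ah

    dx, dy, dw, dh = offsets
    cx = acx + dx * aw
    cy = acy + dy * ah
    w = aw * math.exp(dw)
    h = ah * math.exp(dh)

    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


# Zero offsets leave the anchor unchanged.
unchanged = decode_anchor((0.0, 0.0, 10.0, 20.0), (0.0, 0.0, 0.0, 0.0))
# A horizontal offset of 0.1 shifts the box by 10% of the anchor width.
shifted = decode_anchor((0.0, 0.0, 10.0, 20.0), (0.1, 0.0, 0.0, 0.0))
```

Anchor-free methods such as RepPoints avoid this dependence on hand-designed prior anchors, which is the setting in which the sampling-point decoupling described above is proposed.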