Abstract

Keypoint-based object detection is among the fastest and most efficient detection approaches at present, yet its accuracy is often lower than that of anchor-based methods. Because keypoint-based methods use no prior settings, the huge search space for keypoints leads to high recall but low precision. In this paper, a wide dual-path backbone network is introduced as the feature extractor to extract richer original information; it has fewer parameters and better classification performance. An attention fusion module is then designed to fuse the two paths effectively, taking into account the respective advantages of the residual path and the densely connected path. To provide more accurate pixel-level information for keypoint prediction, an upsample dual-attention module is proposed to recover the spatial size of the feature map; it integrates multi-scale channel-wise and spatial attention. Compared with other state-of-the-art detectors, the proposed method achieves a good accuracy-efficiency tradeoff with fewer parameters, lower FLOPs, and a smaller model size. Experimental results show that the proposed wide dual-path backbone network achieves a top-1 error of 4.98% on the CIFAR-10 classification dataset. On the PASCAL VOC object detection dataset, the model achieves 78.3% mAP at a speed of 41 FPS.
