Extreme small-scale prediction head-based efficient YOLOV5 for small-scale object detection

Nekkanti Gowthami, Shalanti Vineetha Blessy

doi:10.1088/2631-8695/ad3cb7

Abstract

Recognition models have achieved exceptional performance in object detection, but they still struggle with Small-Scale Object Detection (SSOD). SSOD remains difficult because of the variety of object shapes and orientations, and because the prediction heads at the detection stage are fixed. To overcome these challenges, this paper proposes an auto-anchor module, a Spatial Pyramid Pooling Faster (SPPF) layer in the feature refinement network, and an extremely small-scale prediction head at the detection stage of the model. The auto-anchor feature lets the model adjust its anchor boxes dynamically during training, which can improve overall performance. The SPPF layer divides the feature map into five pyramid levels, each corresponding to a different scale, and pools features from all levels so that the model can detect objects of different sizes. The extremely small-scale prediction head is designed specifically for small objects and uses small anchor boxes and feature re-sampling techniques to improve SSOD accuracy. Quantitative evaluation on the VOC dataset yields 85.4% mAP, 79.8% precision, 80.3% recall, and an 80% F1-score. These results indicate that the proposed model detects small-scale objects better than existing models.
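The abstract does not say how the auto-anchor module fits its anchors. As an illustration only, the sketch below clusters ground-truth box sizes with k-means and uses IoU as the similarity measure, an approach commonly used for anchor fitting in the YOLO family. The function names, the toy box sizes, and the choice of two clusters are assumptions made for this example and are not taken from the paper.

```python
import random


def iou_wh(box, anchor):
    """IoU of two (width, height) boxes aligned at a shared corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Hypothetical helper: fit k anchor sizes to a list of (w, h) boxes."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # start from k distinct training boxes
    for _ in range(iters):
        # Assign every box to the anchor it overlaps most.
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, anchors[i]))
            clusters[best].append(b)
        # Move each anchor to the mean size of its cluster.
        new_anchors = []
        for i, c in enumerate(clusters):
            if c:
                mean_w = sum(w for w, _ in c) / len(c)
                mean_h = sum(h for _, h in c) / len(c)
                new_anchors.append((mean_w, mean_h))
            else:
                new_anchors.append(anchors[i])  # keep an empty cluster's anchor
        if new_anchors == anchors:
            break  # assignments are stable
        anchors = new_anchors
    # Smallest anchors first; these would feed the smallest-scale head.
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Toy data (assumed): three very small boxes and three large ones, in pixels.
boxes = [(8, 8), (10, 8), (8, 10), (100, 100), (110, 100), (100, 110)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

In a detector with an extra small-scale head, the smallest fitted anchors would be assigned to that head, which is one way to read the abstract's "anchor boxes with small sizes".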

Full Text