Abstract

Accurately identifying and localising small objects in images and videos is a critical challenge in computer vision, particularly in applications that demand real-time performance, such as pedestrian detection and autonomous driving. Such targets are typically objects at long range or objects in low-resolution images, which makes it exceptionally difficult to extract effective feature information. YOLOv8's large downsampling factor yields deep feature maps in which tiny objects are hard to detect. We find that using residual structures in the convolution module improves small-object detection accuracy. However, this increases the computational cost, so we lighten the convolution module to make it more suitable for practical applications and name it Halved Deep Pointwise Convolution (HDPConv). We also use a cross-stage partial module, Variety of View Group Shuffle Cross Stage Partial Network (VOV-GSCSP), whose architecture and multi-scale information fusion keep the overall model lightweight while preserving rich gradient flow. On this basis, we propose a new lightweight network model, HV-YOLOv8. In multiple sets of comparative experiments on two datasets, against several state-of-the-art and classical methods, we demonstrate the superiority of HV-YOLOv8: in particular, accuracy improves by 1.4% over YOLOv8, while the number of parameters and the amount of computation are greatly reduced.
