To allow person detection in surveillance footage to run efficiently and in real time on resource-constrained edge devices, a lightweight person detection model based on YOLOv8n was proposed. The model balances high accuracy against low computational cost and a small parameter count. First, the MSBlock module was introduced into YOLOv8n. Then, a series of modifications were made to the MSBlock structure. Next, a heterogeneous PAFPN was built from the improved MSBlock using heterogeneous convolution kernels. Finally, AKConv, a variable-kernel convolution, was applied to further reduce the number of parameters and the computational cost while improving accuracy. Experiments demonstrated that, with these improvements, the proposed lightweight model reduced the parameter count by nearly 10% and the floating-point computational cost by 5% compared with the original YOLOv8n. On a custom surveillance dataset, the model achieved a 1.4% improvement in mAP@0.5:0.95. On a more complex subset of the PASCAL VOC public dataset, it achieved a 2.8% improvement in mAP@0.5 and a 1.2% improvement in mAP@0.5:0.95, indicating the high accuracy and generalization ability of the improved lightweight model.
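For readers less familiar with the reported metrics: mAP@0.5 treats a detection as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, while mAP@0.5:0.95 follows the COCO convention of averaging over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The following is a minimal sketch of that thresholding only, not the evaluation code used in this work; the function names and the (x1, y1, x2, y2) box format are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes yield no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def coco_iou_thresholds():
    """The ten IoU thresholds behind mAP@0.5:0.95: 0.50, 0.55, ..., 0.95."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred_box, gt_box):
    """Thresholds at which pred_box would count as a match for gt_box."""
    overlap = iou(pred_box, gt_box)
    return [t for t in coco_iou_thresholds() if overlap >= t]
```

For example, a predicted box (0, 0, 100, 72) against a ground-truth box (0, 0, 100, 100) has an IoU of 0.72, so it counts as a match at five of the ten thresholds (0.50 through 0.70). This is why mAP@0.5:0.95 rewards tighter localization than mAP@0.5 alone.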