This paper addresses two problems in detecting unmanned mining trucks in open-pit mines: existing detection networks struggle to balance model size against detection accuracy, and existing models are not well suited to mining-truck equipment. To address these problems, we propose a lightweight vehicle detection model based on an improved YOLOv8. Through a series of structural adjustments and optimization strategies, the model achieves high accuracy with low complexity. First, the backbone of YOLOv8s is replaced with the FasterNet_t0 (FN) network, whose simple and highly lightweight structure effectively reduces the model's computation and parameter count. Second, the feature pyramid structure of the YOLOv8 neck is replaced with BiFPN (Bi-directional Feature Pyramid Network). By adding cross-layer connections and removing nodes that contribute little to feature fusion, BiFPN improves how features of different scales are fused and used, further improving performance while reducing parameters and computation. Third, to compensate for accuracy that may be lost through these lightweight changes, the detection head is replaced with Dynamic Head, which applies attention along three dimensions (scale, space, and task) and significantly improves detection accuracy without a substantial increase in computation. Finally, for the loss function, this paper combines SIoU loss with NWD (Normalized Gaussian Wasserstein Distance) loss. This combination lets the model handle different scenarios more accurately and, in particular, significantly improves the detection of small mining trucks.
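As originally proposed in EfficientDet, BiFPN also learns one non-negative weight per input at each fusion node and combines the inputs with "fast normalized fusion", out = Σᵢ wᵢ·Iᵢ / (ε + Σⱼ wⱼ). The abstract does not say whether this paper keeps the learnable weights, so the following is only a minimal sketch of that rule, assuming the inputs have already been resized to the same shape and flattening each feature map to a plain list of floats for readability. The function name and the ε value are illustrative choices.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    raw_weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """BiFPN-style fast normalized fusion of same-shaped (flattened) feature maps.

    out[k] = sum_i w_i * features[i][k] / (eps + sum_j w_j),
    where w_i = ReLU(raw_weights[i]) so every weight is non-negative.
    """
    if len(features) != len(raw_weights):
        raise ValueError("need one weight per input feature map")
    size = len(features[0])
    if any(len(f) != size for f in features):
        raise ValueError("feature maps must be resized to the same shape first")
    w = [max(0.0, x) for x in raw_weights]  # ReLU keeps the weights non-negative
    denom = eps + sum(w)                    # eps avoids division by zero
    return [
        sum(wi * f[k] for wi, f in zip(w, features)) / denom
        for k in range(size)
    ]


# A top-down node fusing a same-level input with an upsampled higher-level
# input (made-up values; in a real network the weights are learned).
p4_in = [1.0, 2.0, 3.0]
p5_up = [3.0, 2.0, 1.0]
print(fast_normalized_fusion([p4_in, p5_up], [1.0, 1.0]))
```

Compared with a softmax over the weights, this normalization avoids exponentials while still keeping each output a bounded weighted average of its inputs.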
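The benefit of NWD for small mining trucks can be illustrated with a short sketch. Following the NWD formulation, each box (cx, cy, w, h) is modelled as a 2-D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4). The squared 2-Wasserstein distance between two such Gaussians then reduces to the squared Euclidean distance between the vectors (cx, cy, w/2, h/2), and NWD = exp(−√W₂² / C). The constant C is dataset-dependent, and the default used below is a placeholder, not the paper's value. The abstract also does not state how SIoU and NWD are combined, so `mixed_box_loss` and its `alpha` weight are hypothetical, and SIoU itself is treated as a precomputed input rather than implemented.

```python
import math
from typing import Tuple

# (cx, cy, w, h) in pixels
Box = Tuple[float, float, float, float]


def nwd(box_a: Box, box_b: Box, c: float = 12.8) -> float:
    """Normalized Gaussian Wasserstein distance, a similarity in (0, 1].

    Boxes are modelled as Gaussians N((cx, cy), diag(w^2/4, h^2/4)); the squared
    2-Wasserstein distance is the squared Euclidean distance between the
    vectors (cx, cy, w/2, h/2). `c` is a placeholder normalizing constant.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


def nwd_loss(pred: Box, target: Box, c: float = 12.8) -> float:
    return 1.0 - nwd(pred, target, c)


def iou(a: Box, b: Box) -> float:
    """Plain IoU, included only to contrast with NWD."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def mixed_box_loss(siou_term: float, pred: Box, target: Box,
                   alpha: float = 0.5, c: float = 12.8) -> float:
    """HYPOTHETICAL weighted mix of a precomputed SIoU loss and the NWD loss.

    The weighting `alpha` is an assumption; the abstract does not give it.
    """
    return alpha * siou_term + (1.0 - alpha) * nwd_loss(pred, target, c)


# The same 2-pixel shift applied to a small (8x8) and a large (64x64) box.
small_gt, small_pred = (10.0, 10.0, 8.0, 8.0), (12.0, 10.0, 8.0, 8.0)
large_gt, large_pred = (100.0, 100.0, 64.0, 64.0), (102.0, 100.0, 64.0, 64.0)
print("IoU small/large:", iou(small_gt, small_pred), iou(large_gt, large_pred))
print("NWD small/large:", nwd(small_gt, small_pred), nwd(large_gt, large_pred))
```

For this 2-pixel shift, IoU falls to 0.6 for the 8×8 box but stays near 0.94 for the 64×64 box, whereas NWD is identical for both because it depends only on absolute offsets and size differences. This smoother behaviour on tiny boxes is what makes NWD attractive for distant, small mining trucks.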
In addition, this paper applies layer-adaptive magnitude-based pruning (LAMP) to compress the model further while maintaining efficient detection performance, reducing the model's dependence on computing resources without sacrificing key performance. For the experiments, a dataset of 3,000 images was first constructed, and the images were preprocessed with image enhancement, denoising, cropping, and scaling. Experiments were run on an AutoDL cloud server using PyTorch 2.5.1 and Python 3.10. Four sets of ablation experiments verified the specific contribution of each improvement to model performance. The results show that the lightweight improvement strategy significantly improves detection accuracy while greatly reducing the model's parameters and computation. Finally, we carried out a comprehensive comparison of the improved YOLOv8s model with other popular detection algorithms and models. Our model achieves the highest detection accuracy, 76.9%, which is more than 10% higher than that of similar models, and it is only about 20% of the size of other models that reach comparable accuracy. These results demonstrate that the proposed improvement strategy is feasible and offers clear advantages in model efficiency.
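To make the LAMP step concrete, here is a minimal sketch of the score defined in the original LAMP formulation. Within one layer, weights are sorted by ascending magnitude, and the weight at sorted position u gets score(u) = w[u]² / Σ_{v ≥ u} w[v]², so the largest weight in every layer scores exactly 1. Weights with the lowest scores across all layers are then removed until a global sparsity target is reached. The abstract does not say whether the paper prunes individual weights or whole channels, nor which sparsity it uses. The weight-level version, the layer names, and the 0.4 sparsity below are illustrative assumptions, and plain Python lists stand in for tensors.

```python
from typing import Dict, List


def lamp_scores(weights: List[float]) -> List[float]:
    """LAMP score of every weight in one layer, returned in original order."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    sq = [weights[i] ** 2 for i in order]  # squared magnitudes, ascending
    # tail[k] = sum(sq[k:]): this weight plus every larger weight in the layer
    tail = [0.0] * len(sq)
    running = 0.0
    for k in range(len(sq) - 1, -1, -1):
        running += sq[k]
        tail[k] = running
    scores = [0.0] * len(weights)
    for k, i in enumerate(order):
        scores[i] = sq[k] / tail[k] if tail[k] > 0 else 0.0
    return scores


def lamp_prune(layers: Dict[str, List[float]], sparsity: float) -> Dict[str, List[float]]:
    """Zero out the globally lowest-scoring fraction `sparsity` of all weights."""
    scored = [
        (s, name, i)
        for name, w in layers.items()
        for i, s in enumerate(lamp_scores(w))
    ]
    n_prune = int(sparsity * len(scored))
    scored.sort(key=lambda t: t[0])  # a single global ranking across layers
    pruned = {name: list(w) for name, w in layers.items()}
    for _, name, i in scored[:n_prune]:
        pruned[name][i] = 0.0
    return pruned


# Hypothetical layers with made-up weights.
layers = {"conv1": [0.1, -0.5, 0.2, 0.05], "conv2": [2.0, -1.5, 0.3]}
print(lamp_prune(layers, sparsity=0.4))
```

At this sparsity, plain global magnitude pruning would remove 0.05 and 0.1, both from `conv1`. LAMP instead removes 0.05 from `conv1` and 0.3 from `conv2`, because the scores are normalized within each layer. This layer-adaptive behaviour stops small-magnitude layers from being pruned disproportionately.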