Enhancing mask detection performance based on YOLOv5 model optimization and attention mechanisms

Guangyuan Yang

doi:10.54254/2755-2721/50/20241160

Abstract

The COVID-19 pandemic has led to a significant increase in mask usage, which in turn has created more complex scenarios for mask detection. This paper focuses on optimizing mask detection performance with the You Only Look Once (YOLO) v5 model. The YOLOv5 object detection model was trained on a mask dataset, and several model improvement techniques were explored to enhance its ability to capture crucial features and to distinguish masks from the background in complex scenes. The modified models were then compared with the original detection model to identify the largest performance gain. The study uses the CSPDarknet backbone with the TensorFlow framework, and the attention mechanism modules are implemented with the Keras library. The objective is to optimize the three feature layers between the backbone network and the neck by integrating multiple attention mechanisms. By adjusting the feature map weights, these mechanisms allow the model to capture important features more quickly and accurately in complex scenarios. Additionally, in the feature pyramid network, shallow feature maps are fused with deeper feature maps in different orders to determine the most efficient feature fusion method. The optimal combination of attention mechanism and feature fusion was identified through ablation experiments. The experimental results demonstrate that combining the SE block with shallow feature fusion (the SE + FF2 model) substantially increases category confidence, leading to improved model performance.
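
To illustrate what "adjusting the feature map weights" means for an SE (squeeze-and-excitation) block, the following is a minimal framework-free sketch in plain Python. It is not the paper's Keras implementation: the function name `se_block`, the nested-list layout `[channel][row][column]`, and all weight values are assumptions chosen only to make the arithmetic visible.

```python
import math


def se_block(feature_map, w1, b1, w2, b2):
    """Squeeze-and-excitation on a feature map stored as [C][H][W] nested lists.

    w1 has shape [C_reduced][C] and w2 has shape [C][C_reduced]; in a trained
    network these are learned. Here they are hypothetical, hand-picked values.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)) + b)
        for row, b in zip(w1, b1)
    ]
    # Excitation, step 2: fully connected layer followed by a sigmoid,
    # giving one gate in (0, 1) per channel.
    gates = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
        for row, b in zip(w2, b2)
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: 2 channels of size 2x2, reduced to 1 hidden unit (illustrative only).
feature_map = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[4.0, 4.0], [4.0, 4.0]],
]
w1, b1 = [[0.5, 0.5]], [0.0]
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]

scaled, gates = se_block(feature_map, w1, b1, w2, b2)
```

With these hand-picked weights the first channel receives a larger gate than the second, so its values are preserved more strongly in the output. In a trained detector the gates are learned from data, which is how the block can emphasize channels that help separate masks from the background.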

Full Text