Abstract

Visual inspection of the workplace and timely reminders about unsafe behaviors (e.g., not wearing a helmet) are particularly important for preventing injuries to workers on construction sites. Video surveillance systems generate large amounts of unstructured image data on site for this purpose; however, exploiting these data requires real-time, automated recognition solutions based on computer vision. Although various deep-learning-based models have recently provided new ideas for identifying helmets in traffic monitoring, few solutions suitable for industrial applications have been discussed, owing to the complex scenarios of construction sites. In this paper, a fast and robust network based on a multiscale Swin Transformer, FRSHNet, is proposed for safety helmet detection at construction sites, with the following contributions. First, MAE-NAS, with a variant of MobileNetV3's MobBlock as its basic block, is applied for feature extraction. Simultaneously, a multiscale Swin Transformer module is used to capture spatial and contextual relationships within the multiscale features. Subsequently, to meet the requirements of real-time helmet detection, an efficient RepGFPN is adopted to integrate the refined multiscale features into a pyramid structure. Extensive experiments were conducted on the publicly available Pictor-v3 and SHWD datasets. The results show that FRSHNet consistently delivers favorable performance, outperforming existing state-of-the-art models.
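The central mechanism behind Swin-style modules such as the one described above is self-attention restricted to non-overlapping local windows of the feature map, which keeps the cost linear in image size. The following is a minimal NumPy sketch of that windowed attention idea only; the window size, identity Q/K/V projections, and toy feature map are illustrative assumptions, not details taken from FRSHNet itself.

```python
import numpy as np

def window_partition(x, ws):
    # Split an (H, W, C) feature map into non-overlapping ws x ws windows,
    # returning (num_windows, ws*ws, C). Assumes H and W are divisible by ws.
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, ws=4):
    # Scaled dot-product self-attention computed independently inside each
    # window -- the locality idea of Swin-style attention. Learned Q/K/V
    # projection weights are omitted (identity) purely for illustration.
    windows = window_partition(x, ws)              # (n, ws*ws, C)
    q = k = v = windows
    scale = windows.shape[-1] ** -0.5
    attn = softmax(q @ k.transpose(0, 2, 1) * scale)
    return attn @ v                                # (n, ws*ws, C)

feat = np.random.rand(8, 8, 16)                    # toy single-scale feature map
out = window_self_attention(feat, ws=4)
print(out.shape)                                   # (4, 16, 16): 4 windows of 16 tokens
```

In a multiscale variant, this windowed attention would be applied to backbone feature maps at several resolutions before the pyramid-style fusion stage.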
