Abstract

Construction safety management has been extensively investigated, and construction cameras are widely used to monitor on-site activity. However, manually analyzing large quantities of video or image data is time-consuming and labor-intensive. Existing studies mostly focus on identifying single elements in videos or images, while deeper semantic understanding of whole construction scenes remains limited. Drawing on the attention mechanism, this paper proposes a framework that addresses this problem by identifying semantic information, such as multiple objects, their relationships, and attributes, from construction videos. The framework comprises a two-step modeling approach: (1) a frame extraction model based on an interframe difference mechanism extracts frames/images from construction videos, and (2) an image scene understanding model that couples a ResNet101 “encoder” with an LSTM + Attention “decoder” generates semantic information, i.e., natural-language descriptions, from those frames. The proposed framework is validated through multiple experiments on offline image datasets of construction scenes. The contributions of this research are twofold: (1) the proposed visual attention framework represents a significant, data-driven advance in cross-modal processing from construction video to images to natural-language descriptions; and (2) the automatic generation of video semantic information facilitates construction safety management tasks such as estimating workers’ safety state and retrieving and storing monitoring videos/images.
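The first step, interframe-difference frame extraction, can be sketched as follows. This is a minimal illustration of the general technique, not the paper's implementation: a frame is kept whenever its mean absolute pixel difference from the last kept frame exceeds a threshold (the threshold value and function names here are illustrative assumptions).

```python
import numpy as np

def extract_keyframes(frames, threshold=10.0):
    """Keep frames whose mean absolute interframe difference from the
    last kept frame exceeds `threshold` (illustrative value, not from
    the paper). `frames` is a list of grayscale arrays."""
    keyframes = [0]  # always keep the first frame
    prev = frames[0].astype(np.float32)
    for i, frame in enumerate(frames[1:], start=1):
        cur = frame.astype(np.float32)
        if np.mean(np.abs(cur - prev)) > threshold:
            keyframes.append(i)
            prev = cur  # subsequent frames are compared to this one
    return keyframes

# Toy grayscale "video": a static scene, then an abrupt change.
video = [np.zeros((4, 4))] * 3 + [np.full((4, 4), 255.0)] * 2
print(extract_keyframes(video))  # → [0, 3]
```

Comparing against the last kept frame (rather than the immediately preceding one) prevents a slow, gradual scene change from being missed entirely.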
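For the second step, the decoder's attention mechanism weights image regions at each word-generation step. A minimal numpy sketch of additive (Bahdanau-style) soft attention is shown below; the weight matrices and shapes are illustrative assumptions, not the paper's architecture details.

```python
import numpy as np

def soft_attention(features, hidden, W_f, W_h, v):
    """Additive soft attention over image-region features.

    features : (n_regions, d) encoder feature vectors (e.g., from ResNet101)
    hidden   : (h,) current decoder (LSTM) hidden state
    W_f, W_h, v : learned projection weights (illustrative names)
    Returns the context vector and the attention weights.
    """
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v  # (n_regions,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over regions
    context = weights @ features          # weighted sum of region features
    return context, weights

# Tiny example: two regions, identity projections.
features = np.array([[1.0, 0.0], [0.0, 1.0]])
hidden = np.array([1.0, 0.0])
context, weights = soft_attention(features, hidden,
                                  np.eye(2), np.eye(2), np.ones(2))
print(weights)  # sums to 1; the context vector feeds the next LSTM step
```

At each decoding step the context vector is concatenated with the word embedding and fed to the LSTM, letting the model "look at" different scene regions while generating each word of the description.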
