Abstract

Using computer vision and deep learning (e.g., convolutional neural networks) to automatically recognise unsafe behaviour in digital images can help managers identify such actions, respond quickly, and mitigate adverse events. However, computer vision studies in construction have tended to focus solely on detecting unsafe behaviour (i.e., object detection) or on regions of interest with pre-defined labels. Moreover, such approaches have been unable to consider the rich semantic information shared among multiple unsafe actions in a digital image. The research presented in this paper uses a safety-rule query to determine and locate several unsafe behaviours in a digital image by employing a visual grounding approach. Our approach consists of: (1) visual and text feature extraction; (2) recursive sub-query refinement; and (3) bounding-box generation. We validate our approach by conducting an experiment to demonstrate its effectiveness. The experimental results yield an average precision, recall, and F1-score of 0.55, 0.85, and 0.65, respectively, suggesting our approach can accurately identify and locate different types of unsafe behaviour in digital images acquired from a construction site.
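The three stages listed above can be illustrated with a minimal toy sketch. This is not the authors' implementation: the attribute tags, the stop-word filter, and the narrowing rule are all assumptions made purely to show how a safety-rule query can be decomposed into sub-queries that progressively narrow the set of candidate bounding boxes.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass(frozen=True)
class Box:
    """An axis-aligned bounding box (x, y, width, height) in pixels."""
    x: int
    y: int
    w: int
    h: int

def extract_sub_queries(rule: str) -> List[str]:
    """Toy text encoder: break a safety-rule query into sub-query tokens,
    dropping common function words (a stand-in for a language model)."""
    stop = {"a", "the", "must", "be", "is", "are", "not", "wear"}
    return [t for t in rule.lower().replace(",", " ").split() if t not in stop]

def ground(rule: str, candidates: Dict[Box, Set[str]]) -> List[Box]:
    """Recursive sub-query grounding (illustrative): each round keeps only
    the candidate boxes whose attribute set matches the current sub-query
    token, mimicking the progressive narrowing the abstract describes.
    `candidates` maps each detected region to its attribute tags."""
    tokens = extract_sub_queries(rule)
    kept = list(candidates)
    for tok in tokens:
        narrowed = [b for b in kept if tok in candidates[b]]
        if narrowed:  # only narrow when the sub-query actually discriminates
            kept = narrowed
    return kept

# Hypothetical detections from one site image, tagged with toy attributes.
cands = {
    Box(10, 20, 50, 80): {"worker", "hardhat"},
    Box(120, 30, 40, 90): {"worker", "no-hardhat"},
    Box(200, 10, 60, 60): {"excavator"},
}
violations = ground("worker no-hardhat", cands)
```

In a real system, the token/tag match would be replaced by a learned similarity score between text and visual features, but the control flow (query decomposition followed by iterative region refinement) is the same.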
