Vision-based method for semantic information extraction in construction by integrating deep learning object detection and image captioning

Yiheng Wang, Bo Xiao, Ahmed Bouferguene, Mohamed Al-Hussein, Heng Li

doi:10.1016/j.aei.2022.101699

Abstract

Recently, vision-based monitoring has been widely adopted in construction management to improve crew productivity, reduce safety risks, and facilitate site planning. However, automatically retrieving semantic information (e.g., objects, activities, and interactions between objects) from construction images remains challenging because of the complexity of construction sites. This paper proposes a novel semantic information extraction method that integrates deep learning object detection and image captioning to extract salient information from construction images and videos. In the proposed method, object detection serves as the encoder, extracting feature maps of both the construction object zones and the holistic image, and image captioning serves as the decoder, generating the semantic information. A post-processing method is also proposed to parse the semantic information into a graph format for better accessibility and visualization. In experiments, the proposed method achieved a Consensus-based Image Description Evaluation (CIDEr) score of 1.84. With the proposed method, the semantic information contained in construction images can be presented to construction managers to support their decision-making.
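The abstract reports its result as a CIDEr score. As a rough guide to what that metric measures, the sketch below computes the original CIDEr formulation: each caption is turned into TF-IDF weighted n-gram vectors (n = 1 to 4), the candidate is compared with every human reference caption by cosine similarity, and the similarities are averaged over references and over n. This is not the paper's evaluation code. The helper names `ngrams` and `cider`, the lowercase whitespace tokenization, and the uniform weights over n are illustrative assumptions. Because the plain formulation averages cosine similarities of non-negative vectors, it lies in [0, 1]. The CIDEr-D variant implemented in the widely used coco-caption toolkit additionally clips n-gram counts, applies a Gaussian length penalty, and multiplies by 10, so a reported value above 1 (such as 1.84) points to a scaled variant of this kind; the abstract does not state which implementation was used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cider(candidates, references, max_n=4):
    """Corpus-level CIDEr, original formulation with uniform weights over n = 1..max_n.

    candidates: one generated caption (string) per image.
    references: one non-empty list of human reference captions per image.
    Assumption: lowercase whitespace tokenization instead of a PTB tokenizer.
    """
    num_images = len(candidates)
    cand_tok = [c.lower().split() for c in candidates]
    ref_tok = [[r.lower().split() for r in refs] for refs in references]

    # Document frequency: number of images whose reference set contains each n-gram.
    doc_freq = Counter()
    for refs in ref_tok:
        seen = set()
        for r in refs:
            for n in range(1, max_n + 1):
                seen.update(ngrams(r, n))
        doc_freq.update(seen)

    def tfidf(tokens, n):
        # Term frequency times inverse document frequency over the image corpus.
        # max(1, df) guards n-grams that occur in no reference (df = 0).
        counts = ngrams(tokens, n)
        total = sum(counts.values())
        return {g: (c / total) * math.log(num_images / max(1.0, doc_freq[g]))
                for g, c in counts.items()}

    def cosine(u, v):
        dot = sum(w * v.get(g, 0.0) for g, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

    scores = []
    for cand, refs in zip(cand_tok, ref_tok):
        per_n = []
        for n in range(1, max_n + 1):
            g_c = tfidf(cand, n)
            # Average similarity of the candidate to all references for this n.
            sims = [cosine(g_c, tfidf(r, n)) for r in refs]
            per_n.append(sum(sims) / len(refs))
        scores.append(sum(per_n) / max_n)  # uniform weight 1/max_n per n-gram length
    return sum(scores) / num_images
```

For example, scoring the captions "a worker operates an excavator" and "a truck is parked near the site" against two reference sets yields a positive score when each caption is matched with its own image, and zero when the two captions are swapped, because the only words they share with the wrong image ("a", "is") appear in every image's references and therefore receive zero IDF weight. This down-weighting of words common to all images is the consensus idea behind the metric.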

Full Text