Abstract
Construction sites are among the most hazardous workplaces. The high rate of hazards can be attributed to the dynamic and complex behaviour of construction-related entities, such as the movement of construction equipment and workers and the interactions among them. Tracking equipment and workers can help avoid collisions and other accidents, making on-site conditions safer. Because construction equipment (e.g. excavators, trucks, cranes, and bulldozers) plays a significant role in construction projects, it is important to track its location, pose, and movement. With surveillance cameras now widely installed on construction sites, computer vision techniques are being explored to process the captured videos and images, for example to monitor site conditions and prevent potential hazards. Previous studies have attempted to identify and locate different types of construction equipment in surveillance videos using computer vision techniques. However, few studies automatically estimate the full body pose and movement of on-site construction equipment, which can greatly influence both the safety of construction sites and the utilization of the equipment itself.

In this study, a methodology framework is developed for automatically estimating the poses of different construction equipment in videos captured on construction sites using computer vision and deep learning techniques. First, keypoints of equipment are defined, and the images collected from the surveillance cameras are annotated with these keypoints to generate the ground truth labels. Of the annotated image dataset, 70%, 10%, and 20% are used for training, validation, and testing, respectively. Then, three types of deep learning networks, i.e. a Stacked Hourglass Network (HG), a Cascaded Pyramid Network (CPN), and an ensemble model (HG-CPN) integrating the two, are constructed and trained in the same training environment. After training, the three models are evaluated on the testing dataset in terms of normalized error (NE), percentage of correct keypoints (PCK), area under the curve (AUC), detection speed, and training time. The experimental results show that the proposed framework can automatically estimate the full body poses of different construction equipment with high accuracy and fast speed. Both HG and CPN achieve relatively high accuracy, with PCK values of 91.19% and 91.78%, respectively, for estimating the equipment full body poses. In addition, the ensemble model with online data augmentation further improves the accuracy, achieving an NE of 14.57 × 10⁻³, a PCK of 93.43%, and an AUC of 39.72 × 10⁻³ at a detection speed of 125 milliseconds (ms) per image. This study lays the foundation for applying computer vision and deep learning techniques to the full body pose estimation of construction equipment, which can contribute to real-time safety monitoring on construction sites.
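To make the evaluation metrics concrete, the following minimal Python sketch shows one common way to compute a normalized error and a PCK score for a single image. The abstract does not state the exact normalization length or PCK threshold used in the study, so the reference length, the threshold, the function names, and the keypoint coordinates below are all illustrative assumptions and not the authors' implementation.

```python
import math

def normalized_errors(pred, truth, norm_length):
    """Euclidean distance per keypoint, divided by a reference length.

    norm_length is an assumed normalizer (for example a bounding-box
    dimension in pixels); the abstract does not specify the one used.
    """
    return [math.dist(p, t) / norm_length for p, t in zip(pred, truth)]

def mean_ne(errors):
    """Mean normalized error over all keypoints."""
    return sum(errors) / len(errors)

def pck(errors, threshold):
    """Fraction of keypoints whose normalized error is within the threshold."""
    return sum(e <= threshold for e in errors) / len(errors)

# Hypothetical ground-truth and predicted keypoints (x, y) in pixels.
truth = [(100.0, 100.0), (200.0, 100.0), (200.0, 200.0), (100.0, 200.0)]
pred = [(103.0, 104.0), (200.0, 100.0), (206.0, 208.0), (130.0, 240.0)]
norm_length = 100.0  # assumed reference length in pixels

errors = normalized_errors(pred, truth, norm_length)
print("NE :", mean_ne(errors))
print("PCK:", pck(errors, threshold=0.2))  # threshold is an assumption
```

In this sketch, lower NE and higher PCK indicate better pose estimates, which matches the direction in which the abstract reports the ensemble model improving on HG and CPN.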