Abstract
Traditional safety inspections require significant human effort and time to capture site photos and write textual descriptions. Standardized forms and image captioning techniques have been explored to improve inspection efficiency, but compiling reports that combine visual and text data remains challenging because of the breadth of safety-related knowledge involved. To help inspectors evaluate violations more efficiently, this paper presents an image-language model that uses Contrastive Language-Image Pre-training (CLIP) fine-tuning and prefix captioning to generate safety observations automatically. A user-friendly mobile phone application was also developed to streamline safety report documentation for site engineers. The model classifies nine violation types with an average accuracy of 73.7%, outperforming the baseline model by 41.8%. Experiment participants confirmed that the mobile application is helpful for safety inspections. By identifying violation scenes from images, this automated framework simplifies safety documentation, improves overall safety performance, and supports the digital transformation of construction sites.
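The classification step can be illustrated with a minimal sketch of CLIP-style matching. It assumes that an image encoder and a text encoder have already produced embeddings, that each violation type is represented by one text embedding, and that the temperature value is illustrative. It does not reproduce the paper's fine-tuning or prefix captioning. The label names and the toy 3-dimensional vectors are hypothetical. The paper uses nine violation types, which are not named here.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def clip_style_classify(image_emb, label_embs, temperature=0.01):
    """Pick the label whose text embedding is most similar to the image embedding.

    image_emb: list[float], embedding of one site photo (assumed precomputed).
    label_embs: dict[str, list[float]], one text embedding per violation type
                (hypothetical; assumed precomputed by a text encoder).
    temperature: illustrative softmax temperature (an assumption, not from the paper).
    Returns (best_label, probabilities).
    """
    img = normalize(image_emb)
    labels = list(label_embs)
    # Cosine similarity between the image and each label's text embedding.
    sims = [sum(a * b for a, b in zip(img, normalize(label_embs[k]))) for k in labels]
    # Temperature-scaled softmax (max-subtracted for numerical stability).
    logits = [s / temperature for s in sims]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = {k: e / total for k, e in zip(labels, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical toy embeddings for three of the violation classes.
label_embs = {
    "missing_helmet": [1.0, 0.0, 0.0],
    "unprotected_edge": [0.0, 1.0, 0.0],
    "cluttered_walkway": [0.0, 0.0, 1.0],
}
best, probs = clip_style_classify([0.9, 0.1, 0.0], label_embs)
print(best)
```

Normalizing both embeddings makes the dot product a cosine similarity, which is the comparison CLIP's contrastive objective is trained on. In this framing, fine-tuning CLIP on site photos adjusts the encoders so that each photo's embedding lands closer to the text embedding of its violation type.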