Abstract

Automated safety management in construction can reduce injuries by identifying hazardous postures, actions, and missing personal protective equipment (PPE). However, existing computer vision (CV) methods struggle to connect their recognition results to text-based safety rules. To address this gap, this paper presents a multi-modal framework that bridges construction image monitoring and safety knowledge. The framework comprises an image processing module that applies CV and dense image captioning techniques, and a text processing module that uses natural language processing to evaluate semantic similarity between generated captions and safety rules. Experiments achieved a mean average precision of 49.6% for dense captioning and an F1 score of 74.3% for hazard identification. While the proposed framework demonstrates a promising multi-modal approach to automated safety hazard identification and reasoning, larger datasets and stronger model performance are still needed for real-world deployment.
