During urban fire incidents, real-time videos and images are vital to emergency responders and decision-makers, supporting efficient decision-making and resource allocation in smart city fire monitoring systems. However, processing such videos and images in real time requires models that are simple enough to embed in small computer systems yet still detect fire with high accuracy. YOLOv5s has a relatively small model size and fast processing time, but its accuracy is limited. This study proposes a method that employs a YOLOv5s network with a squeeze-and-excitation module for image filtering and classification, to meet the urgent need for rapid and accurate real-time screening of irrelevant data. More than 3000 images were crawled from the internet and annotated to construct a dataset, and the YOLOv5, YOLOv5x, and YOLOv5s models were trained and tested on it. Comparative analysis showed that the proposed YOLOv5s model achieved 98.2% accuracy, 92.5% recall, and 95.4% average accuracy, with a processing speed of 0.009 s per image and 0.19 s for a 35 frames-per-second video. The proposed model outperformed the other models, demonstrating its efficacy for real-time screening and classification in smart city fire monitoring systems.
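
For readers unfamiliar with the squeeze-and-excitation module named above, the following is a minimal sketch of the general operation in plain Python. It is not the authors' implementation: the abstract does not say where the module sits in the YOLOv5s network or how it is configured, so the function name `se_block`, the weight matrices `w1` and `w2`, and the toy input are all hypothetical, and biases are omitted for brevity.

```python
import math


def se_block(feature_map, w1, w2):
    """Apply squeeze-and-excitation to a feature map shaped [C][H][W].

    feature_map: nested lists, one H x W grid per channel.
    w1: [C_reduced][C] weights of the reduction layer (hypothetical values).
    w2: [C][C_reduced] weights of the expansion layer (hypothetical values).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: reduce the channel dimension, then apply ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C channels, then apply a sigmoid
    # so that every gate lies strictly between 0 and 1.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: 2 channels of size 2 x 2, reduced to 1 hidden unit.
example_map = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
example_w1 = [[1.0, -1.0]]
example_w2 = [[1.0], [-1.0]]
scaled_map, channel_gates = se_block(example_map, example_w1, example_w2)
```

In this toy case the first channel has the larger average activation and receives the larger gate, which is the sense in which the module emphasizes informative channels and suppresses the rest. In a trained network the weights are learned rather than fixed by hand.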