Abstract

Wildfires pose a serious threat to ecosystems and human life. Smoke usually appears before flames and, because it diffuses, can be detected from a distance, so wildfire smoke detection is especially important for early-warning systems. In this paper, we propose a 3D-convolution-based encoder–decoder network architecture for video semantic segmentation in wildfire smoke scenes. In the encoder stage, 3D residual blocks extract the spatiotemporal features of smoke. The decoder then upsamples the downsampled encoder features three times in succession; each upsampled feature map passes through a smoke-map prediction module whose output is supervised by a binary image label, and the final prediction is obtained by fusing the resulting feature maps. Our model is trained end to end from scratch, without pretraining. In addition, we train and test on a dataset of 90 smoke videos. The experimental results on smoke video show that our model segments smoke regions quickly and accurately while producing few false positives.
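The encoder–decoder layout described above can be illustrated by tracing tensor shapes through the network. The sketch below is an assumption-laden illustration, not the paper's implementation: it assumes three encoder stages with stride-2 spatial downsampling and three matching ×2 upsampling decoder stages (the abstract does not state the exact strides), and it tracks only (frames, height, width) shapes.

```python
# Shape trace for the encoder-decoder described in the abstract.
# Assumptions (not stated in the abstract): stride-2 spatial
# downsampling per encoder stage, x2 upsampling per decoder stage,
# and an unchanged temporal dimension.

def encoder_shapes(frames, height, width, stages=3):
    """Feature-map shapes after each assumed 3D-residual encoder stage."""
    shapes = []
    t, h, w = frames, height, width
    for _ in range(stages):
        h, w = h // 2, w // 2  # assumed stride-2 spatial downsampling
        shapes.append((t, h, w))
    return shapes

def decoder_shapes(bottleneck, stages=3):
    """Feature-map shapes after each assumed x2 upsampling decoder stage.

    In the described design, each decoder stage also feeds a smoke-map
    prediction module supervised by a binary image label.
    """
    t, h, w = bottleneck
    shapes = []
    for _ in range(stages):
        h, w = h * 2, w * 2  # assumed x2 upsampling
        shapes.append((t, h, w))
    return shapes

# Example: a hypothetical 16-frame 224x224 input clip.
enc = encoder_shapes(16, 224, 224)   # [(16,112,112), (16,56,56), (16,28,28)]
dec = decoder_shapes(enc[-1])        # [(16,56,56), (16,112,112), (16,224,224)]
# The three side predictions (one per decoder stage) are fused into the
# final full-resolution smoke map, matching the input resolution.
assert dec[-1] == (16, 224, 224)
```

The symmetry check at the end shows why three upsampling stages are needed: they exactly undo the three assumed downsampling stages, so the fused prediction map can be compared pixel-wise with the binary ground-truth label.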
