Learning discriminative feature representation with pixel-level supervision for forest smoke recognition

Huanjie Tao, Qianyue Duan, Minghao Lu, Zhenwu Hu

doi:10.1016/j.patcog.2023.109761

Abstract

Existing vision-based smoke recognition methods still suffer from low detection rates and high false alarm rates in complex scenes. One reason is that they assign the same label to light smoke and heavy smoke, ignoring the differences in the multiple attributes involved in the smoke imaging process. To address this issue, this paper presents a pixel-level supervision neural network (PSNet) that learns discriminative feature representations for forest smoke recognition. First, pixel-level supervision information, including the background component, smoke component, fusion ratio, and class information, is jointly considered to guide model training. To avoid negative transfer caused by asynchronous optimization of the shared-layer parameters, and to minimize all loss terms synchronously, a regularization term based on the smoke imaging principle and a dynamic weight updating method are proposed to balance the weight coefficients of the different loss terms. Second, a detail-difference-aware module (DDAM), built from a detail-difference-aware block (DDAB) and a spatial attention block (SAB), is proposed to distinguish smoke from smoke-like targets. It fuses xy-shared convolution and z-shared convolution and adaptively allocates weights over spatial positions to prioritize the most informative visual elements. Third, an attention-based feature separation module (AFSM) is proposed to reduce mutual interference between background feature extraction and smoke feature extraction. It combines component interaction attention (CIA), background component attention (BCA), smoke component attention (SCA), and enhanced residual blocks (ERBs) to guide the interaction and separation of background and smoke information, enhancing discriminative spatial features and suppressing interfering ones. Based on median filters, the ERBs suppress noise and enhance smoke edge information.
Finally, to further enhance feature representation capability, a multiconnection aggregation method (MCAM) is proposed that fully aggregates local and global features simultaneously. Extensive experiments show that PSNet outperforms existing smoke recognition methods. For smoke recognition, PSNet achieves a 96.95% detection rate, a 3.02% false alarm rate, and an F1-score of 0.9694. The average computation time per image is only 0.0195. For smoke component separation, PSNet achieves a mean squared error (MSE) of 0.0014 between the predicted and labelled smoke component images. These key results are better than those of previous methods.
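
The abstract reports a detection rate, a false alarm rate, an F1-score, and an MSE without defining them. The following is a minimal sketch of how such metrics are commonly computed, assuming detection rate = TP / (TP + FN), false alarm rate = FP / (FP + TN), and F1 as the harmonic mean of precision and detection rate; the paper may normalize false alarms differently. The function names `recognition_metrics` and `component_mse` are hypothetical and not taken from the paper.

```python
from typing import Sequence


def recognition_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Image-level metrics for binary smoke recognition (1 = smoke, 0 = non-smoke).

    Assumed definitions (illustrative, not taken from the paper):
      detection rate   DR  = TP / (TP + FN)
      false alarm rate FAR = FP / (FP + TN)
      F1 = 2 * precision * DR / (precision + DR), precision = TP / (TP + FP)
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # Guard every ratio against an empty denominator.
    dr = tp / (tp + fn) if tp + fn else 0.0
    far = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * dr / (precision + dr) if precision + dr else 0.0
    return {"detection_rate": dr, "false_alarm_rate": far, "f1": f1}


def component_mse(pred: Sequence[Sequence[float]], label: Sequence[Sequence[float]]) -> float:
    """Pixel-wise mean squared error between a predicted and a labelled smoke component image."""
    if len(pred) != len(label) or any(len(a) != len(b) for a, b in zip(pred, label)):
        raise ValueError("images must have the same shape")
    sq = [(p - q) ** 2 for prow, qrow in zip(pred, label) for p, q in zip(prow, qrow)]
    if not sq:
        raise ValueError("images must not be empty")
    return sum(sq) / len(sq)
```

For example, with four smoke and four non-smoke images where one smoke image is missed and one non-smoke image is flagged, the detection rate, false alarm rate, and F1-score are all 0.75.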

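The pixel-level supervision combines a background component, a smoke component, and a fusion ratio, and the abstract mentions a regularization term based on the smoke imaging principle without giving its form. The sketch below assumes a per-pixel linear blending model, I = alpha * S + (1 - alpha) * B, with alpha as the fusion ratio, and uses the mean squared reconstruction error as the regularizer. Both the model and the hypothetical names `compose` and `imaging_regularizer` are illustrative assumptions, not the paper's definitions.

```python
from typing import List

Image = List[List[float]]  # single-channel image, values assumed in [0, 1]


def _check_same_shape(*images: Image) -> None:
    """Raise if the images do not all share the same height and width."""
    shapes = {(len(img), tuple(len(row) for row in img)) for img in images}
    if len(shapes) != 1:
        raise ValueError("all images must have the same shape")


def compose(background: Image, smoke: Image, alpha: Image) -> Image:
    """Assumed imaging model (hypothetical): I = alpha * S + (1 - alpha) * B per pixel."""
    _check_same_shape(background, smoke, alpha)
    return [
        [a * s + (1.0 - a) * b for b, s, a in zip(brow, srow, arow)]
        for brow, srow, arow in zip(background, smoke, alpha)
    ]


def imaging_regularizer(observed: Image, background: Image, smoke: Image, alpha: Image) -> float:
    """Hypothetical regularizer: MSE between the observed image and the image
    recomposed from the predicted background, smoke, and fusion-ratio maps."""
    recomposed = compose(background, smoke, alpha)
    _check_same_shape(observed, recomposed)
    sq = [(o - r) ** 2 for orow, rrow in zip(observed, recomposed) for o, r in zip(orow, rrow)]
    return sum(sq) / len(sq)
```

For example, with background 0.4, smoke 0.9, and fusion ratio 0.5, the recomposed pixel is 0.65; a fusion ratio of 0 returns the background value unchanged. Under this assumed model, the regularizer is zero only when the three predicted maps jointly explain the observed image.
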
Full Text