Snow degradation is complex and diverse, so snow removal requires sufficiently rich visual representations. Convolution-based methods perform well at local perception but struggle to model global context. Self-attention-based methods, in contrast, capture long-range dependencies but often overlook local information and texture details. In this paper, we propose a hybrid network, WaveFrSnow, that improves single-image snow removal by combining the advantages of convolution and cross-attention. First, we introduce a wavelet-based frequency-separation cross-attention mechanism (WaveFrSA) to enhance the global and texture representations used for snow removal. Specifically, frequency-separated attention perceives texture in the high-frequency branch, captures global information in the low-frequency branch, and introduces convolution to obtain local features. In addition, we construct local representations through efficient convolutional encoder branches. Furthermore, we develop a Multi-Scale Degradation Aggregation (MSDA) module to integrate rich implicit degradation features. Based on the MSDA module, we construct a Degradation Area Restoration (DAR) network that aims to achieve high-quality image restoration after snow removal. Comprehensive experiments on multiple publicly available datasets demonstrate the superiority of the proposed method over state-of-the-art methods. Additionally, the desnowing results effectively improve the accuracy of downstream vision tasks. The code and datasets used in this study are available at https://github.com/dxw2000/WaveFrSnow.
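To make the frequency-separation idea concrete, the following is a minimal sketch of a single-level 2D wavelet split into one low-frequency band and three high-frequency bands, with an exact inverse. It is not the WaveFrSA implementation: the abstract does not state which wavelet is used, so the Haar wavelet is assumed here, and the function names `haar_split` and `haar_merge` are hypothetical. In a network of the kind described, the low-frequency band would feed the branch that captures global information and the high-frequency bands would feed the texture branch.

```python
def haar_split(img):
    """Single-level 2D Haar split of a list-of-lists image with even height and width.

    Returns (ll, lh, hl, hh): one low-frequency band and three high-frequency
    bands, each half the height and width of the input. Haar is an assumption.
    """
    h, w = len(img), len(img[0])
    ll = [[0.0] * (w // 2) for _ in range(h // 2)]
    lh = [[0.0] * (w // 2) for _ in range(h // 2)]
    hl = [[0.0] * (w // 2) for _ in range(h // 2)]
    hh = [[0.0] * (w // 2) for _ in range(h // 2)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top row of the 2x2 block
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom row of the 2x2 block
            ll[i // 2][j // 2] = (a + b + c + d) / 2.0  # local average (low frequency)
            lh[i // 2][j // 2] = (a - b + c - d) / 2.0  # difference along the width
            hl[i // 2][j // 2] = (a + b - c - d) / 2.0  # difference along the height
            hh[i // 2][j // 2] = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_merge(ll, lh, hl, hh):
    """Inverse of haar_split: rebuild the full-resolution image from the four bands."""
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            out[2 * i][2 * j] = (s + x + y + z) / 2.0
            out[2 * i][2 * j + 1] = (s - x + y - z) / 2.0
            out[2 * i + 1][2 * j] = (s + x - y - z) / 2.0
            out[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2.0
    return out
```

Because the split is invertible, features processed separately in the two frequency branches can be merged back to full resolution without losing information from the decomposition itself.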