High-resolution remote sensing image scene classification has attracted widespread attention as a fundamental earth observation task. Remote sensing scene classification aims to assign semantic labels to remote sensing scene images in service of downstream applications. Convolutional neural networks are widely used for remote sensing image classification due to their powerful feature extraction capabilities. However, existing methods have not overcome the large intraclass diversity and high interclass similarity of large-scale remote sensing scene images, resulting in low performance. Therefore, we propose a new remote sensing scene classification method, called LmNet, that combines lightweight channel attention with multiscale feature fusion discrimination. First, ResNeXt is used as the backbone; second, a new lightweight channel attention mechanism is constructed to quickly and adaptively learn the salient features of important channels. Furthermore, we design a multiscale feature fusion discrimination framework that fully integrates shallow edge features and deep semantic information to enhance feature representation and uses the multiscale features for joint discrimination. Finally, a cross-entropy loss function with label smoothing is built to reduce the influence of interclass similarity on feature representation. In particular, our lightweight channel attention and multiscale feature fusion mechanism can be flexibly embedded in any advanced backbone as a functional module. Experimental results on three large-scale remote sensing scene classification datasets show that, compared with existing advanced methods, our efficient end-to-end scene classification method achieves state-of-the-art performance. Moreover, our method depends less on labeled data and generalizes better.
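For illustration, the sketch below shows a lightweight channel attention block of the kind the abstract describes, together with a label-smoothing cross-entropy loss. The abstract does not specify LmNet's exact formulation, so this is a minimal ECA-style sketch under assumed defaults (`kernel_size=3`, smoothing factor 0.1), not the authors' implementation.

```python
# Hedged sketch: a lightweight channel attention block (ECA-style) and a
# label-smoothing cross-entropy loss. Module and parameter names here are
# illustrative assumptions, not LmNet's published design.
import torch
import torch.nn as nn


class LightweightChannelAttention(nn.Module):
    """Channel attention via global average pooling plus a 1-D convolution
    over the channel dimension, avoiding the fully connected bottleneck of
    squeeze-and-excitation blocks (hence "lightweight")."""

    def __init__(self, kernel_size: int = 3):
        super().__init__()
        # kernel_size should be odd so padding preserves channel length.
        self.conv = nn.Conv1d(1, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) -> per-channel descriptor of shape (N, C)
        y = x.mean(dim=(2, 3))
        # Treat channels as a 1-D sequence: (N, 1, C) -> (N, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)
        w = self.sigmoid(y).unsqueeze(-1).unsqueeze(-1)  # (N, C, 1, 1)
        return x * w  # reweight channels by learned saliency


# Label-smoothing cross-entropy is built into PyTorch (>= 1.10); the 0.1
# smoothing factor is a common default, not necessarily the paper's value.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

if __name__ == "__main__":
    feat = torch.randn(2, 256, 56, 56)   # a mid-level ResNeXt feature map
    attn = LightweightChannelAttention(kernel_size=3)
    out = attn(feat)                     # same shape, channels reweighted
    print(out.shape)                     # torch.Size([2, 256, 56, 56])
```

Such a module can be dropped in after each residual stage of a standard ResNeXt; because the 1-D convolution acts only on pooled channel statistics, its parameter cost is negligible, which is consistent with the abstract's claim that the mechanism embeds flexibly into any backbone.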