Abstract

Compared with common urban landscape semantic segmentation, semantic segmentation of unmanned aerial vehicle (UAV) images is more challenging: small targets occupy a very low percentage of pixels, and variations in flight altitude give objects multi-scale features. The successive grid downsampling strategy commonly used in current transformer-based methods discards some important features of small targets, and complex background interference can make the results even worse. In response, existing strategies maintain a high feature resolution, but this incurs considerable computational cost, which hinders the practical application of UAVs. It is therefore important to design a framework that balances retaining more of the pixels that represent small objects during downsampling against reducing computational cost. To this end, the Channel Selection and Local Attention Transformer model (CSLFormer) is proposed. During the overlap patch embedding process of feature maps, the model allocates half of the important channels to global attention and local attention. These two types of attention focus on different aspects: one learns the relationships and importance among various patches, while the other emphasizes the features of individual patches. The method shows superior performance on two public datasets, AeroScapes and Vaihingen, achieving a mean intersection over union (mIoU) of 75.57% and 78.93%, respectively. The proposed CSLFormer has been released on GitHub: https://github.com/leoda1/CSLFormer.
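For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed: the per-class intersection over union, averaged over the classes that appear. It is not taken from the CSLFormer repository. The function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; the paper's evaluation protocol may differ.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two flat sequences of class labels.

    Hypothetical helper for illustration only; not the authors' evaluation code.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: ignore classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with six "pixels" and three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, 3)  # per-class IoU: 1/3, 2/3, 1/2
```

In the toy example the three per-class values average to 0.5. A class covering only a few pixels contributes to the mean as much as a large class, which is why missed small targets weigh heavily on this metric.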
