Abstract

Semantic segmentation provides pixel-level classification of the surrounding environment and is an essential task for the visual perception of autonomous vehicles and mobile robots. Most existing semantic segmentation networks focus on the visual perception of autonomous vehicles. Little attention has been paid to semantic segmentation for UAV (Unmanned Aerial Vehicle) visual perception, which is crucial to UAV autonomous flight and landing spot searching. Compared with views from autonomous vehicles, UAV-based views are more challenging for semantic segmentation because images captured by UAVs contain large variations in object scale caused by differences in altitude and viewing angle. Existing semantic segmentation networks designed for autonomous vehicles are generally inadequate for extracting representative features from UAV images, which must contain context information and local information simultaneously. A cascade composite transformer-based semantic segmentation network, CCTseg, is proposed in this study for UAV visual perception. A cascade composite encoder is designed that consists of three transformer-based feature extraction backbones and fuses low-level, middle-level and high-level features in a cascade to achieve better feature representation capacity. A spatial enhanced transformer block serves as the basic feature extraction block of each backbone so that the extracted features contain both context information about the environment and local information about objects. A symmetric rhombus decoder is proposed to integrate multi-stage features and make full use of middle-stage features, which contain abundant useful information, so that accurate pixel-level predictions can be obtained. Ablation studies and comparison experiments for the proposed CCTseg have been conducted on two public UAV imagery datasets suitable for UAV autonomous flight and landing spot observation. Experimental results demonstrate the effectiveness of the proposed network structure and its superiority over other state-of-the-art methods for semantic segmentation in UAV visual perception.
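
To make the notion of pixel-level classification concrete, the following minimal sketch shows how a per-pixel label map can be derived from per-class score maps by taking the highest-scoring class at each pixel. It is an illustration only and not the CCTseg implementation: the function name `predict_labels`, the three-class setup and the toy scores are all assumptions introduced here.

```python
def predict_labels(scores):
    """Return labels[y][x], the index of the highest-scoring class per pixel.

    scores[c][y][x] is the score of class c at pixel (y, x).
    The layout and the name of this helper are hypothetical.
    """
    num_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Pick the class whose score map is largest at this pixel.
            best = max(range(num_classes), key=lambda c: scores[c][y][x])
            row.append(best)
        labels.append(row)
    return labels


# Toy 2x2 image with three hypothetical classes (values are made up).
scores = [
    [[0.7, 0.1], [0.2, 0.3]],  # class 0
    [[0.2, 0.8], [0.3, 0.3]],  # class 1
    [[0.1, 0.1], [0.5, 0.4]],  # class 2
]
labels = predict_labels(scores)
print(labels)
```

In a real network the score maps would be produced by the decoder at full image resolution; the per-pixel selection step is the same.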
