Light4Mars: A lightweight transformer model for semantic segmentation on unstructured environment like Mars

Yonggang Xiong, Xueming Xiao, Meibao Yao, Hutao Cui, Yuegang Fu

doi:10.1016/j.isprsjprs.2024.06.008

Abstract

Automatic semantic segmentation is important for robots operating in unstructured and dynamic environments such as planetary surfaces, where ambient conditions cannot be controlled and the scale is larger than that found indoors. Current learning-based methods have achieved substantial improvements on this task. For onboard applications, however, these methods still incur high computational costs and are difficult to deploy on edge devices. In this paper, unlike previous transformer-based state-of-the-art (SOTA) approaches that rely heavily on complex designs, we propose Light4Mars, a lightweight model with minimal computational complexity that maintains high segmentation accuracy. We design a lightweight squeeze window transformer module that focuses on window-scale feature extraction and learns global and local contextual information more effectively. An aggregated local attention decoder fuses semantic information at different scales, which is especially useful for unstructured scenes. Since few all-terrain datasets exist for semantic segmentation of unstructured scenes like Mars, we build a synthetic dataset, SynMars-TW, referencing images collected by the ZhuRong rover on the Tianwen-1 mission and by the Curiosity rover. Extensive experiments on SynMars-TW and on the real Mars dataset MarsScapes show that our approach achieves state-of-the-art performance at low computational cost. To the best of our knowledge, the proposed Light4Mars-T network is the first Mars image segmentation model with fewer than 0.1M parameters. Code and datasets are available at https://github.com/CVIR-Lab/Light4Mars.
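The abstract does not describe the internals of the squeeze window transformer module. As a point of reference only, the sketch below shows generic window-based self-attention (in the style popularized by Swin Transformer), in which attention is computed only among tokens inside each non-overlapping window, so its cost grows linearly rather than quadratically with image size. It is not the Light4Mars module: the function names, the single-head design, and the projection matrices `Wq`, `Wk`, `Wv` are assumptions made for illustration, and the "squeeze" component is omitted entirely. It assumes NumPy is available.

```python
import numpy as np


def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows.

    Returns an array of shape (num_windows, ws * ws, C).
    """
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0, "H and W must be divisible by ws"
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (nH, nW, ws, ws, C)
    return x.reshape(-1, ws * ws, C)


def window_reverse(windows, ws, H, W):
    """Inverse of window_partition: (num_windows, ws * ws, C) -> (H, W, C)."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (nH, ws, nW, ws, C)
    return x.reshape(H, W, C)


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def window_attention(x, ws, Wq, Wk, Wv):
    """Single-head self-attention restricted to each ws x ws window.

    Hypothetical illustration: Wq, Wk, Wv are (C, d) projection matrices.
    """
    H, W, _ = x.shape
    win = window_partition(x, ws)           # (nW, N, C) with N = ws * ws
    q, k, v = win @ Wq, win @ Wk, win @ Wv  # each (nW, N, d)
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (nW, N, N)
    out = softmax(scores) @ v                       # (nW, N, d)
    return window_reverse(out, ws, H, W)


def attention_score_entries(H, W, ws):
    """Number of query-key scores computed by windowed attention.

    With ws equal to the full image size this equals global attention, (H*W)**2.
    """
    num_windows = (H // ws) * (W // ws)
    return num_windows * (ws * ws) ** 2


if __name__ == "__main__":
    # Cost comparison on a 64 x 64 feature map.
    print(attention_score_entries(64, 64, 8))   # windowed, ws=8: 262144
    print(attention_score_entries(64, 64, 64))  # global: 16777216
```

Because each window is processed independently, changing a pixel in one window leaves the outputs of every other window unchanged; real window-attention designs typically add shifted or overlapping windows (or, in Light4Mars, the authors' own mechanisms) to pass information across window borders.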