Abstract

Martian rock segmentation aims to separate rock pixels from the background and plays a crucial role in downstream tasks such as rover traversal and geologic analysis. U-Nets have achieved some success in rock segmentation. However, because convolution operations are inherently local, U-Nets are inadequate at modeling global context and long-range spatial dependencies. Although emerging Transformers can address this limitation, they have difficulty extracting and retaining sufficient low-level local information. These shortcomings limit the performance of existing networks on Martian rocks, which vary in shape, size, texture, and color. We therefore propose RockFormer, the first U-shaped Transformer framework for Mars rock segmentation, consisting of a hierarchical encoder–decoder architecture with a feature refining module (FRM) connected between the encoder and the decoder. Specifically, the encoder hierarchically generates multiscale features using an improved vision Transformer (improved-ViT) that exploits both abundant local information and long-range context. The FRM removes less representative features and captures global dependencies between multiscale features, improving RockFormer's robustness to Martian rocks with diverse appearances. The decoder aggregates these features for pixelwise rock prediction. For evaluation, we establish two Mars rock datasets covering both real and synthesized images. The first is MarsData-V2, an extension of our previously published MarsData, which was collected from real Mars rocks. The second is SynMars, a synthetic dataset photographed sequentially from a virtual terrain built with reference to the TianWen-1 dataset. Extensive experiments on the two datasets show the superiority of RockFormer for Martian rock segmentation: it achieves state-of-the-art performance with reasonably low computational complexity.
