Abstract

Water segmentation is a critical task for ensuring the safety of unmanned surface vehicles (USVs). Most existing image-based water segmentation methods may be inaccurate due to light reflection on the water. The fusion-based method combines the paired 2D camera images and 3D LiDAR point clouds as inputs, resulting in a high computational load and considerable time consumption, with limits in terms of practical applications. Thus, in this study, we propose a multimodal fusion water segmentation method that uses a transformer and knowledge distillation to leverage 3D LiDAR point clouds in order to assist in the generation of 2D images. A local and non-local cross-modality fusion module based on a transformer is first used to fuse 2D images and 3D point cloud information during the training phase. A multi-to-single-modality knowledge distillation module is then applied to distill the fused information into a pure 2D network for water segmentation. Extensive experiments were conducted with a dataset containing various scenes collected by USVs in the water. The results demonstrate that the proposed method achieves approximately 1.5% improvement both in accuracy and MaxF over classical image-based methods, and it is much faster than the fusion-based method, achieving speeds ranging from 15 fps to 110 fps.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.