Abstract
In this paper, we design an end-to-end learning framework, the RGB-enhanced point cloud fusion network (REPF-Net). It is based on the complementary, enhanced fusion of multi-sensor (camera and lidar) data and aims to give a robot accurate perception in random, complex environments. In other words, the framework obtains the three-dimensional position and geometric shape of environmental targets accurately and improves target recognition accuracy. It addresses three key problems in 3D perception: enhancing the fused point cloud with RGB feature points, fusing heterogeneous data effectively, and enhancing the skeleton information of the target point set. A point-by-point channel attention network (PCA-Net) is proposed to extract deep image features. An attention consistent execution (ACE) loss function is proposed to resolve the inconsistency between localization confidence and classification confidence. Extensive experiments on the KITTI dataset show that REPF-Net outperforms current mainstream state-of-the-art 3D perception algorithms.
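The abstract does not say how RGB information is attached to lidar points. A common generic approach, sometimes called "point painting", projects each 3D point into the camera image and appends the colour of the pixel it lands on. The sketch below shows only that generic step and is not presented as the REPF-Net procedure. It assumes the points are already in the rectified camera frame and that `P` is a 3x4 projection matrix, like KITTI's `P2`. The function names `project_point` and `paint_points` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
RGB = Tuple[int, int, int]


def project_point(P: Sequence[Sequence[float]], pt: Point) -> Optional[Tuple[float, float]]:
    """Project a 3D point (assumed to be in the camera frame) with a 3x4 matrix P.

    Returns pixel coordinates (u, v), or None if the point is behind the camera.
    """
    hom = (pt[0], pt[1], pt[2], 1.0)  # homogeneous coordinates
    u, v, w = (sum(P[r][c] * hom[c] for c in range(4)) for r in range(3))
    if w <= 0:  # non-positive depth: not visible to this camera
        return None
    return u / w, v / w


def paint_points(points: List[Point], image: List[List[RGB]],
                 P: Sequence[Sequence[float]]) -> List[Tuple[float, ...]]:
    """Return (x, y, z, r, g, b) for every point that projects inside the image.

    Hypothetical helper: points behind the camera or outside the image are dropped.
    """
    height, width = len(image), len(image[0])
    painted = []
    for pt in points:
        uv = project_point(P, pt)
        if uv is None:
            continue
        # floor (not int) so that slightly negative coordinates are rejected
        col, row = math.floor(uv[0]), math.floor(uv[1])
        if 0 <= row < height and 0 <= col < width:
            r, g, b = image[row][col]
            painted.append((*pt, r, g, b))
    return painted


# Toy usage with a made-up 2x2 image and projection matrix.
P = [[1, 0, 1, 0], [0, 1, 1, 0], [0, 0, 1, 0]]
image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (10, 20, 30)]]
pts = [(0.0, 0.0, 1.0), (-1.0, -1.0, 1.0), (0.0, 0.0, -1.0), (5.0, 0.0, 1.0)]
print(paint_points(pts, image, P))
# [(0.0, 0.0, 1.0, 10, 20, 30), (-1.0, -1.0, 1.0, 255, 0, 0)]
```

For real KITTI data, raw Velodyne points would first be transformed with the `Tr_velo_to_cam` and `R0_rect` calibration matrices before this projection. A learned fusion network would typically use deep image features instead of raw RGB.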