Abstract

A recent trend is to combine multiple sensors (i.e., cameras, LiDARs, and millimeter-wave radars) to achieve robust multi-modal perception for autonomous systems such as self-driving vehicles. Although quite a few sensor fusion algorithms have been proposed, some of which are top-ranked on various leaderboards, a systematic study on how to integrate these three types of sensors to develop effective multi-modal 3D object detection and tracking is still missing. Towards this end, we first study the strengths and weaknesses of each data modality carefully, and then compare several different fusion strategies to maximize their utility. Finally, based upon the lessons learnt, we propose a simple yet effective multi-modal 3D object detection and tracking framework (namely EZFusion). As demonstrated by extensive experiments on the nuScenes dataset, without fancy network modules, our proposed EZFusion makes remarkable improvements over the LiDAR-only baseline, and achieves performance comparable to state-of-the-art fusion-based methods.
