Abstract

Light detection and ranging (LiDAR) and stereo cameras are two widely used solutions for perceiving 3D information. The complementary properties of these two sensor modalities motivate fusing them to obtain practical depth sensing for real-world applications. Driven by deep neural network (DNN) techniques, recent works achieve superior accuracy. However, complex architectures and the sheer number of DNN parameters often lead to poor generalization and non-real-time computation. In this paper, we present FastFusion, a three-stage stereo-LiDAR deep fusion scheme that integrates LiDAR priors into each step of the classical stereo-matching taxonomy, achieving high-precision dense depth sensing in real time. We integrate stereo-LiDAR information by taking advantage of a compact binary neural network, and we use the proposed cross-based LiDAR trust aggregation to further fuse the sparse LiDAR measurements in the back end of stereo matching. To align the estimated depth with the photometry of the input image, we introduce a refinement network that guarantees consistency. More importantly, we present a graphics processing unit (GPU)-based acceleration framework that provides a low-latency implementation of FastFusion, yielding both improved accuracy and real-time responsiveness. In the experiments, we demonstrate the effectiveness and practicability of FastFusion, which obtains a significant speedup over state-of-the-art baselines while achieving comparable depth-sensing accuracy. A video demo of FastFusion's real-time depth estimation in a real-world driving scenario is available at https://youtu.be/nP7cls2BA8s.
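To make the three-stage pipeline concrete, the sketch below outlines the stages named in the abstract as plain Python/NumPy stubs. This is only an illustration of the described structure, not the authors' implementation: all function names, the absolute-difference matching cost, the `trust` weighting parameter, and the identity refinement stub are our own assumptions.

```python
# Illustrative sketch only (assumed structure, not the authors' code):
# the three FastFusion stages from the abstract as NumPy stubs.
import numpy as np


def build_cost_volume(left, right, max_disp=64):
    """Stage 1 stand-in. The paper fuses stereo and LiDAR cues with a
    compact binary neural network; this stub uses a plain absolute-
    difference matching cost over grayscale images instead."""
    h, w = left.shape
    cost = np.full((max_disp, h, w), np.inf, dtype=np.float32)
    for d in range(max_disp):
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, : w - d])
    return cost


def lidar_trust_aggregation(cost, lidar_disp, trust=0.8):
    """Stage 2 stand-in for the cross-based LiDAR trust aggregation:
    discount the matching cost at disparities confirmed by sparse
    LiDAR returns (trust is a hypothetical weighting parameter)."""
    max_disp = cost.shape[0]
    for (y, x), d in np.ndenumerate(lidar_disp):
        if 0 < d < max_disp:  # 0 marks pixels with no LiDAR return
            cost[int(d), y, x] *= 1.0 - trust
    return cost


def refine(disp, left):
    """Stage 3 stand-in. The paper trains a refinement network to keep
    depth consistent with image photometry; identity placeholder here."""
    return disp


def fastfusion_sketch(left, right, lidar_disp):
    cost = build_cost_volume(left, right)
    cost = lidar_trust_aggregation(cost, lidar_disp)
    disp = cost.argmin(axis=0).astype(np.float32)  # winner-takes-all
    return refine(disp, left)
```

In the real system, each stub would be replaced by the corresponding learned component and executed within the GPU-based acceleration framework described in the paper.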