Abstract
Fast stereo based 3D object detectors have made great progress recently. However, they suffer from the inferior accuracy. We argue that the main reason is due to the poor geometry-aware feature representation in 3D space. To solve this problem, we propose an efficient stereo geometry network (ESGN). The key in our ESGN is an efficient geometry-aware feature generation (EGFG) module. Our EGFG module first uses a stereo correlation and reprojection module to construct multi-scale stereo volumes in camera frustum space, second employs a multi-scale bird’s eye view (BEV) projection and fusion module to generate multiple geometry-aware features. In these two steps, we adopt deep multi-scale information fusion for discriminative geometry-aware feature generation, without any complex aggregation networks. In addition, we introduce a deep geometry-aware feature distillation scheme to guide stereo feature learning with a LiDAR-based detector. The experiments are performed on the classical KITTI dataset. On KITTI test set, our ESGN outperforms the fast state-of-art-art detector YOLOStereo3D by 5.14% on mAP<sub>3<i>d</i></sub> at 62<i>ms</i>. To the best of our knowledge, our ESGN achieves a best trade-off between accuracy and speed. We hope that our efficient stereo geometry network can provide more possible directions for fast 3D object detection.
Accepted Version (Free)
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have