Abstract
In recent years, with the development of intelligent video surveillance, robot navigation, and autonomous driving, 3D object detection has become an important field of computer vision. Most methods are based on deep neural networks and use information obtained from sensors such as LiDAR and binocular or monocular cameras. Compared with methods that use LiDAR point clouds, methods based on binocular images are relatively cheap and less dependent on computing hardware, and they provide depth information that a monocular image does not contain. A lightweight detection approach based on stereo images therefore has great practical significance in scenarios where cost and computing resources are constrained. Motivated by this, we study lightweight improvements to the recent YOLOStereo3D model and compare the performance of different lightweight backbones on this task. We also propose two improvements to the depth feature extraction structure, aiming to reduce the accuracy loss caused by using a lightweight backbone. Finally, we obtain a relatively lightweight detection model that maintains sufficient accuracy for such scenarios.