Abstract

People in cities suffer from traffic congestion and air pollution in their daily lives, partly because of the great number of private cars, and they constantly face the risk of accidents; for these reasons, autonomous driving has been developed by many institutes and companies, especially in recent years. Autonomous driving will play an important role in future smart cities, reduce the time and economic costs borne by society as a whole, and contribute to the sustainability of cities and society. A key task for autonomous driving is to detect surrounding objects, including cars, pedestrians, and cyclists, accurately and in real time. In this paper, we propose an end-to-end three-dimensional (3D) object detection method based on voxelization, sparse convolution, and feature fusion. The proposed method uses only the point cloud as input and has two key components: small voxels and efficient feature fusion. Instead of using extra networks to transform the voxels, we directly average the points within each voxel to obtain its feature representation. To enrich the features used for prediction, we design a two-step feature fusion method, called the fusion of fusion network, that combines information across multiple scales and 3D space. We submitted our results to the official test server of the KITTI 3D detection benchmark and achieved state-of-the-art performance, especially on the Cyclist class. In addition, our method runs at 0.05 s per frame, a 2- to 4-fold runtime improvement over state-of-the-art methods, thanks to its simple and compact architecture.
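The voxel-feature step described above can be illustrated with a minimal sketch: points are grouped into voxels and the mean of the points in each voxel serves as that voxel's feature, with no learned transformation network. This is not the authors' released implementation; the function name voxelize_mean and the voxel sizes used here are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def voxelize_mean(points, voxel_size=(0.2, 0.2, 0.4)):
    """Group points into voxels and use the per-voxel mean as the voxel feature.

    A sketch of the idea only; names and voxel sizes are assumptions.
    points: (N, 4) array of x, y, z, reflectance.
    Returns (coords, features): integer voxel indices and per-voxel mean vectors.
    """
    voxel_size = np.asarray(voxel_size, dtype=np.float32)
    # Integer voxel index for each point along x, y, z.
    indices = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    # Find the non-empty voxels and map each point to its voxel.
    coords, inverse = np.unique(indices, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    # Sum all point vectors falling in the same voxel, then divide by the count.
    sums = np.zeros((coords.shape[0], points.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, points)
    counts = np.bincount(inverse, minlength=coords.shape[0]).reshape(-1, 1)
    features = (sums / counts).astype(np.float32)
    return coords, features

# Usage example with random points standing in for a LiDAR sweep.
pts = np.random.rand(10000, 4).astype(np.float32) * np.array([70, 80, 4, 1], dtype=np.float32)
coords, feats = voxelize_mean(pts)
print(coords.shape, feats.shape)  # (num_nonempty_voxels, 3), (num_nonempty_voxels, 4)
```

The resulting sparse voxel grid (non-empty coordinates plus mean features) is the kind of input a sparse convolution backbone consumes; the fusion network itself is not reproduced here because the abstract does not specify its structure.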
