3D point cloud data, produced by 3D sensors such as LiDAR and stereo cameras, have been widely adopted by industry leaders such as Google, Uber, Tesla, and Mobileye for mobile robotic applications such as autonomous driving and humanoid robots. Because point clouds carry reliable depth information, they can provide accurate location and shape characteristics for scene-understanding tasks such as object recognition and semantic segmentation. However, deep neural networks (DNNs) that directly consume point cloud data are particularly computation-intensive, because they must not only perform multiply-and-accumulate (MAC) operations but also search for neighbors in irregular 3D point cloud data. As the scales of both point cloud data and DNNs grow from application to application, this workload exceeds what general-purpose processors can handle in real time. We present the first accelerator architecture that dynamically configures the hardware on the fly to match both the neighbor point search and the MAC computation of point-based DNNs. To speed up neighbor point search and reduce its computation cost, we introduce a grid-based algorithm that searches for neighbor points within a local region of grid cells. Evaluation results on scene recognition and segmentation tasks show that the proposed design achieves 16.4× higher performance and consumes 99.95% less energy than an NVIDIA Tesla K40 GPU baseline in point cloud scene understanding applications.
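The abstract does not specify how the grid is laid out or how neighbors are selected, so the following is only a minimal software sketch of the general idea of grid-based neighbor search, not the accelerator's actual algorithm. It assumes uniform cubic cells of side `cell_size` and a fixed-radius (ball-query-style) search. The functions `build_grid` and `grid_radius_search` are hypothetical names introduced for illustration. Points are bucketed into integer cells once. A query then scans only the cells within `ceil(radius / cell_size)` of its own cell instead of the whole cloud, which is how restricting the search to a local region of grids cuts down distance computations.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, cell_size):
    """Bucket each point index into the integer grid cell that contains it.

    Assumption: uniform cubic cells of side `cell_size` (hypothetical layout).
    """
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (math.floor(x / cell_size),
               math.floor(y / cell_size),
               math.floor(z / cell_size))
        grid[key].append(idx)
    return grid


def grid_radius_search(points, grid, cell_size, query, radius):
    """Return sorted indices of points within `radius` of `query`.

    Only cells within `reach` cells of the query's cell are scanned; any point
    closer than `radius` must lie in one of them.
    """
    reach = math.ceil(radius / cell_size)  # max cell-index offset of a true neighbor
    cx, cy, cz = (math.floor(c / cell_size) for c in query)
    r2 = radius * radius
    result = []
    for dx, dy, dz in product(range(-reach, reach + 1), repeat=3):
        # .get avoids inserting empty cells into the defaultdict
        for idx in grid.get((cx + dx, cy + dy, cz + dz), ()):
            px, py, pz = points[idx]
            d2 = (px - query[0]) ** 2 + (py - query[1]) ** 2 + (pz - query[2]) ** 2
            if d2 <= r2:
                result.append(idx)
    return sorted(result)


# Usage example
points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.9, 0.1)]
grid = build_grid(points, cell_size=1.0)
print(grid_radius_search(points, grid, 1.0, (0.0, 0.0, 0.0), 1.0))  # prints [0, 1, 3]
```

In hardware the cell size would typically be tuned to the search radius so that only a small, fixed neighborhood of cells is fetched per query. That tuning is an assumption here, not a detail given in the abstract.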