Abstract

Deep learning on point clouds plays a vital role in a wide range of applications such as autonomous driving and AR/VR. These applications interact with people in real time on edge devices and thus require low latency and low energy. Compared to projecting the point cloud to 2D space, directly processing the 3D point cloud yields higher accuracy and lower #MACs. However, the extremely sparse nature of point clouds poses challenges to hardware acceleration. For example, we need to explicitly determine the nonzero outputs and search for the nonzero neighbors (the mapping operation), which existing accelerators do not support. Furthermore, explicit gather and scatter of sparse features are required, resulting in large data movement overhead. In this paper, we comprehensively analyze the performance bottlenecks of modern point cloud networks on CPU/GPU/TPU. To address these challenges, we then present PointAcc, a novel point cloud deep learning accelerator. PointAcc maps diverse mapping operations onto one versatile ranking-based kernel, streams the sparse computation with configurable caching, and temporally fuses consecutive dense layers to reduce the memory footprint. Evaluated on 8 point cloud models across 4 applications, PointAcc achieves a 3.7X speedup and 22X energy savings over an RTX 2080Ti GPU. Co-designed with light-weight neural networks, PointAcc outperforms the prior accelerator Mesorasi with a 100X speedup and 9.1% higher accuracy running segmentation on the S3DIS dataset. PointAcc paves the way for efficient point cloud recognition.

Highlights

  • A point cloud is a collection of points that represent a physical object or 3D scene

  • Similar to image convolution, which works on the receptive field (Figure 3a), point cloud convolution is conducted on the neighborhood of the output point (Figure 3c)

  • PointNet++-based convolution applies farthest point sampling during downsampling, where the output points are sampled from the input point cloud I one at a time, iteratively
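
The iterative sampling in the last highlight can be illustrated with a minimal software sketch. It assumes the usual formulation of farthest point sampling, in which each new output point is the input point farthest from all points chosen so far. The function name, the `start_index` parameter, and the tuple-based point format are illustrative assumptions, not taken from the paper, and the sketch says nothing about how PointAcc performs this step in hardware.

```python
import math


def farthest_point_sampling(points, num_samples, start_index=0):
    """Return the indices of num_samples points picked one by one.

    Hypothetical helper for illustration: points is a list of
    equal-length coordinate tuples, start_index is the first pick.
    """
    if not 0 < num_samples <= len(points):
        raise ValueError("num_samples must be in 1..len(points)")

    selected = [start_index]
    # min_dist[i]: squared distance from point i to its nearest selected point.
    min_dist = [math.inf] * len(points)

    for _ in range(num_samples - 1):
        last = points[selected[-1]]
        # Only the most recently selected point can shrink min_dist.
        for i, p in enumerate(points):
            d = sum((a - b) ** 2 for a, b in zip(p, last))
            if d < min_dist[i]:
                min_dist[i] = d
        # The next sample is the point farthest from the selected set
        # (ties resolve to the lowest index).
        next_index = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(next_index)
    return selected


cloud = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (0, 5, 0)]
sampled = farthest_point_sampling(cloud, 3)
print(sampled)  # [0, 2, 3]
```

Each iteration depends on the previous pick, which is why the output points are produced one at a time rather than in parallel.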


Summary

INTRODUCTION

A point cloud is a collection of points that represent a physical object or 3D scene. State-of-the-art point cloud networks [9, 35] use different weights for different neighbors, offering much higher accuracy. We present PointAcc, an efficient domain-specific accelerator for point cloud deep learning. Point cloud processing requires a variety of mapping operations, such as ball query and kernel mapping, to establish the relationship between input and output points for computation; existing deep learning accelerators have not explored these operations. To tackle this challenge, PointAcc unifies them in a ranking-based computation paradigm that can generalize to other similar operations. Co-designed with the neural network, PointAcc outperforms the prior state-of-the-art point cloud accelerator Mesorasi with a 100× speedup and 9.1% better mIoU accuracy running segmentation on the S3DIS dataset.
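
To make the idea of a mapping operation concrete, here is a minimal software sketch of ball query written as a ranking problem: for each output point, the input points are ranked by distance, and the closest ones that fall inside a radius become its neighbors. This is one plausible reading of "ranking-based" and is an assumption for illustration only; the function name, its parameters, and the `(input_index, output_index)` map format are hypothetical and do not describe PointAcc's hardware kernel.

```python
def ball_query_by_ranking(inputs, centers, radius, max_neighbors):
    """Return (input_index, output_index) pairs for each output point.

    Hypothetical helper for illustration: inputs and centers are lists
    of equal-length coordinate tuples.
    """
    radius_sq = radius * radius
    maps = []
    for out_idx, center in enumerate(centers):
        # Rank every input point by squared distance to this output point.
        ranked = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, center)), in_idx)
            for in_idx, p in enumerate(inputs)
        )
        # Keep the top-ranked candidates that also lie inside the ball.
        for dist_sq, in_idx in ranked[:max_neighbors]:
            if dist_sq <= radius_sq:
                maps.append((in_idx, out_idx))
    return maps


inputs = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (0, 0.5, 0)]
centers = [(0, 0, 0)]
print(ball_query_by_ranking(inputs, centers, radius=1.5, max_neighbors=2))
# [(0, 0), (3, 0)]
```

The resulting pairs are what a subsequent gather step would use to fetch the input features that contribute to each output point.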

BACKGROUND
Mapping Operations
MatMul Operations
MOTIVATION
ARCHITECTURE
Mapping Unit
Memory Management Unit
Matrix Unit
Evaluation Setup
Evaluation Results
RELATED WORK
CONCLUSION