Abstract

Light Detection and Ranging (LiDAR) is becoming a critical requirement for future computer vision applications, such as AR/VR (iPhone LiDAR) and ADAS (automotive LiDAR). A depth point-cloud input has different characteristics from a conventional RGB image input, so a CNN depth-inference implementation differs from a standard super-resolution CNN (SR-CNN). In this brief, we present a heterogeneous AI-accelerator SoC dedicated to depth-image-completion computation. Three key innovations improve the SoC's performance. First, to accommodate the unique data structure of a depth input, a fully-filled dataflow management engine pre-processes the RGB+Depth input, significantly improving processing-element utilization (PEU). Second, to make the CNN accelerator's instruction configuration more efficient, a hardware tiling co-processor executes the accelerator's tiling strategy and assigns each sub-job to the PE array directly, reducing the time spent on task assignment. Third, because the neural network's post-processing requires a large number of vector operations, a RISC-V core is incorporated to execute the vector computations more efficiently. The SoC is implemented in a 40 nm CMOS process, achieving 2 TOPS/W energy efficiency with 34 fps throughput at VGA-resolution output for real-time LiDAR systems.
