Abstract

Deep learning has achieved results competitive with humans in many fields. Traditionally, deep learning networks are executed on CPUs and GPUs. In recent years, more and more neural network (NN) accelerators have been introduced in both academia and industry to improve the performance and energy efficiency of deep learning workloads. In this paper, we introduce a flexible and configurable functional NN accelerator simulator, which can be configured to simulate the microarchitectures of different NN accelerators. The extensible and configurable simulator is helpful for system-level microarchitecture exploration as well as for developing operator optimization algorithms. The simulator is a functional simulator: it models the latencies of computation and memory access and the concurrent execution of modules, and it reports the number of program execution cycles after the simulation completes. We also integrated the simulator into the TVM compilation stack as an optional backend, so users can write operators with TVM and execute them on the simulator.
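To make the simulation model concrete, the sketch below shows one way such a functional simulator can count execution cycles: each module (e.g., a DMA engine and a compute array) is given per-operation latencies, operations on the same module serialize while different modules overlap, and an event queue tracks the final finish time. The class, module names, and latency values are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import heapq

# A minimal sketch of an event-driven functional simulator that models
# per-module latencies and concurrency between modules, then reports the
# total cycle count. All names and latencies here are hypothetical.

class Simulator:
    def __init__(self):
        self.events = []            # min-heap of (finish_cycle, seq, module, op)
        self.seq = 0                # tie-breaker for events finishing together
        self.module_free_at = {}    # cycle at which each module becomes idle

    def issue(self, module, op, latency, start=0):
        """Schedule `op` on `module`; it begins once the module is free."""
        begin = max(start, self.module_free_at.get(module, 0))
        finish = begin + latency
        self.module_free_at[module] = finish
        heapq.heappush(self.events, (finish, self.seq, module, op))
        self.seq += 1
        return finish

    def run(self):
        """Drain the event queue; the last finish time is the cycle count."""
        cycles = 0
        while self.events:
            finish, _, module, op = heapq.heappop(self.events)
            cycles = max(cycles, finish)
            print(f"cycle {finish:5d}: {module} finished {op}")
        return cycles

sim = Simulator()
# The DMA load overlaps with nothing, the first matmul waits on its data,
# and the second matmul serializes behind the first on the same module.
t_load = sim.issue("dma", "load_tile", latency=200)
sim.issue("mac_array", "matmul_0", latency=150, start=t_load)
sim.issue("mac_array", "matmul_1", latency=150)
print("total cycles:", sim.run())
```

Running the sketch reports 500 cycles: the load (200) and the two dependent, serialized matmuls (150 each) form the critical path, which is exactly the kind of latency/concurrency accounting the abstract describes.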

Highlights

  • Deep learning has been applied to image recognition, object detection, speech recognition, and other fields

  • Users can further assemble new neural network (NN) layers from these basic computations, which makes the Cambricon instruction set architecture (ISA) more flexible than its predecessors. The authors implemented a prototype accelerator for the Cambricon ISA, which achieved the same level of performance as DaDianNao in their experiments

  • The simulator is a functional simulator that models the latencies of calculation and memory access and the concurrent execution of modules, and it reports the number of program execution cycles after the simulation completes


Summary

Introduction

Deep learning has been applied to image recognition, object detection, speech recognition, and other fields. CPUs and GPUs are widely used to execute neural networks (NNs), but more and more hardware accelerators have been introduced to improve the performance and energy efficiency of NN computing. One reconfigurable DNN processor for IoT devices uses binary/ternary weights for its calculations; it applies three techniques to improve energy efficiency and achieves 19.9 TOPS/W at a power consumption of 10 mW. Other accelerators improve energy efficiency by reducing data movement between memory and processing units, and several of them use analog arithmetic for matrix calculations. TVM [27] is a deep learning compiler stack; it provides both graph-level and operator-level optimizations and can target different backends, including CPUs, GPUs, and hardware accelerators.
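Since the paper integrates the simulator into TVM as a backend, a short example of TVM's operator-level flow illustrates what "writing operators and executing them" means. The sketch below uses TVM's public tensor-expression API (the `te`/`create_schedule` style of older TVM releases); the vector-add operator and the `llvm` CPU target stand in for the paper's simulator target, which is not shown in this excerpt.

```python
import tvm
from tvm import te
import numpy as np

# Declare a vector-add operator with TVM's tensor expression (te) API.
n = te.var("n")
A = te.placeholder((n,), dtype="float32", name="A")
B = te.placeholder((n,), dtype="float32", name="B")
C = te.compute(A.shape, lambda i: A[i] + B[i], name="C")

# Create a default schedule and compile. A custom accelerator backend
# (such as the paper's simulator) would appear here as a different
# target; "llvm" (CPU) is used so this sketch runs anywhere.
s = te.create_schedule(C.op)
f = tvm.build(s, [A, B, C], target="llvm", name="vector_add")

# Execute the compiled operator and verify the result.
dev = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(1024).astype("float32"), dev)
b = tvm.nd.array(np.random.rand(1024).astype("float32"), dev)
c = tvm.nd.array(np.zeros(1024, dtype="float32"), dev)
f(a, b, c)
np.testing.assert_allclose(c.numpy(), a.numpy() + b.numpy(), rtol=1e-5)
```

In this flow, retargeting an operator from CPU to an accelerator (or its simulator) is mostly a matter of swapping the target and the schedule, which is why a simulator backend is useful for developing operator optimizations before hardware is available.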

Accelerator Architecture and ISA
Codegen System
Experiments
Conclusions
Disclosure
