Abstract

Recent advancements in deep learning (DL) have led to the wide adoption of AI applications, such as image recognition [1], image de-noising, and speech recognition, in 5G smartphones. For a satisfactory user experience, smartphone applications face stringent real-time response requirements. To meet the performance expectations for DL, numerous deep learning accelerators (DLAs) have been proposed for DL inference on edge devices [2]–[5]. As depicted in Fig. 7.1.1, the major challenge in designing a DLA for smartphones is achieving the required computing efficiency within a limited power budget and memory bandwidth (BW). Since the overall power consumption of a smartphone system-on-a-chip (SoC) is usually constrained to 2-to-3W and the available DRAM BW is around 10-to-30GB/s, the power budget allocated to a DLA must be below 1W, with its memory BW limited to 1-to-10GB/s. While operating under such constraints, the DLA must support the various network topologies and highly precise neural operations used in smartphone applications. For instance, the Android Neural Networks API currently specifies the use of asymmetric quantization (ASYMM-Q), which provides better precision than conventional symmetric quantization.
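To make the precision advantage of ASYMM-Q concrete, the sketch below contrasts 8b asymmetric quantization (per-tensor scale plus integer zero-point, as in the Android Neural Networks API) with symmetric quantization (zero-point fixed at 0) on skewed, non-negative activations such as post-ReLU outputs. This is a minimal illustration, not the paper's hardware implementation; the function names and the uniform test data are assumptions for demonstration.

```python
import numpy as np

def asymm_quantize(x, num_bits=8):
    # Asymmetric quantization: map [min(x), max(x)] onto [0, 2^b - 1]
    # using a real-valued scale and an integer zero-point.
    qmin, qmax = 0, 2**num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def symm_quantize(x, num_bits=8):
    # Symmetric quantization: map [-max|x|, +max|x|] onto
    # [-(2^(b-1)-1), 2^(b-1)-1] with the zero-point fixed at 0.
    qmax = 2**(num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale, zero_point=0):
    return (q.astype(np.float32) - zero_point) * scale

# Skewed, non-negative activations (e.g. after ReLU6): the asymmetric
# scheme uses all 256 levels, while the symmetric one wastes half its
# range on negative values that never occur.
x = np.random.default_rng(0).uniform(0.0, 6.0, 1024).astype(np.float32)
qa, sa, zp = asymm_quantize(x)
qs, ss = symm_quantize(x)
err_asymm = np.abs(dequantize(qa, sa, zp) - x).mean()
err_symm = np.abs(dequantize(qs, ss) - x).mean()
```

On such one-sided data the asymmetric step size is roughly half the symmetric one, so the mean reconstruction error drops accordingly; the cost is the extra zero-point term that the DLA's integer datapath must handle.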
