Memory streaming acceleration for embedded systems with CPU-accelerator cooperative data processing

Kwangho Lee, Joonho Kong, Young Geun Kim, Sung Woo Chung

doi:10.1016/j.micpro.2019.102897

Abstract

Memory streaming operations (i.e., memory-to-memory data transfers with or without simple arithmetic/logical operations) are among the most important tasks in general embedded/mobile computer systems. In this paper, we propose a technique to accelerate memory streaming operations. The conventional way to accelerate them is to employ direct memory access (DMA) with dedicated hardware accelerators for simple arithmetic/logical operations. Our technique uses not only a hardware accelerator with DMA but also the central processing unit (CPU) to perform memory streaming operations, which improves the performance and energy efficiency of the system. We implemented a prototype on a field-programmable gate array system-on-chip (FPGA-SoC) platform and evaluated our technique with real measurements on that prototype. In our experiments, the technique improves memory streaming performance by 34.1–73.1% while reducing energy consumption by 29.0–45.5%. When applied to real-world applications such as image processing, 1×1 convolution, and bias addition/scale, it improves performance by 1.1×–2.4×. In addition, it reduces energy consumption for image processing, 1×1 convolution, and bias addition/scale by 7.9–17.7%, 46.8–57.7%, and 41.7–58.5%, respectively.
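As a rough illustration of the cooperative idea described in the abstract, the following Python sketch splits one streaming operation (a bias addition over a buffer) between two workers. It is not the authors' implementation: the abstract does not say how work is divided, so the fixed `cpu_share` ratio, the function names, and the use of a thread as a stand-in for the DMA-backed accelerator are all assumptions made for this example.

```python
import threading


def partition(n_elems, cpu_share):
    """Split n_elems into (accelerator range, CPU range).

    cpu_share is a hypothetical tuning knob in [0, 1]; the abstract
    does not state how the split is actually chosen.
    """
    if not 0.0 <= cpu_share <= 1.0:
        raise ValueError("cpu_share must be in [0, 1]")
    cpu_elems = int(n_elems * cpu_share)
    split = n_elems - cpu_elems
    return range(0, split), range(split, n_elems)


def stream_add(src, dst, bias, idx_range):
    """Memory-to-memory copy with a simple arithmetic op (bias addition)."""
    for i in idx_range:
        dst[i] = src[i] + bias


def cooperative_stream_add(src, bias, cpu_share=0.25):
    """Process disjoint halves of the buffer on two workers at once."""
    dst = [0] * len(src)
    accel_range, cpu_range = partition(len(src), cpu_share)

    # Assumption: a thread stands in for the accelerator + DMA engine.
    accel = threading.Thread(
        target=stream_add, args=(src, dst, bias, accel_range)
    )
    accel.start()

    # The calling "CPU" handles its own share instead of idling.
    stream_add(src, dst, bias, cpu_range)

    accel.join()  # wait for the accelerator's share to complete
    return dst
```

Because the two index ranges are disjoint, neither worker writes to an element the other touches, so the only synchronization needed in this sketch is the final `join`. On real hardware the split ratio would presumably depend on the relative throughput of the CPU and the accelerator.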

Full Text