IMLBench: A Machine Learning Benchmark Suite for CPU-GPU Integrated Architectures

Chenyang Zhang,Xiao Zhang,Xiaoyong Du,Bingsheng He,Feng Zhang,Xiaoguang Guo

doi:10.1109/tpds.2020.3046870

Chenyang Zhang, Xiao Zhang + Show 4 more

https://doi.org/10.1109/tpds.2020.3046870

Copy DOI

Export

Save

Cite

Abstract
Full-Text
Similar Papers

Abstract

Listen

Utilizing heterogeneous accelerators, especially GPUs, to accelerate machine learning tasks has shown to be a great success in recent years. GPUs bring huge performance improvements to machine learning and greatly promote the widespread adoption of machine learning. However, the discrete CPU-GPU architecture design with high PCIe transmission overhead decreases the GPU computing benefits in machine learning training tasks. To overcome such limitations, hardware vendors release CPU-GPU integrated architectures with shared unified memory. In this article, we design a benchmark suite for machine learning training on CPU-GPU integrated architectures, called iMLBench, covering a wide range of machine learning applications and kernels. We mainly explore two features on integrated architectures: 1) zero-copy, which means that the PCIe overhead has been eliminated for machine learning tasks and 2) co-running, which means that the CPU and the GPU co-run together to process a single machine learning task. Our experimental results on iMLBench show that the integrated architecture brings an average 7.1× performance improvement over the original implementations. Specifically, the zero-copy design brings 4.65× performance improvement, and co-running brings 1.78× improvement. Moreover, integrated architectures exhibit promising results from both performance-per-dollar and energy perspectives, achieving 6.50× performance-price ratio while 4.06× energy efficiency over discrete GPUs. The benchmark is open-sourced at https://github.com/ChenyangZhang-cs/iMLBench.

Full Text