SWIRL: High-performance many-core CPU code generation for deep neural networks

Anand Venkat, Tharindu Rusira, Raj Barik, Mary Hall, Leonard Truong

doi:10.1177/1094342019866247

Abstract

Deep neural networks (DNNs) have demonstrated effectiveness in many domains, including object recognition, speech recognition, natural language processing, and health care. The computations involved in DNN training and inference are typically time consuming and require efficient implementations. Existing frameworks such as TensorFlow, Theano, Torch, Cognitive Toolkit (CNTK), and Caffe treat graphics processing units (GPUs) as the status quo devices for DNN execution, leaving central processing units (CPUs) behind. Moreover, existing frameworks forgo or limit cross-layer optimization opportunities that have the potential to improve performance by significantly reducing data movement through the memory hierarchy. In this article, we describe an alternative approach called SWIRL, a compiler that provides high-performance CPU implementations for DNNs. SWIRL is built on top of LATTE, an existing domain-specific language (DSL) for DNNs. SWIRL separates the DNN specification from its schedule using predefined transformation recipes for the tensors and layers commonly found in DNNs. These recipes synergize with DSL constructs to generate high-quality fused, vectorized, and parallelized code for CPUs. On an Intel Xeon Platinum 8180M CPU, SWIRL achieves performance comparable with TensorFlow integrated with MKL-DNN: on average, 1.00× the performance of TensorFlow inference and 0.99× that of TensorFlow training. It also outperforms the original LATTE compiler on average by 1.22× and 1.30× on inference and training, respectively.
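The idea of keeping a specification separate from its schedule, and of fusing layers to avoid intermediate buffers, can be illustrated with a small sketch. This is not LATTE or SWIRL syntax; the names `SPEC`, `run_unfused`, and `run_fused` are hypothetical, and two simple element-wise layers (a bias add followed by ReLU) stand in for real DNN layers. The same specification is executed under two schedules that give the same result but differ in how many full intermediate buffers are written.

```python
# Illustrative sketch only: hypothetical names, not the LATTE/SWIRL API.

def bias_add(x, b):
    return x + b

def relu(x):
    return x if x > 0.0 else 0.0

# Specification: WHAT to compute, as an ordered list of element-wise layers.
SPEC = [lambda x: bias_add(x, -1.0), relu]

def run_unfused(spec, data):
    """Schedule 1: one full pass per layer, each writing a whole buffer."""
    buffers_written = 0
    current = list(data)
    for layer in spec:
        current = [layer(v) for v in current]  # materializes an intermediate
        buffers_written += 1
    return current, buffers_written

def run_fused(spec, data):
    """Schedule 2: a single pass; every layer is applied per element."""
    out = []
    for v in data:
        for layer in spec:
            v = layer(v)  # intermediate value stays in a local variable
        out.append(v)
    return out, 1  # only the output buffer is written

data = [0.5, 2.0, -3.0, 1.5]
unfused, n_unfused = run_unfused(SPEC, data)
fused, n_fused = run_fused(SPEC, data)
assert unfused == fused  # same specification, same result
```

In this toy case the fused schedule writes one buffer where the unfused schedule writes two; the abstract's claim is that an analogous reduction in data movement across real DNN layers is what improves performance.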

Journal: The International Journal of High Performance Computing Applications
Publication Date: Aug 4, 2019
Citations: 21

Similar Papers

AccDP: Accelerated Data-Parallel Distributed DNN Training for Modern GPU-Based HPC Clusters
Nawras Alnaasan ... Hari Subramoni
01 Dec 2022

FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding
Sai Qian Zhang ... H T Kung
01 Apr 2022

Dynamic Memory Management for GPU-Based Training of Deep Neural Networks
Shriram S.B ... Anshuj Garg
01 May 2019

Neuroevolution in Deep Neural Networks: Current Trends and Future Challenges
Edgar Galvan ... Peter Mooney
IEEE Transactions on Artificial Intelligence | VOL. 2
04 May 2021
