Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

Nitin A Gawande, Jeff A Daily, Charles Siegel, Nathan R Tallent, Abhinav Vishnu

doi:10.1016/j.future.2018.04.073


Open Access



Abstract

Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors – including NVIDIA, Intel, AMD, and IBM – have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This paper provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural network workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks, and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak scaling – sometimes encouraged by restricted GPU memory – NVLink is less important.
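
The abstract refers to scaling efficiency, weak scaling, and performance/watt without defining them. A minimal sketch of the conventional definitions follows, assuming runtimes are measured in seconds and throughput in images per second. The function names and all numbers in the usage lines are hypothetical illustrations, not measurements or code from the paper.

```python
# Conventional scaling and energy-efficiency metrics (illustrative only).
# All function names are hypothetical; none come from the paper.

def strong_scaling_efficiency(t_one: float, t_n: float, n: int) -> float:
    """Fixed total problem size: ideal runtime on n devices is t_one / n."""
    return t_one / (n * t_n)

def weak_scaling_efficiency(t_one: float, t_n: float) -> float:
    """Fixed problem size per device: ideal runtime stays equal to t_one."""
    return t_one / t_n

def performance_per_watt(images_per_second: float, watts: float) -> float:
    """Throughput normalized by average power draw."""
    return images_per_second / watts

if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the formulas.
    print(strong_scaling_efficiency(t_one=100.0, t_n=25.0, n=8))
    print(weak_scaling_efficiency(t_one=10.0, t_n=12.5))
    print(performance_per_watt(images_per_second=1000.0, watts=250.0))
```

Under these definitions, an efficiency of 1.0 means ideal scaling, and values below 1.0 reflect communication or synchronization overhead, which is where interconnect bandwidth such as NVLink would matter.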


Journal: Future Generation Computer Systems	Publication Date: May 5, 2018
Citations: 25	License type: publisher-specific-oa


Similar Papers

Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
Nitin A Gawande ... Joshua B Landwehr
01 May 2017

Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning and HPC Workloads
Evangelos Georganas ... Jeremy Bruestle
Frontiers in Applied Mathematics and Statistics | VOL. 8
18 Apr 2022

Tensor processing primitives
Evangelos Georganas ... Narendra Chaudhary
13 Nov 2021

Evaluation of Deep Learning Frameworks Over Different HPC Architectures
Shayan Shams ... Seung-Jong Park
01 Jun 2017
