Abstract

Nvidia is a leading producer of GPUs for high-performance computing and artificial intelligence, offering top performance together with high energy efficiency. We present an analysis of the performance, power consumption, and thermal behavior of the new Nvidia DGX-A100 server equipped with eight A100 Ampere-microarchitecture GPUs. The results are compared against the previous generation of the server, the Nvidia DGX-2, based on Tesla V100 GPUs. We developed a synthetic benchmark to measure the raw performance of the floating-point computing units, including the Tensor Cores. Furthermore, thermal stability was investigated. In addition, a Dynamic Voltage and Frequency Scaling (DVFS) analysis was performed to determine the most energy-efficient configuration of the GPUs when executing workloads of various arithmetic intensities. Under the energy-optimal configuration, the A100 GPU reaches an efficiency of 51 GFLOPS/W for a double-precision workload and 91 GFLOPS/W for a double-precision Tensor Core workload, which makes the A100 the most energy-efficient server accelerator for scientific simulations on the market.
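The DVFS analysis described above amounts to locking the GPU core clock at a series of frequencies, running a compute-bound kernel at each step, and recording the achieved throughput and board power to obtain GFLOPS/W. The following CUDA/NVML sketch illustrates this procedure for a simple FP64 FMA workload. It is a minimal illustration only, not the benchmark used in the paper: the clock steps listed are hypothetical placeholders that must be replaced with values supported by the target GPU, and locking clocks via nvmlDeviceSetGpuLockedClocks requires administrative privileges.

    // dvfs_sweep.cu -- illustrative sketch only, not the authors' benchmark code.
    // Build (assumed toolchain): nvcc -O3 -o dvfs_sweep dvfs_sweep.cu -lnvidia-ml
    #include <cstdio>
    #include <vector>
    #include <thread>
    #include <chrono>
    #include <cuda_runtime.h>
    #include <nvml.h>

    __global__ void fma_kernel(double *out, int iters) {
        // Two independent FMA chains to expose some instruction-level parallelism.
        double a = 1.0 + threadIdx.x * 1e-9, b = 1.000001, c = 1e-9;
        double d = 2.0 + threadIdx.x * 1e-9;
        for (int i = 0; i < iters; ++i) {
            a = fma(a, b, c);   // one FMA counts as 2 FLOPs
            d = fma(d, b, c);
        }
        // Store the result so the compiler cannot eliminate the loop.
        out[blockIdx.x * blockDim.x + threadIdx.x] = a + d;
    }

    int main() {
        const int blocks = 2048, threads = 256, iters = 1000000;
        const double flops = 2.0 /*chains*/ * 2.0 /*FLOPs per FMA*/ *
                             (double)iters * blocks * threads;

        double *d_out;
        cudaMalloc(&d_out, blocks * threads * sizeof(double));

        nvmlInit();
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(0, &dev);

        // Hypothetical SM clock steps in MHz; query the supported list on real hardware.
        std::vector<unsigned int> clocks = {765, 990, 1215, 1410};

        for (unsigned int mhz : clocks) {
            // Lock the SM clock to a single frequency (requires privileges).
            if (nvmlDeviceSetGpuLockedClocks(dev, mhz, mhz) != NVML_SUCCESS)
                printf("warning: could not lock clocks to %u MHz\n", mhz);

            cudaEvent_t start, stop;
            cudaEventCreate(&start);
            cudaEventCreate(&stop);

            cudaEventRecord(start);
            fma_kernel<<<blocks, threads>>>(d_out, iters);
            cudaEventRecord(stop);

            // Sample board power on the host while the kernel is still running.
            unsigned long long sum_mw = 0; int samples = 0;
            while (cudaEventQuery(stop) == cudaErrorNotReady) {
                unsigned int mw = 0;
                if (nvmlDeviceGetPowerUsage(dev, &mw) == NVML_SUCCESS) {
                    sum_mw += mw; ++samples;
                }
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
            }
            cudaEventSynchronize(stop);

            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);

            double gflops = flops / (ms * 1e-3) / 1e9;
            double watts  = samples ? (sum_mw / 1000.0) / samples : 0.0;
            printf("%4u MHz: %8.1f GFLOPS, %6.1f W, %5.1f GFLOPS/W\n",
                   mhz, gflops, watts, watts > 0.0 ? gflops / watts : 0.0);

            cudaEventDestroy(start);
            cudaEventDestroy(stop);
        }

        nvmlDeviceResetGpuLockedClocks(dev);  // restore default clock management
        nvmlShutdown();
        cudaFree(d_out);
        return 0;
    }

In a sweep of this kind, the clock frequency at which GFLOPS/W peaks constitutes the energy-optimal configuration referred to above; the peak typically lies below the default boost clock, since the last increments of frequency cost disproportionately more power than they add in throughput.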

Highlights

  • Modern High-Performance Computing (HPC) servers increasingly accommodate heterogeneous hardware, which delivers high computational performance together with high power efficiency compared to general-purpose server processors [1]; the most common accelerators in the list of the most powerful supercomputers today are Nvidia GPUs

  • Accelerators are considered a hardware platform that will enable the construction of future exascale systems with reasonable power consumption, since their energy efficiency is much higher than that of general-purpose server processors

  • We add an analysis of the DGX-A100 and, more importantly, provide a comparison of these two flagship General-Purpose GPU (GPGPU) servers in terms of both performance and power consumption

Introduction

Modern High-Performance Computing (HPC) servers increasingly accommodate heterogeneous hardware, which delivers high computational performance together with high power efficiency compared to general-purpose server processors [1]. In particular, accelerators are considered a hardware platform that will enable the construction of future exascale systems with reasonable power consumption, since their energy efficiency is much higher than that of general-purpose server processors. The most common accelerators in the list of the most powerful supercomputers today are Nvidia GPUs. Besides GPU development, in 2016 Nvidia introduced the first generation of its server system, called DGX-1 [3], built around its top server General-Purpose GPUs (GPGPUs). We compare the second and the third generation of this server: DGX-2 [4] and DGX-A100 [5].
