Oak Ridge National Laboratory (ORNL) installed the Summit supercomputer in 2018. Summit is an accelerated-node architecture with 4,608 nodes, each with two IBM POWER9 CPUs and six NVIDIA Volta V100 GPUs, a significant DRAM footprint, robust quantities of high-bandwidth memory (HBM) supporting the GPUs, nonvolatile memory, and fast NVLink and InfiniBand interconnects. The machine was designed to deliver over 200 peak double-precision petaflops for scientific modeling and simulation applications and over 3 peak reduced-precision ExaOps. How these features affect application performance depends on whether a code is simulation-oriented, write-intensive, data-analysis-oriented, read-intensive, or communication-intensive. In the context of artificial intelligence (AI) and machine learning (ML), these features support data-intensive applications that infer and predict statistical relationships in complex datasets. This article presents recent experiences at ORNL using Summit for applications in AI and ML and describes example code and algorithmic changes necessary to use Summit effectively. Finally, this article discusses research directions in scalable ML, including algorithms research and the combination of data analysis with modeling and simulation in an accelerated-node, exascale environment.