Electra: A Modular-Based Expansion of NASA’s Supercomputing Capability

Rupak Biswas,William Thigpen,Piyush Mehrotra,Jeff Becker,Michelle Moyer,Chris Tanner,Robert Hood,Davin Chan,David Ellsworth

doi:10.1201/9781351036863-13

Abstract

NASA has increasingly relied on high-performance computing (HPC) re- sources for computational modeling, simulation, and data analysis to meet the science and engineering goals of its missions in space exploration, aeronautics, and Earth and space science. The NASA Advanced Supercomputing (NAS) Division at Ames Research Center in Silicon Valley, Calif., hosts NASA’s premier supercomputing resources, integral to achieving and enhancing the success of the agency’s missions. NAS provides a balanced environment, funded under the High-End Computing Capability (HECC) project, comprised of world-class supercomputers, including its flagship distributed-memory cluster, Pleiades; high-speed networking; and massive data storage facilities, along with multi-disciplinary support teams for user support, code porting and optimization, and large-scale data analysis and scientific visualization. However, as scientists have increased the fidelity of their simulations and engineers are conducting larger parameter-space studies, the requirements for supercomputing resources have been growing by leaps and bounds. With the facility housing the HECC systems reaching its power and cooling capacity, NAS undertook a prototype project to investigate an alternative approach for housing supercomputers. Modular supercomputing, or container-based computing, is an innovative concept for expanding NASA’s HPC capabilities. With modular supercomputing, additional containers—similar to portable storage pods—can be connected together as needed to accommodate the agency’s ever-increasing demand for computing resources. In addition, taking advantage of the local weather permits the use of cooling technologies that would additionally save energy and reduce annual water usage. The first stage of NASA’s Modular Supercomputing Facility (MSF) prototype, which resulted in a 1,000 square-foot module on a concrete pad with room for 16 compute racks, was completed in Fall 2016 and an SGI (now HPE) computer system, named Electra, was deployed there in early 2017. Cooling is performed via an evaporative system built into the module, and preliminary experience shows a Power Usage Effectiveness (PUE) measurement of 1.03. Electra achieved over a petaflop on the LINPACK benchmark, sufficient to rank number 96 on the November 2016 TOP500 list [14]. The system consists of 1,152 InfiniBand-connected Intel Xeon Broadwell-based nodes. Its users access their files on a facility-wide file system shared by all HECC compute assets via Mellanox MetroX InfiniBand extenders, which connect the Electra fabric to Lustre routers in the primary facility over fiber-optic links about 900 feet long. The MSF prototype has exceeded expectations and is serving as a blueprint for future expansions. In the remainder of this chapter, we detail how modular data center technology can be used to expand an existing compute resource. We begin by describing NASA’s requirements for supercomputing and how resources were provided prior to the integration of the Electra module-based system.

Full Text