VenusAI: An artificial intelligence platform for scientific discovery on supercomputers

Tiechui Yao, Jue Wang, Meng Wan, Zhikuang Xin, Yangang Wang, Rongqiang Cao, Shigang Li, Xuebin Chi

doi:10.1016/j.sysarc.2022.102550

Abstract

Because machine learning platforms provide one-stop artificial intelligence (AI) application solutions, they have been widely adopted in the industrial and commercial internet fields in recent years. Scientific discovery driven by large-scale computation and massive data on heterogeneous accelerator cards is a significant future trend. However, building a platform for scientific discovery remains challenging: it requires large-scale heterogeneous resource scheduling and support for massive multi-source data. To free researchers from tedious resource management and environment configuration, we propose VenusAI, a platform for large-scale computing scenarios in scientific research built on a heterogeneous resource scheduling framework. This paper first presents the architecture design of the VenusAI platform on supercomputers and elaborates on the virtualization and containerization of the underlying hardware resources. Next, a technical framework for heterogeneous resource aggregation and scheduling is proposed, and a unified resource interface is introduced in the application service layer. Services are modularized and decoupled around the three core parts of an AI scenario: data, models, and computing power. Furthermore, three types of experiments conducted on supercomputers show that the scheduling framework performs better on virtual clusters than on common clusters. Finally, three scientific discovery applications deployed on VenusAI, i.e., new energy forecasting, materials design, and unmanned aerial vehicle planning, demonstrate the advantages of the platform in solving practical scientific problems.

Full Text