Abstract

High-performance computing (HPC) systems play a crucial role in running large-scale scientific applications, and improving their efficiency is imperative. This paper aims to build a comprehensive understanding of job characteristics and of the factors that affect system efficiency and performance, laying a solid foundation for proposing and evaluating job scheduling and resource management methods. To this end, we collect two years of job data from a petascale HPC system dedicated to computational fluid dynamics (CFD) applications. Based on this dataset, we conduct a detailed analysis of failed jobs and job waiting time. Our analysis reveals several important characteristics of the submitted jobs, which can help system owners understand how CFD applications behave on the system and can also guide the optimization of job scheduling and resource management algorithms.

