Abstract

One of the main problems of modern supercomputers is the low efficiency of their usage, which leads to the significant idle time of computational resources, and, in turn, to the decrease in speed of scientific research. This paper presents three approaches to study the efficiency of supercomputer resource usage based on monitoring data analysis. The first approach performs an analysis of computing resource utilization statistics, which allows to identify different typical classes of programs, to explore the structure of the supercomputer job flow and to track overall trends in the supercomputer behavior. The second approach is aimed specifically at analyzing off-the-shelf software packages and libraries installed on the supercomputer, since efficiency of their usage is becoming an increasingly important factor for the efficient functioning of the entire supercomputer. Within the third approach, abnormal jobs – jobs with abnormally inefficient behavior that differs significantly from the standard behavior of the overall supercomputer job flow – are being detected. For each approach, the results obtained in practice in the Supercomputer Center of Moscow State University are demonstrated.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call