Abstract
Supercomputers play an important role in science, since they are used in almost all of its fields. Medicine, chemistry, weather prediction, oil and gas exploration, astronomy: in all these and many other areas, high-performance computing is heavily used. However, the overall efficiency of a supercomputer is often low, because many of the applications running on it have efficiency issues. To address this problem, the authors are developing a tool that analyzes the flow of applications launched on a supercomputer and identifies programs with suspiciously inefficient behavior. The proposed methods take monitoring data as input, since such data provide complete and detailed information about application behavior during execution.

We have already conducted a study on identifying applications with abnormal efficiency in the supercomputer job flow. That classification method, based on the Random Forest algorithm, showed good performance on real-life data from the Lomonosov-2 supercomputer. However, this previously proposed method of analyzing the job flow has a major drawback: it cannot be applied to jobs that are still running.

In this work we propose a new method that uses a long short-term memory (LSTM) neural network together with our previous classifier to detect running applications with suspicious behavior. In this method, the LSTM detects suspicious parts of an application's execution by analyzing its earlier behavior during the same launch; once such a part is detected, our previous, more precise classifier decides whether the application really behaves suspiciously. This method is currently being developed and tested on real-life applications of the Lomonosov-2 supercomputer.
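The two-stage idea (a cheap detector that watches a running job, followed by a more precise classifier that confirms or rejects the alarm) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The LSTM forecast is replaced by a plain window mean, and the Random Forest classifier by a single hypothetical threshold rule, so that the sketch runs with the standard library alone. The names `forecast`, `suspicious_segments`, `confirm` and `check_running_job`, the CPU-load feature, and the `window`/`tolerance`/`0.2` thresholds are all illustrative assumptions.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical monitoring feature: per-interval CPU load (0..1) of one running job.


def forecast(history: Sequence[float]) -> float:
    """Predict the next sample from recent history.

    Stand-in for the LSTM: a plain window mean keeps the sketch dependency-free.
    """
    return mean(history)


def suspicious_segments(series: Sequence[float], window: int = 5,
                        tolerance: float = 0.3) -> List[int]:
    """Stage 1: indices where the observed value departs from the forecast
    by more than `tolerance` (an assumed absolute threshold)."""
    flagged = []
    for t in range(window, len(series)):
        predicted = forecast(series[t - window:t])
        if abs(series[t] - predicted) > tolerance:
            flagged.append(t)
    return flagged


def confirm(features: Dict[str, float]) -> bool:
    """Stage 2: stand-in for the more precise Random Forest classifier.

    A single hypothetical rule: a very low average CPU load is suspicious.
    """
    return features["mean_cpu_load"] < 0.2


def check_running_job(series: Sequence[float],
                      classifier: Callable[[Dict[str, float]], bool] = confirm,
                      window: int = 5, tolerance: float = 0.3) -> bool:
    """Run the cheap stage-1 screen; call the classifier only if it fires."""
    if not suspicious_segments(series, window, tolerance):
        return False
    # Features are aggregated over the data collected so far in this launch.
    features = {"mean_cpu_load": mean(series)}
    return classifier(features)


if __name__ == "__main__":
    steady = [0.9] * 10                  # no deviation: stage 2 never runs
    dip = [0.9] * 6 + [0.05] * 10        # flagged, but average load stays high
    stalled = [0.9] * 5 + [0.02] * 20    # flagged and confirmed
    for name, s in [("steady", steady), ("dip", dip), ("stalled", stalled)]:
        print(name, check_running_job(s))
```

Running it prints `False` for `steady` and `dip` and `True` for `stalled`. The point of the split is cost: the stage-1 check is evaluated continuously while the job runs, whereas the heavier classifier is consulted only for jobs that stage 1 has already flagged.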