Abstract
A Grid computing site is composed of various services, including Grid middleware such as the Computing Element and Storage Element. Text logs produced by these services provide useful information for understanding their status. However, monitoring and analyzing the service logs every day is a time-consuming task for site administrators. Therefore, a support framework has been developed to ease the site administrators' work. The framework detects anomalous logs using Machine Learning techniques and alerts site administrators. The framework has been examined using real service logs at the Tokyo Tier2 site, which is one of the Worldwide LHC Computing Grid sites. This paper reports the anomaly detection method used in the framework and its performance at the Tokyo Tier2 site.
Highlights
A Grid computing site is composed of various services, including Grid middleware such as the Computing Element (CE) and Storage Element (SE)
We have introduced a word embedding technique and a clustering algorithm, both of which are unsupervised Machine Learning (ML) methods, to detect anomalous logs
If the sample for Aug 29 is counted as a true positive, on the grounds that flagging it as an anomaly is reasonable, the framework performs better, with an F1 score of 0.86
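The detection idea in the highlights (turn each log line into a vector, cluster the vectors without labels, and treat lines that fit no cluster as anomalous) can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's implementation: the source does not name the embedding or the clustering algorithm, so a bag-of-words count vector stands in for a learned word embedding and a small DBSCAN-style routine stands in for the clustering step. The log lines, `eps` and `min_samples` are all hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(line):
    """Bag-of-words count vector (a stand-in for a learned word embedding).

    Numbers are mapped to one placeholder token so that job IDs and similar
    fields do not make otherwise identical messages look different.
    """
    tokens = re.findall(r"[a-z]+|\d+", line.lower())
    return Counter("<num>" if t.isdigit() else t for t in tokens)


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(vectors, eps, min_samples):
    """Minimal DBSCAN: returns one label per vector, with -1 meaning noise."""
    n = len(vectors)
    # Neighborhoods include the point itself, as in the usual definition.
    neighbors = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


def detect_anomalies(lines, eps=0.3, min_samples=3):
    """Return the log lines that DBSCAN leaves unclustered (label -1)."""
    labels = dbscan([vectorize(line) for line in lines], eps, min_samples)
    return [line for line, label in zip(lines, labels) if label == -1]


# Hypothetical log lines, invented for illustration only.
sample_logs = [
    "job 1001 submitted to queue",
    "job 1002 submitted to queue",
    "job 1003 submitted to queue",
    "transfer 17 finished successfully",
    "transfer 18 finished successfully",
    "transfer 19 finished successfully",
    "disk failure detected on pool node",
]

if __name__ == "__main__":
    for line in detect_anomalies(sample_logs):
        print("ANOMALY:", line)
```

Frequent message templates form dense clusters, while a rare message has too few close neighbors to join one and is reported. In practice the distance threshold and the minimum cluster size would need tuning against real service logs.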
Summary
A Grid computing site is composed of various services, including Grid middleware such as the Computing Element (CE) and Storage Element (SE). A support framework is designed to detect anomalous logs, which could indicate a problem with a service, using Machine Learning (ML) techniques and to alert the site administrators. There are several reports on anomaly detection using ML techniques for Grid computing site operation in the high energy physics community. The framework has been examined using real service logs at a Grid computing site at the University of Tokyo, which is one of the Worldwide LHC Computing Grid sites. The method for detecting anomalous logs in the framework and its performance at the Grid computing site are reported.