Abstract

ClusterLogs is a framework for the automatic categorization of computing jobs and resources by their error messages in distributed computing systems. It was originally developed for high-energy physics experiments, but it can be applied in other areas. The first prototype of the framework was limited to sequential execution and could not process large volumes of data in an acceptable time. In the next prototype, the system was significantly improved by parallelizing several data preprocessing stages. In this paper, we focus on the parallelization of the DBSCAN algorithm, the main method used to cluster the numeric vectors that represent the error messages.
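For readers unfamiliar with DBSCAN, the following is a minimal sketch of the textbook algorithm. It is not the ClusterLogs implementation, and the names `region_query` and `dbscan` are hypothetical. It assumes error messages have already been turned into numeric vectors and uses Euclidean distance. The sketch shows one commonly parallelized step: each point's eps-neighbourhood query is independent of all the others, so those queries can be computed concurrently before the sequential cluster-expansion pass. A thread pool is used here only to show this structure. In CPython, threads give little speedup for pure-Python CPU work because of the GIL, so a real implementation would more likely use processes or a distributed engine.

```python
import math
from concurrent.futures import ThreadPoolExecutor

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (including i)."""
    p = points[i]
    return [j for j, q in enumerate(points) if math.dist(p, q) <= eps]


def dbscan(points, eps, min_pts, workers=4):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    # Assumption for illustration: the eps-neighbourhood of every point is
    # computed up front. Each query is independent, so the work is spread
    # across a pool of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        neighbours = list(
            pool.map(lambda i: region_query(points, i, eps), range(len(points)))
        )

    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbours[i])
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # reachable noise becomes a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_pts:
                seeds.extend(neighbours[j])  # j is also a core point: keep expanding
        cluster_id += 1
    return labels
```

For example, `dbscan([(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)], eps=0.5, min_pts=2)` groups the first three and the next three points into two clusters and marks the isolated last point as noise.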
