Abstract

The process of marking multi-attribute experimental data for subsequent use by means of data mining in problems of detection and classification of rare anomalous events of computer systems (CS) is considered. The labeling process is carried out using three methods: manual preprocessing, statistical analysis and cluster analysis. Among the attributes of the metric type, the authors identified two macrogroups: “integral attributes” and “impulse attributes”. It is shown that the combination of statistical and cluster analysis methods increases the accuracy of detecting anomalous events in the CS, and also allows the selection of attributes according to their information significance. The expediency of manual preprocessing of data before clustering is shown by the example of dividing attributes into macrogroups, analyzing the density distribution using violin plot and removing the trend component using the method difference stationary series. With the help of construction of violin diagrams (Violin plot) for the attribute of the “integral” macrogroup, the distribution of states of the CS is shown. It is shown that the removal of the trend component by the DS-series method, normalization and reduction to absolute values allows more accurate marking of anomalous outliers, but this is not always acceptable. The interpretation of the clustering results performed for each normalized attribute shows that the normal values for all attributes are concentrated around zero values. The result of labeling experimental data is attribute-labeled data, where each attribute at the current time is assigned one of two states: abnormal or normal.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call