Abstract
Unsupervised anomaly detection on high-dimensional or multidimensional data plays an important role in machine learning and industrial applications; in network security in particular, anomaly detection on network data is essential. The key to anomaly detection is density estimation. Although dimension reduction and density estimation methods have made great progress in recent years, most dimension reduction methods struggle to retain the key information of the original high-dimensional or multidimensional data. Recent studies have shown that the deep autoencoder (DAE) can address this problem well. To improve the performance of unsupervised anomaly detection, we propose an anomaly detection scheme based on a DAE and clustering methods. The DAE is trained to learn a compressed representation of the input data, which is then fed to a clustering method. The scheme uses the DAE to generate a low-dimensional representation and a reconstruction error for each high-dimensional or multidimensional input, and combines the two into a reconstructed low-dimensional input sample. The proposed scheme can eliminate redundant information contained in the data, improve the performance of clustering methods in identifying abnormal samples, and reduce the amount of computation. To verify its effectiveness, extensive comparison experiments were conducted against traditional dimension reduction algorithms combined with clustering methods. The experimental results demonstrate that, in most cases, the proposed scheme outperforms the traditional dimension reduction algorithms with different clustering methods.
Highlights
Anomaly detection is an important branch of machine learning with a wide range of practical applications; it aims to detect unusual points in data. It is used in fault diagnosis [1, 2], system health monitoring [3], network security detection [4], intrusion and fraud detection [5, 6, 7], measurement, and other fields. Exceptions to the normal instances are called anomalies; anomalies are also referred to as exceptions, outliers, novelties, noise, and deviations [8]. Anomaly detection, then, is the task of finding objects that differ from most other objects. For example, the three objects O1, O2, and O3 in Figure 1 differ from most of the objects in classes N1 and N2. What counts as a deviation differs between applications.
Traditional dimension reduction methods, such as Linear Discriminant Analysis (LDA), the least absolute shrinkage and selection operator (LASSO), Locally Linear Embedding (LLE), Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Multidimensional Scaling (MDS), are commonly used to process such data. During dimension reduction, however, some key information of the original data is lost, which reduces the difference between normal and abnormal samples.
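The information loss described above can be illustrated with a small, hypothetical example. The sketch below is not taken from the paper: it assumes synthetic two-dimensional data and uses PCA implemented through NumPy's SVD. The function name `pca_project` and all data values are illustrative. An anomaly that deviates only along a low-variance direction lands in the middle of the normal samples after projection onto the first principal component, while its reconstruction residual remains large.

```python
import numpy as np

def pca_project(X, k):
    """Project X onto its top-k principal components; return codes and residuals."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    Z = Xc @ components.T                      # low-dimensional codes
    X_hat = Z @ components + mean              # reconstruction from the codes
    residual = np.linalg.norm(X - X_hat, axis=1)
    return Z, residual

# Synthetic data (an assumption for illustration): normal samples vary mostly
# along the first axis; the anomaly deviates along the second axis only.
rng = np.random.default_rng(0)
normal = np.column_stack([rng.normal(0.0, 5.0, 500), rng.normal(0.0, 0.1, 500)])
anomaly = np.array([[0.0, 3.0]])
X = np.vstack([normal, anomaly])

Z, residual = pca_project(X, k=1)
print("anomaly code:", Z[-1, 0], "normal code range:", Z[:-1, 0].min(), Z[:-1, 0].max())
print("anomaly residual:", residual[-1], "max normal residual:", residual[:-1].max())
```

In this toy setting the one-dimensional code alone cannot separate the anomaly from the normal samples, but the residual can, which is one way to motivate keeping a reconstruction error alongside the reduced representation.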
Based on the above analysis, we propose an anomaly detection scheme based on a deep autoencoder. The following contributions are made to the unsupervised anomaly detection of high-dimensional data: (i) A dimension reduction method based on a deep autoencoder and reconstruction of the input samples is proposed. The deep autoencoder is used to reduce the dimension of the data, and the dimension reduction result is combined with the reconstruction error to form a low-dimensional reconstructed input sample. The key information of the data is well preserved in these low-dimensional reconstructed input samples, which makes abnormal samples easier to identify.
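The feature construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a single-hidden-layer tanh autoencoder trained with plain gradient descent stands in for the deep autoencoder, the reconstruction error is assumed to be the per-sample Euclidean distance between input and reconstruction, and the names `train_autoencoder` and `build_features` and all hyperparameters are hypothetical.

```python
import numpy as np

def train_autoencoder(X, latent_dim=2, epochs=1000, lr=0.01, seed=0):
    """Fit a one-hidden-layer tanh autoencoder by gradient descent on squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, latent_dim))
    b1 = np.zeros(latent_dim)
    W2 = rng.normal(0.0, 0.1, (latent_dim, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        Z = np.tanh(X @ W1 + b1)            # encoder: compressed representation
        err = (Z @ W2 + b2) - X             # decoder output minus input
        dZ = (err @ W2.T) * (1.0 - Z ** 2)  # backpropagate through tanh
        W2 -= lr * (Z.T @ err) / n
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (X.T @ dZ) / n
        b1 -= lr * dZ.mean(axis=0)
    return W1, b1, W2, b2

def build_features(X, params):
    """Concatenate the latent code with the per-sample reconstruction error."""
    W1, b1, W2, b2 = params
    Z = np.tanh(X @ W1 + b1)
    X_hat = Z @ W2 + b2
    # Assumption: Euclidean distance is used as the reconstruction error.
    rec_err = np.linalg.norm(X - X_hat, axis=1, keepdims=True)
    return np.hstack([Z, rec_err])

# Synthetic, standardized data used purely for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)

params = train_autoencoder(X, latent_dim=2)
features = build_features(X, params)
print(features.shape)  # (200, 3): two latent dimensions plus one error column
```

The resulting `features` array is what would be passed to a clustering method in place of the original ten-dimensional input; the choice of clustering method and error measure is left open here because the text above does not fix them.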
Summary
As deep neural networks have achieved good results in other fields, the curse of dimensionality in anomaly detection appears to have reached a turning point. The deep autoencoding Gaussian mixture model [10] has shown good performance on public datasets, providing a new direction for anomaly detection on high-dimensional data.