Anomaly detection is an important application of artificial intelligence in large-scale data analysis, in areas such as computer system security, fraud detection in bank transfers, and the reliability of computer vision systems. It is also a key task in the analysis of biomedical information: if a system that recognizes dangerous diseases from biomedical signals or MRI and CT images becomes unstable, the result can be an erroneous diagnosis. One of the main problems in machine learning and data analysis is the correct labeling of data. In anomaly detection, such labeling is almost impossible because anomalies are both unpredictable and highly varied. One relevant approach is therefore to use unsupervised machine learning methods, which do not require the data to be labeled as normal or abnormal in advance. Popular methods for anomaly detection include the isolation forest algorithm, methods of nonparametric statistics, and cluster analysis. However, at the present stage of development of data analysis methods, machine learning and deep learning methods are becoming increasingly effective. In this paper, a generative machine learning approach to anomaly detection is proposed. For this purpose, autoencoder models, which belong to the class of unsupervised deep learning methods, have been developed. An autoencoder consists of an encoder, a hidden layer that holds a representation of the input data (the latent representation), and a decoder. The encoder transforms high-dimensional input data into a low-dimensional hidden representation, whose dimension is smaller than that of the input. The task of the decoder is to recover the input data from this representation.
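To make the link between reconstruction and anomaly detection concrete, the following is a minimal sketch of one common way to turn reconstruction errors into anomaly flags. It is an illustration under stated assumptions, not the decision rule used in the paper: the abstract does not say how the threshold is chosen, so the rule "mean plus k standard deviations of the errors on normal records" and the function names `mean_absolute_error`, `fit_threshold`, and `is_anomaly` are all hypothetical.

```python
from statistics import mean, stdev


def mean_absolute_error(original, reconstructed):
    """Average absolute difference between a signal and its reconstruction."""
    return mean(abs(a - b) for a, b in zip(original, reconstructed))


def fit_threshold(normal_errors, k=1.0):
    """Assumed rule: mean + k * sample standard deviation of errors on normal data."""
    return mean(normal_errors) + k * stdev(normal_errors)


def is_anomaly(error, threshold):
    """Flag a record whose reconstruction error exceeds the threshold."""
    return error > threshold


# Hypothetical reconstruction errors measured on records known to be normal.
normal_errors = [0.10, 0.12, 0.11, 0.09, 0.13]
threshold = fit_threshold(normal_errors)

print(is_anomaly(0.50, threshold))  # a poorly reconstructed record
print(is_anomaly(0.10, threshold))  # a well reconstructed record
```

The idea behind such a rule is that an autoencoder that reconstructs normal signals well should reconstruct unusual signals poorly, so a large reconstruction error is treated as evidence of an anomaly.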
The autoencoder accepts high-dimensional input data and compresses it into a representation in the space of the hidden layer. The decoder then takes this hidden representation as input and restores the original input data, so that the output of the autoencoder is the reconstructed image or signal. Computational experiments were carried out to test the proposed anomaly detection method on a set of electrocardiograms (ECGs) of patients with various heart diseases. The data set was created and balanced so that it contains 5000 electrocardiogram records, of which 58 % are normal signals and 42 % are abnormal signals. Each row of the data set corresponds to one complete ECG record of a patient. To detect abnormal ECG signals, an autoencoder model based on deep neural networks is proposed. The model is implemented in the Python programming language using the Keras framework [10]. The encoder consists of five fully connected layers, Dense(128), Dense(64), Dense(32), Dense(16), and Dense(8), each with the ReLU activation function. The decoder consists of five fully connected layers, Dense(8), Dense(16), Dense(32), Dense(64), and Dense(128), each with the ReLU activation function, followed by one fully connected layer Dense(140) with the sigmoid activation function. The loss function during signal reconstruction is given by the RMS error between the original signal and the signal reconstructed by the neural network. The model was trained with the Adam optimization method and the MAE loss function at a learning rate of 1E-04, for a total of 500 epochs with batch_size=32. For comparison with other methods, the popular machine learning methods SVM, logistic regression, and LGBM were used. The anomaly detection accuracy was 81.4 % for LGBM and 78.47 % for SVM, which allowed us to assert the advantages of the proposed autoencoder model.
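The layer description above can be sketched in Keras roughly as follows. This is an illustrative reconstruction from the abstract, not the authors' code. Several details are assumptions: the input length of 140 is inferred from the Dense(140) output layer, the import path `from tensorflow import keras` is one possible way to obtain Keras, and the names `build_autoencoder` and `count_parameters` are hypothetical. The sketch compiles the model with the MAE loss named for training.

```python
# Layer widths as listed in the abstract.
ENCODER_UNITS = [128, 64, 32, 16, 8]
DECODER_UNITS = [8, 16, 32, 64, 128]
INPUT_DIM = 140  # assumption: input length equals the Dense(140) output width


def count_parameters(input_dim=INPUT_DIM):
    """Count the weights and biases of the stacked Dense layers."""
    widths = [input_dim] + ENCODER_UNITS + DECODER_UNITS + [input_dim]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


def build_autoencoder(input_dim=INPUT_DIM):
    """Build and compile the dense autoencoder (requires TensorFlow/Keras)."""
    # Imported lazily so that count_parameters() runs without TensorFlow installed.
    from tensorflow import keras

    layers = [keras.Input(shape=(input_dim,))]
    layers += [keras.layers.Dense(u, activation="relu") for u in ENCODER_UNITS]
    layers += [keras.layers.Dense(u, activation="relu") for u in DECODER_UNITS]
    layers.append(keras.layers.Dense(input_dim, activation="sigmoid"))

    model = keras.Sequential(layers)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-4),
        loss="mae",
    )
    return model
```

Under these assumptions, training would look like `build_autoencoder().fit(x_train, x_train, epochs=500, batch_size=32)`, where `x_train` is a hypothetical array of shape (n, 140). Because the output layer is a sigmoid, the signals would need to be scaled to the range [0, 1] for the reconstruction to be able to match them.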