Abstract

Novelty detection is a classification problem that identifies abnormal patterns, which makes it an important task for applications such as fraud detection, fault diagnosis and disease detection. However, when no labels indicate which data are normal and which are abnormal, labeling requires expensive domain and professional knowledge, so an unsupervised novelty detection approach is used instead. Applying novelty detection to high dimensional data is also a major challenge, and previous research suggests approaches based on principal component analysis (PCA) and autoencoders to reduce dimensionality. In this paper, we propose deep autoencoders with density based clustering (DAE-DBC). This approach obtains compressed data and an error threshold from a deep autoencoder model and passes both to a density based clustering step. Points that are not involved in any group are not considered a novelty; each group of points is defined as a novelty group depending on the ratio of its points that exceed the error threshold. We conducted experiments that substitute individual components to show that the components of the proposed method are more effective together. In these experiments, the DAE-DBC approach is more efficient: its area under the curve (AUC) is 13.5 percent higher than that of state-of-the-art algorithms and of the other versions of the proposed method that we evaluated.
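The group-labeling rule described in the abstract can be illustrated with a minimal sketch. It assumes the reconstruction errors, the error threshold and the cluster assignments have already been computed (the autoencoder and the clustering step are not shown). The function name `label_novelty`, the noise label `-1` (a common DBSCAN convention) and the ratio cutoff `min_ratio=0.5` are illustrative assumptions, not values taken from the paper.

```python
from collections import defaultdict

NOISE = -1  # assumed label for points that belong to no cluster


def label_novelty(cluster_ids, errors, threshold, min_ratio=0.5):
    """Flag each point as novelty (True) or normal (False).

    A cluster is treated as a novelty group when the share of its points
    whose reconstruction error exceeds `threshold` is at least `min_ratio`.
    Points outside every cluster are never flagged, following the abstract.
    `min_ratio` is a hypothetical cutoff chosen for illustration.
    """
    if len(cluster_ids) != len(errors):
        raise ValueError("cluster_ids and errors must have the same length")

    total = defaultdict(int)      # points per cluster
    exceeding = defaultdict(int)  # points per cluster above the threshold
    for cid, err in zip(cluster_ids, errors):
        if cid == NOISE:
            continue
        total[cid] += 1
        if err > threshold:
            exceeding[cid] += 1

    novelty_clusters = {
        cid for cid in total if exceeding[cid] / total[cid] >= min_ratio
    }
    return [cid in novelty_clusters for cid in cluster_ids]


# Toy input: cluster 0 has low errors, cluster 1 mostly high errors,
# and the last point belongs to no cluster.
cluster_ids = [0, 0, 0, 1, 1, 1, -1]
errors = [0.1, 0.2, 0.1, 0.9, 0.8, 0.2, 0.95]
flags = label_novelty(cluster_ids, errors, threshold=0.5)
print(flags)
```

In this toy input, two of the three points in cluster 1 exceed the threshold, so the whole cluster is flagged, including its low-error member. The unclustered point is left unflagged despite its high error, which mirrors the rule as the abstract states it.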

Highlights

  • An abnormal pattern that is not compatible with most of the data in a dataset is named a novelty, outlier, or anomaly [1]

  • The deep autoencoders with density based clustering (DAE-DBC) method is proposed to increase the accuracy of unsupervised novelty detection

  • We propose an unsupervised novelty detection method for high dimensional data based on a deep autoencoder and density based clustering

Introduction

An abnormal pattern that is not compatible with most of the data in a dataset is called a novelty, outlier, or anomaly [1]. There are three basic ways to detect novelty, depending on the availability of data labels [1]. If the data are labeled as normal or novelty, a supervised approach can be used, as in a traditional classification task. In this case, the training data consist of both normal and novelty data, and the model built from them predicts whether unseen data are normal or novelty. However, supervised novelty detection faces a class imbalance problem because novelties are relatively few compared to normal data [11]. The second approach is semi-supervised: it uses only normal data to build a classification model.
