Abstract

Neighborhood-based unsupervised approaches such as LDOF, LOF and the symmetric-neighborhood method INFLO have proven effective for decades. These techniques mainly use information from either the $k$-nearest neighbors or the reverse $k$-nearest neighbors of each object to estimate its outlierness. However, they fail to detect genuine outliers in heterogeneous data sets when those outliers lie between two dense clusters or between a dense and a sparse cluster, and they also fail on scattered data sets. In addition, LOF labels a normal point of a sparse cluster as an outlier if the sparse cluster is close to a dense cluster. This paper proposes a novel autoencoder-based deep learning architecture to overcome these limitations. In the proposed approach, we first identify potential outliers in a given data set and mark them. The marked points are excluded, and the remaining points serve as the training samples for the autoencoder. The trained autoencoder is then used to compute the outlierness of every data point in the whole data set (training samples plus marked points). Experimental results on synthetic and real-world data sets show that the proposed model outperforms these widely applied techniques as well as state-of-the-art methods.
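The abstract does not say how potential outliers are identified or which autoencoder architecture is used, so the following is only a minimal sketch of the three-step pipeline it describes: mark, train on the rest, score everything. It assumes NumPy, uses a mean k-NN distance with a quantile cutoff as a hypothetical stand-in for the marking step, and uses a one-hidden-layer tanh autoencoder whose squared reconstruction error serves as the outlierness score. All names and parameters (`mark_potential_outliers`, `TinyAutoencoder`, `k`, `quantile`, `n_hidden`, `epochs`, `lr`) are illustrative, not the authors' method.

```python
import numpy as np


def knn_mean_distance(X, k):
    """Mean Euclidean distance from each row of X to its k nearest neighbours."""
    # Full pairwise distance matrix: O(n^2) memory, fine for a small sketch
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0)
    return np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)


def mark_potential_outliers(X, k=5, quantile=0.9):
    """HYPOTHETICAL marking step: flag points whose mean k-NN distance lies
    above the given quantile. This is not the paper's selection rule."""
    score = knn_mean_distance(X, k)
    return score > np.quantile(score, quantile)


class TinyAutoencoder:
    """One hidden layer (tanh encoder, linear decoder), trained by full-batch
    gradient descent on the mean squared reconstruction error."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)

    def forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)  # bottleneck code
        return H, H @ self.W2 + self.b2      # code and reconstruction

    def fit(self, X, epochs=3000, lr=0.02):
        n = X.shape[0]
        for _ in range(epochs):
            H, X_hat = self.forward(X)
            G = 2.0 * (X_hat - X) / n                # dLoss/dX_hat
            dW2, db2 = H.T @ G, G.sum(axis=0)
            dZ = (G @ self.W2.T) * (1.0 - H ** 2)    # back through tanh
            dW1, db1 = X.T @ dZ, dZ.sum(axis=0)
            self.W1 -= lr * dW1
            self.b1 -= lr * db1
            self.W2 -= lr * dW2
            self.b2 -= lr * db2
        return self

    def reconstruction_error(self, X):
        _, X_hat = self.forward(X)
        return ((X - X_hat) ** 2).sum(axis=1)


def autoencoder_outlierness(X, k=5, quantile=0.9, n_hidden=2, seed=0):
    """Mark potential outliers, train on the remaining points, score every point."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardise features
    marked = mark_potential_outliers(Xs, k, quantile)
    ae = TinyAutoencoder(Xs.shape[1], n_hidden, seed).fit(Xs[~marked])
    # Outlierness of ALL points, marked ones included
    return ae.reconstruction_error(Xs), marked


# Demo: 200 inliers near the plane x3 = x1 + x2, plus 5 points far off it
rng = np.random.default_rng(0)
xy = rng.normal(size=(200, 2))
inliers = np.column_stack([xy, xy.sum(axis=1) + 0.05 * rng.normal(size=200)])
base = np.array([[-1.5, -1.5], [1.5, -1.5], [-1.5, 1.5], [1.5, 1.5], [0.0, 0.0]])
planted = np.column_stack([base, base.sum(axis=1) + 10.0])
X = np.vstack([inliers, planted])
scores, marked = autoencoder_outlierness(X)
print("planted:", scores[-5:].round(2))
print("inlier median:", round(float(np.median(scores[:-5])), 3))
```

In the demo, the inliers lie close to a two-dimensional plane that a two-unit bottleneck can learn, while the five planted points sit well off that plane, so their reconstruction errors should be much larger than a typical inlier's. Excluding the marked points from training keeps them from pulling the learned plane toward themselves, which mirrors the motivation stated above.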
