Abstract
Outlier removal from training data is a classical problem in pattern recognition. Nowadays, this problem has become more important for large-scale datasets for two reasons: First, there is a higher risk of "unexpected" outliers, such as mislabeled training data. Second, a large-scale dataset makes it more difficult to grasp the distribution of outliers. On the other hand, many unsupervised anomaly detection methods have been proposed, which can also be used for outlier removal. In this paper, we present a comparative study of nine different anomaly detection methods in the scenario of outlier removal from a large-scale dataset. For accurate performance observation, we need a simple and describable recognition procedure and thus use a nearest-neighbor classifier. As an adequate large-scale dataset, we prepared a handwritten digit dataset comprising more than 800,000 manually labeled samples. With a data dimensionality of 16×16 = 256, each digit class is ensured to have at least 100 times more instances than dimensions. The experimental results show that the common understanding that outlier removal improves classification performance, which holds for small datasets, does not hold for high-dimensional large-scale datasets. Additionally, local anomaly detection algorithms performed better on this data than their global counterparts.
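To make the evaluated pipeline concrete, the following is a minimal sketch of the outlier-removal-then-classification procedure described above: outliers are removed per class with a local anomaly detector, and a 1-nearest-neighbor classifier is trained on the cleaned data. The use of scikit-learn's LocalOutlierFactor, the synthetic stand-in data, and all parameter values are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor, KNeighborsClassifier

# Synthetic stand-in for the 256-dimensional digit data (assumption; the
# paper uses >800,000 manually labeled 16x16 handwritten digit samples).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 256))
y_train = rng.integers(0, 10, size=1000)
X_test = rng.normal(size=(200, 256))
y_test = rng.integers(0, 10, size=200)

# Per-class outlier removal with a local method (LOF); local methods are
# reported in the abstract to outperform their global counterparts here.
# n_neighbors=20 is a placeholder, not a value from the paper.
keep = np.zeros(len(X_train), dtype=bool)
for c in np.unique(y_train):
    idx = np.where(y_train == c)[0]
    lof = LocalOutlierFactor(n_neighbors=20)
    labels = lof.fit_predict(X_train[idx])  # -1 = outlier, 1 = inlier
    keep[idx[labels == 1]] = True

# Simple, describable recognition procedure: a 1-nearest-neighbor classifier.
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train[keep], y_train[keep])
print(f"1-NN accuracy after outlier removal: {clf.score(X_test, y_test):.3f}")
```

Comparing this score against a 1-NN classifier trained on the uncleaned data is the kind of before/after measurement the study performs across its nine detectors.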