TVOR: Finding Discrete Total Variation Outliers Among Histograms

Nikola Banic, Neven Elezovic

doi:10.1109/access.2020.3047342

Abstract

Pearson’s chi-squared test can detect outliers in the data distribution of a given set of histograms. However, in fields such as demographics (e.g. birth years), outliers may be more easily found in terms of histogram smoothness, where techniques such as Whipple’s or Myers’ indices successfully handle only specific anomalies. This paper proposes detecting smoothness outliers among histograms by using the relation between their discrete total variations (DTV) and their respective sample sizes. This relation is mathematically derived to be applicable in all cases and is simplified by an accurate linear model. The deviation of a histogram’s DTV from the value predicted by the model is used as the outlier score, and the proposed method is named Total Variation Outlier Recognizer (TVOR). TVOR requires no prior assumptions about the distribution of the histograms’ samples, has no hyperparameters that require tuning, is not limited to specific patterns, and is applicable to histograms that share the same bins. Each bin can have an arbitrary interval, which can also be unbounded. TVOR finds DTV outliers more easily than Pearson’s chi-squared test; for distribution outliers, the opposite holds. TVOR is tested on real census data, where it successfully finds suspicious histograms. The source code is available at https://github.com/DiscreteTotalVariation/TVOR.
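To make the central quantity concrete, the following minimal sketch computes a discrete total variation for two toy histograms. It assumes the usual definition of DTV as the sum of absolute differences between adjacent bin counts; the function name `discrete_total_variation` and the two count lists are illustrative assumptions and are not taken from the paper or its repository.

```python
def discrete_total_variation(counts):
    """Sum of absolute differences between adjacent bin counts.

    Assumption: this is the usual discrete total variation; the paper's
    exact definition may differ in normalization.
    """
    return sum(abs(b - a) for a, b in zip(counts, counts[1:]))


# Two hypothetical histograms over the same 7 bins with the same sample size (87).
smooth = [10, 12, 14, 15, 14, 12, 10]
jagged = [10, 20, 6, 23, 6, 20, 2]

dtv_smooth = discrete_total_variation(smooth)
dtv_jagged = discrete_total_variation(jagged)

print(sum(smooth), dtv_smooth)  # 87 10
print(sum(jagged), dtv_jagged)  # 87 90
```

Both toy histograms hold the same number of samples, yet the jagged one has a DTV nine times larger, which is the kind of smoothness difference the abstract refers to.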

Highlights

Outliers can be defined as data patterns that do not conform to an expected normal data behavior [1].
The first experiments consisted of taking many variously sized subsamples of the birth years from the German census of 1939, calculating the discrete total variations of their birth year histograms, and fitting the proposed method’s model in Eq. (49) to the data obtained in this way.
In this paper, a method for finding discrete total variation outliers among histograms has been proposed. It scores histograms based on the deviation of their discrete total variation from its expected value.
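The scoring idea in the highlights above can be sketched roughly as follows. This is not the paper's model: Eq. (49) is not reproduced on this page, so the sketch substitutes a stand-in assumption, an ordinary least-squares line relating the normalized DTV (DTV divided by sample size) to 1/sqrt(n). The synthetic triangular data, the heaping mechanism, and all names (`sample_histogram`, `fit_line`, `heaped_index`) are hypothetical and exist only for illustration.

```python
import math
import random


def dtv(counts):
    """Sum of absolute differences between adjacent bin counts."""
    return sum(abs(b - a) for a, b in zip(counts, counts[1:]))


def sample_histogram(rng, n, bins=50, heaping=0.0):
    """Draw n values from a triangular distribution and bin them.

    With probability `heaping`, a value is rounded down to a multiple of 5,
    which imitates digit preference (a hypothetical anomaly).
    """
    counts = [0] * bins
    for _ in range(n):
        v = min(bins - 1, int(rng.triangular(0, bins, bins / 2)))
        if rng.random() < heaping:
            v -= v % 5
        counts[v] += 1
    return counts


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


rng = random.Random(0)
sizes = [500 * (i + 1) for i in range(30)]  # sample sizes 500 .. 15000
heaped_index = 14                           # the one anomalous histogram

histograms = [
    sample_histogram(rng, n, heaping=0.3 if i == heaped_index else 0.0)
    for i, n in enumerate(sizes)
]

# Stand-in model (assumption): normalized DTV is roughly linear in 1/sqrt(n).
xs = [1 / math.sqrt(n) for n in sizes]
ys = [dtv(h) / n for h, n in zip(histograms, sizes)]
slope, intercept = fit_line(xs, ys)

# Outlier score: deviation of the observed value from the fitted prediction.
scores = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
most_suspicious = max(range(len(scores)), key=scores.__getitem__)
print(most_suspicious, round(scores[most_suspicious], 3))
```

In this toy setup the heaped histogram should receive by far the largest score, because its DTV lies well above what its sample size alone would suggest. A real analysis would use the model derived in the paper rather than this stand-in line.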

Summary

Introduction

Outliers can be defined as data patterns that do not conform to an expected normal data behavior [1]. Since identifying outliers or anomalies can often be useful, outlier (i.e. anomaly) detection plays an important role in many data-related areas. With the ever-growing application of machine learning in various fields, having clean training sets, free of unwanted outliers, can often significantly benefit the final production accuracy. In real-time applications such as network traffic or health monitoring, it is usually important to detect anomalies that could represent unwanted behavior in order to prevent their potentially detrimental effects. It may also be necessary to see which samples differ the most from the rest of the data and to study them in more detail. Since there is a relatively high demand for anomaly and outlier detection methods in fields dealing with some form



Journal: IEEE Access	Publication Date: Dec 25, 2020
Citations: 30	License type: CC BY 4.0


