Abstract

Outlier detection is a challenging task, especially when outliers are defined by rare combinations of multiple variables. In this paper, we develop and evaluate a new method for detecting outliers in multivariate data that relies on Principal Components Analysis (PCA) and three-sigma limits. The proposed approach employs PCA to perform dimension reduction by regenerating variables, i.e., fitted points from the original observations. Observations lying outside the three-sigma limits are identified as outliers. The proposed method has been successfully applied to two real-life and several artificially generated datasets. Its performance is compared with that of some existing methods using several evaluation criteria, including the percentage of correct classification, precision, recall, and F-measure. The superiority of the proposed method is confirmed by the abovementioned criteria and datasets. For the first real-life dataset, the proposed method attains the highest F-measure, 0.6667, against 0.3333 and 0.4000 for the two existing approaches. Similarly, for the second real dataset, the F-measure is 0.8000 for the proposed approach and 0.5263 and 0.6315 for the two existing approaches. The simulation experiments also show that the performance of the proposed approach improves with increasing sample size.
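The idea of combining PCA-fitted points with three-sigma limits can be illustrated with a minimal sketch. The abstract does not say which quantity the limits are applied to, so this sketch assumes they are applied to the distance between each standardized observation and its fitted point reconstructed from the leading principal components. The function name `pca_three_sigma_outliers`, the choice of `n_components`, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def pca_three_sigma_outliers(X, n_components):
    """Flag rows whose PCA reconstruction error exceeds mean + 3 * std.

    Assumption: the three-sigma limit is applied to the residual norm
    between each standardized observation and its fitted point.
    """
    X = np.asarray(X, dtype=float)

    # Standardize each variable so that no single scale dominates the PCs.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # PCA via SVD: the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T

    # Fitted points: projection of each observation onto the retained PCs.
    fitted = Z @ V @ V.T

    # Residual distance of each observation from its fitted point.
    err = np.linalg.norm(Z - fitted, axis=1)

    # Upper three-sigma limit on the residuals (they are non-negative).
    upper = err.mean() + 3.0 * err.std(ddof=1)
    return err > upper


# Synthetic data: two strongly correlated variables plus one planted point
# whose individual values are unremarkable but whose combination is rare.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t + rng.normal(scale=0.1, size=200)])
X[0] = [2.0, -4.0]

flags = pca_three_sigma_outliers(X, n_components=1)
print("flagged rows:", np.flatnonzero(flags))
```

The planted row is not extreme in either variable taken alone, which is the situation the abstract describes: it only stands out once the joint structure captured by the principal components is taken into account.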

Highlights

  • In most real-life datasets, there exist data observations that do not conform to the general model and/or behavior of the data

  • We propose an innovative outlier detection approach based upon the PCs and three-sigma limits. The proposed approach can be employed in real time and does not require any assumption or restriction related to the dataset

  • Discussion and Conclusion: This paper suggests a novel approach based upon Principal Components Analysis (PCA) and three-sigma limits for outlier detection. The predictive model is developed using the major principal components suggested by the scree plots. The main advantage of the proposed approach is that it does not require any distributional assumptions


Summary

A Novel Approach for Outlier Detection in Multivariate Data

Saima Afzal,1 Ayesha Afzal,2 Muhammad Amin,3 Sehar Saleem,4 Nouman Ali,5 and Muhammad Sajid

We develop and evaluate a new method for the detection of outliers in multivariate data that relies on Principal Components Analysis (PCA) and three-sigma limits. The proposed approach employs PCA to effectively perform dimension reduction by regenerating variables, i.e., fitted points from the original observations. The F-measure for the first real life dataset is the highest, i.e., 0.6667 for the proposed method and 0.3333 and 0.4000 for the two existing approaches. For the second real dataset, this measure is 0.8000 for the proposed approach and 0.5263 and 0.6315 for the two existing approaches. It is observed by the simulation experiments that the performance of the proposed approach got better with increasing sample size
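For readers unfamiliar with the evaluation criteria, the following minimal sketch shows how precision, recall, and F-measure are conventionally computed for a binary outlier labelling. The helper name `precision_recall_f` and the example labels are hypothetical; they are not the paper's data and do not reproduce its reported values.

```python
def precision_recall_f(true_labels, predicted_labels):
    """Return (precision, recall, F-measure), treating label 1 as 'outlier'."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # outliers found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # outliers missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2.0 * precision * recall / (precision + recall)


# Illustrative labels only: 4 true outliers, of which 3 are detected,
# plus 1 regular observation wrongly flagged.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f_measure = precision_recall_f(truth, preds)
print(precision, recall, f_measure)
```

Because outliers are rare, the F-measure is often more informative than the percentage of correct classification, which can be high even when no outlier is detected.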

Introduction
Multivariate Outlier Detection Problem
Proposed Method
Numerical Evaluation
Findings
Method
