A Novel Density-based Technique for Outlier Detection of High Dimensional Data Utilizing Full Feature Space

Mujeeb Ur Rehman, Dost Muhammad Khan

doi:10.5755/j01.itc.50.1.25588

Abstract

Anomaly detection has recently attracted considerable attention from data mining researchers, as its use has grown steadily in practical domains such as product marketing, fraud detection, medical diagnosis, and fault detection. Outlier detection in high dimensional data poses exceptional challenges because of the curse of dimensionality, under which distant and adjoining points come to resemble one another. Traditional algorithms and techniques for outlier detection were tested on the full feature space. These customary methodologies concentrate largely on low dimensional data and are therefore ineffective at discovering anomalies in a data set with a high number of dimensions. Digging out the anomalies in a high dimensional data set becomes very difficult and tedious when all subsets of projections need to be explored. All data points in high dimensional data behave like similar observations because of an intrinsic property of such data: the difference between the distances from a point to its nearest and farthest observations becomes negligible, relative to the distances themselves, as the number of dimensions grows towards infinity. This research work proposes a novel technique that explores the deviation among all data points and embeds its findings inside well-established density-based techniques. It is presented as a state-of-the-art technique because it opens a new line of research towards resolving the inherent problems of high dimensional data in which outliers reside within clusters of different densities. A high dimensional dataset from the UCI Machine Learning Repository is chosen to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.

Highlights

An outlier can be distinguished from an inlier as an observation that differs markedly from the rest and that may prove very valuable to an individual or organization
Local neighborhood-based anomaly detection shows that regular data points occupy dense neighborhoods, whereas anomalies lie far from their neighbors, that is, these irregular points inhabit sparser neighborhoods
Advances in computer technology have motivated researchers to shift their focus from low dimensional data to high dimensional data
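
The neighborhood-density idea in the highlights above can be sketched with the classical Local Outlier Factor (LOF), a well-known density-based score. This is a minimal brute-force illustration, not the technique proposed in the paper; the function names, the toy data, and the choice of k = 3 are all assumptions made for the example.

```python
import math

def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points)) if j != i
    )
    return dists[:k]

def lof_scores(points, k=3):
    """Classical Local Outlier Factor for every point (brute force, O(n^2))."""
    n = len(points)
    neighbours = [knn(points, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neighbours]  # distance to the k-th neighbour

    # Local reachability density: inverse of the mean reachability distance.
    # Assumes no duplicate points, so the sum below is never zero.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i])
        for i in range(n)
    ]

# Hypothetical toy data: a tight cluster plus one isolated point (index 5).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (3.0, 3.0)]
scores = lof_scores(points, k=3)
most_outlying = max(range(len(points)), key=scores.__getitem__)
print(most_outlying, [round(s, 2) for s in scores])
```

Scores near 1 indicate a point whose neighborhood is about as dense as those of its neighbors; scores well above 1 indicate a point in a much sparser neighborhood.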

Summary

Introduction

An outlier can be distinguished from an inlier as an observation that differs markedly from the rest and that may prove very valuable to an individual or organization. LSOF has proved to be a very efficient method for detecting outliers in high dimensional data, as it reduces the variance among neighboring data points [2]. This problem must be addressed so that anomalies become distinguishable. Ordinary distance metrics fail to distinguish between outliers and inliers because, as an inherent property of high dimensional data, all observations appear distant from one another. Since traditional techniques rely on these ordinary distance measures, they fail altogether to detect the anomalies present in high dimensional data.
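
The effect described above, where all observations seem distant from one another, can be illustrated with a small simulation. The sketch below is an assumption-laden example rather than anything taken from the paper: it draws uniformly random points (a hypothetical data model), measures Euclidean distances from one random query point, and reports the relative contrast (d_max - d_min) / d_min for several dimensionalities.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Relative gap between the farthest and nearest neighbour of a query point.

    Points are drawn uniformly from the unit hypercube; this data model and
    the default sample size are illustrative assumptions.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, c in contrasts.items():
    print(f"dim={dim:>4}  relative contrast={c:.3f}")
```

In low dimensions the nearest neighbor is far closer than the farthest one, so the contrast is large. As the dimensionality grows the contrast shrinks, which is why a plain distance threshold separates outliers from inliers poorly.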

Research Motivation

Problem Statement

Related Work

Results get influenced due to randomness


Finding Outlier Degree of each Data Point

Comparison of Outlierness with Different Perspectives

Experimental Work

Limitation

Conclusion


Journal: Information Technology and Control
Publication Date: Mar 25, 2021
Citations: 4
License type: cc-by


