Abstract

In recent times, dimensionality has posed greater challenges than data size. The most serious concern with high-dimensional data is the curse of dimensionality, which has drawn the attention of data miners. Anomaly detection based on the local neighborhood, such as the local outlier factor (LOF), has been accepted as a state-of-the-art approach, but it fails on data with a large number of dimensions for the reason mentioned above. In this paper, we determine the effects of different distance functions on an unlabeled dataset when mining outliers with a density-based approach. We also report findings on runtime and outlier score as the dimension size and the number of nearest-neighbor points (min_pts) are varied. This analysis is also applicable to the domains of big data and data science.
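To make the two knobs in the abstract concrete (the distance function and min_pts), here is a minimal from-scratch sketch of LOF following the standard definitions (k-distance, reachability distance, local reachability density). It is not the authors' code; the three distance functions and the function name `lof_scores` are illustrative choices, and the sketch assumes no point has min_pts or more exact duplicates.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a: Point, b: Point) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def lof_scores(points: List[Point], min_pts: int,
               dist: Distance = euclidean) -> List[float]:
    """Return one LOF score per point; scores well above 1 suggest outliers."""
    n = len(points)
    if not 0 < min_pts < n:
        raise ValueError("min_pts must satisfy 0 < min_pts < len(points)")

    # Naive pairwise distance matrix: O(n^2 * d) time, so runtime grows
    # with both the number of points and the number of dimensions.
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    k_dist: List[float] = []
    neighbors: List[List[int]] = []
    for i in range(n):
        others = sorted((d[i][j], j) for j in range(n) if j != i)
        kd = others[min_pts - 1][0]  # distance to the min_pts-th neighbor
        k_dist.append(kd)
        # The k-distance neighborhood includes every point tied at kd.
        neighbors.append([j for dj, j in others if dj <= kd])

    # Local reachability density: inverse mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd: List[float] = []
    for i in range(n):
        reach = [max(k_dist[o], d[i][o]) for o in neighbors[i]]
        total = sum(reach)
        # total == 0 only with many duplicates (excluded by assumption).
        lrd.append(len(reach) / total if total > 0 else math.inf)

    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [sum(lrd[o] / lrd[i] for o in neighbors[i]) / len(neighbors[i])
            for i in range(n)]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (0.5, 0.5), (10.0, 10.0)]
    for f in (euclidean, manhattan, chebyshev):
        print(f.__name__, [round(s, 3) for s in lof_scores(data, 3, f)])
```

In practice, scikit-learn's `LocalOutlierFactor` exposes the same two parameters as `n_neighbors` and `metric`, with indexed neighbor search instead of the full distance matrix used here for clarity.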

Highlights

  • An outlier, also known as an anomaly, can be defined as a data point that appears very dissimilar from other points according to some criterion [1,17]

  • Outlier detection, a subfield of data mining, has been used for knowledge discovery

  • Local neighborhood-based outlier detection has been accepted as a state-of-the-art methodology for detecting outliers among clusters of different densities

Summary

INTRODUCTION

An outlier, also known as an anomaly, can be defined as a data point that appears very dissimilar from other points according to some criterion [1,17]. Outlier detection approaches can be categorized into three types [2,3]: cluster-based, distance-based, and density-based (local neighborhood-based). These approaches resemble one another in that all of them rely on some notion of similarity. Even techniques based on dimensionality reduction cannot resolve the curse of dimensionality, because feature relevance or irrelevance is determined locally. Researchers have addressed this inherent problem by formulating methods on subspaces (subsets of attributes) [5].
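Because all three families rely on a similarity measure, one quick way to see the curse of dimensionality is to check how the spread of distances from a query point shrinks relative to the nearest distance as the dimension grows. The sketch below uses uniformly random data and the relative contrast (D_max - D_min) / D_min under several Minkowski distances; the data, the diagnostic, and the helper names are illustrative assumptions, not the paper's experiment.

```python
import math
import random
from typing import List, Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """L_p distance; p = math.inf gives the Chebyshev (max) distance."""
    if math.isinf(p):
        return max(abs(x - y) for x, y in zip(a, b))
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def relative_contrast(n_points: int, dim: int, p: float, seed: int = 0) -> float:
    """(D_max - D_min) / D_min from one random query to n_points random points.

    Values near 0 mean the nearest and farthest points are almost equally
    far away, so nearest-neighbor (and hence density) estimates lose meaning.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    data: List[List[float]] = [[rng.random() for _ in range(dim)]
                               for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [minkowski(query, x, p) for x in data]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    for p in (1, 2, math.inf):
        for dim in (2, 10, 100, 1000):
            rc = relative_contrast(200, dim, p)
            print(f"p={p}, dim={dim}: relative contrast = {rc:.3f}")
```

Running it across p is a rough way to compare how quickly different distance functions lose contrast on this synthetic data; it says nothing definitive about any particular real dataset.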

Motivation
Likeness
Accuracy of Outliers
RELATED WORK
EXPERIMENTAL WORK
LIMITATION
CONCLUSION