Abstract

AbstractAn outlier has a significant impact on data quality and the efficiency of data mining. The outlier identification algorithm observes only data points that do not follow clearly defined meanings of projected behaviour in a data set. Several techniques for identifying outliers have been presented in recent years, but if outliers are located in areas where neighbourhood density varies substantially, it can result in an imprecise estimate. To address this problem, we provide a ‘Relative Density‐based Outlier Factor (RDOF)’ algorithm based on the concept of mutual proximity between a data point and its neighbours. The proposed approach is divided into two stages: an influential space is created at a test point in the first stage. In the later stage, a test point is assigned an outlier‐ness score. We have conducted experiments on three real‐world data sets, namely the Johns Hopkins University Ionosphere, the Iris Plant, and Wisconsin Breast Cancer data sets. We have investigated three performance metrics for comparison: precision, recall, and rank power. In addition, we have compared our proposed method against a set of relevant baseline methods. The experimental results reveal that our proposed method detected all (i.e., 100%) outlier class objects with higher rank power than baseline approaches over these experimental data sets.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call