Abstract

In this paper, we propose an outlier detection approach based on local kernel regression for instance selection. The approach evaluates how well each instance is reconstructed from its neighbors and uses the reconstruction error to identify the outliers. Experiments on synthetic and real data sets show the efficacy of the proposed approach in comparison with existing counterparts.

Highlights

  • Instance selection, which cleans noise-corrupted and redundant data, has been widely used in real-world applications such as marketing research [1] and data mining [2].

  • We compared the proposed algorithm with the angle-based outlier detection (ABOD) and local outlier factor (LOF) approaches on this 2-D data set

  • The shape image is represented by the inner distance shape context (IDSC) feature [25], which provides good shape descriptors for binary shape images

Summary

Introduction

Instance selection, which cleans noise-corrupted and redundant data, has been widely used in real-world applications such as marketing research [1] and data mining [2]. Hautamaki et al. [11] constructed the k-NN graph for a data set, in which a vertex whose indegree is less than a user-defined threshold is an outlier. Deviation-based approaches group points and consider as outliers those points that deviate considerably from the general characteristics of the groups. These algorithms cannot properly detect outliers in a noisy environment unless the number of clusters is known in advance. Along this line, He et al. [16] proposed FindCBLOF to determine the Cluster-Based Local Outlier Factor (CBLOF) for each data point. We shall present an iterative method based on local kernel ridge regression that detects outliers inward, starting from the utmost outlier, which has the greatest reconstruction error.
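The idea of removing outliers one at a time by greatest reconstruction error can be illustrated with a minimal Python sketch. This is not the paper's exact formulation, which this summary does not give. The sketch assumes that each point is reconstructed in an RBF kernel feature space from its k nearest neighbors with ridge-regularized weights, and that the point with the largest error is removed at each iteration. The function names and the parameters `k`, `lam`, `gamma`, and `n_outliers` are hypothetical choices made for illustration.

```python
import numpy as np

def sq_dists(A, B):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.maximum(d2, 0.0)

def reconstruction_errors(X, k=5, lam=1e-2, gamma=1.0):
    # Assumed error measure: squared feature-space distance between phi(x_i)
    # and its ridge-regularized reconstruction from the k nearest neighbors.
    # Requires k < number of rows in X.
    n = X.shape[0]
    d2 = sq_dists(X, X)
    K = np.exp(-gamma * d2)            # RBF kernel matrix (assumed kernel)
    search = d2.copy()
    np.fill_diagonal(search, np.inf)   # a point is not its own neighbor
    errors = np.empty(n)
    for i in range(n):
        nbrs = np.argsort(search[i])[:k]
        K_nn = K[np.ix_(nbrs, nbrs)]
        k_i = K[nbrs, i]
        # Ridge solution w = (K_nn + lam * I)^{-1} k_i
        w = np.linalg.solve(K_nn + lam * np.eye(k), k_i)
        errors[i] = K[i, i] - 2.0 * w @ k_i + w @ K_nn @ w
    return errors

def detect_outliers(X, n_outliers, k=5, lam=1e-2, gamma=1.0):
    # Iteratively remove the point with the greatest reconstruction error,
    # recomputing the errors on the remaining points after each removal.
    remaining = np.arange(X.shape[0])
    removed = []
    for _ in range(n_outliers):
        err = reconstruction_errors(X[remaining], k, lam, gamma)
        worst = int(np.argmax(err))
        removed.append(int(remaining[worst]))
        remaining = np.delete(remaining, worst)
    return removed

# Usage: a 5 x 5 grid of inliers plus one isolated point (index 25).
grid = np.linspace(0.0, 0.8, 5)
X = np.array([[x, y] for x in grid for y in grid] + [[5.0, 5.0]])
print(detect_outliers(X, n_outliers=1))  # the isolated point has index 25
```

Recomputing the errors after every removal is what makes the sketch iterative: once the most extreme point is gone, it can no longer serve as a neighbor for the points that remain.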

Overview of Kernel Ridge Regression
A New Algorithm for Outlier Detection using Local Kernel Regression
Experimental Results
Synthetic data with several noisy points
Synthetic data with many noisy points
Breast Cancer Wisconsin Data
MPEG-7 Shape Data
Conclusion