Abstract

With the rapid development of information technology, the number of datasets, as well as their complexity and dimensionality, has been growing dramatically. This growth in both biological data and non-biological commercial databases poses a challenging problem for data mining. Classification is one of the major tools in this research area; however, its performance may degrade when the data contain noise. Outlier detection has therefore become an urgent need, and how to integrate outlier detection with classification techniques is an important and challenging issue. In this paper, we propose a novel and effective approach based on k-means clustering to identify outliers in databases. For classification, we employ a well-known technique, the Support Vector Machine (SVM), owing to its ability to handle high-dimensional data sets. We also compare the resulting classification performance with that obtained using a multivariate outlier detection method. Numerical results on two different data sets indicate that classification after removing outliers with our proposed method is much better than classification after removing outliers with the multivariate outlier detection method.
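
The abstract does not state the exact rule used to decide that a point is an outlier, so the following is only a minimal sketch of the general idea of k-means-based outlier detection, not the authors' method. It assumes a hypothetical per-cluster rule: a point is flagged when its distance to its own centroid exceeds the mean plus `z` standard deviations of the distances in that cluster. The function names `kmeans` and `kmeans_outliers`, and the parameters `z` and `init`, are illustrative inventions. It uses only the Python standard library.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _nearest(p: Sequence[float], centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: _dist(p, centroids[j]))


def kmeans(points: Sequence[Point], k: int,
           init: Optional[Sequence[Point]] = None,
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's k-means. Seeds from `init`, else from the first k points."""
    centroids = [tuple(c) for c in (init if init is not None else points[:k])]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid.
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(statistics.fmean(col) for col in zip(*members))
    return centroids, labels


def kmeans_outliers(points: Sequence[Point], k: int, z: float = 2.0,
                    init: Optional[Sequence[Point]] = None) -> List[int]:
    """Indices of points unusually far from their own cluster centroid.

    Hypothetical rule: flag a point when its distance to its centroid exceeds
    mean + z * (population) standard deviation of the distances in that cluster.
    """
    centroids, labels = kmeans(points, k, init)
    dists = [_dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    outliers: List[int] = []
    for j in range(len(centroids)):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if len(idx) < 2:
            continue  # a singleton cluster gives no spread to compare against
        d = [dists[i] for i in idx]
        cutoff = statistics.fmean(d) + z * statistics.pstdev(d)
        outliers.extend(i for i in idx if dists[i] > cutoff)
    return sorted(outliers)


if __name__ == "__main__":
    square = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1), (1, -1), (-1, 1)]
    data = square + [(x + 20, y + 20) for x, y in square] + [(0, 8)]
    print(kmeans_outliers(data, k=2, init=[(0, 0), (20, 20)]))  # → [16]
```

Because the cutoff is computed per cluster, a tight cluster and a diffuse cluster each get their own threshold. The rows whose indices are not returned would then form the training set for the SVM classifier, for example scikit-learn's `sklearn.svm.SVC`. One caveat of this sketch: an extreme point that captures a centroid of its own ends up in a singleton cluster and is not flagged, so the choice of k and of the initial centroids matters.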
