Abstract

With the rapid development of information technology, the number of datasets, as well as their complexity and dimension, have been growing dramatically. This dramatic growth of biology data and non-biological commercial databases becomes a challenging issue in data mining. Classification technique is one of the major tools in the captured research area. However, the performance of classification may be degraded when there exists noise in the captured databases. Therefore, outlier detection becomes an urgent need and the issue of how to integrate outlier detection method and classification techniques is an important and challenging issue. In this paper, we proposed a novel and effective approach based on k-means clustering to identify outliers in the databases. In particular, we employed one of famous classification techniques, Support Vector Machine (SVM), owing to its ability to handle highdimensional data set. We also compare the classification results with the multivariate outlier detection method. Numerical results on two different data sets indicate that the classification results after removing the outliers by our proposed method are much better than the multivariate outlier detection method.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.