Abstract

Classification noise is a common byproduct of traditional data mining approaches, yet no specialized approach for detecting it is currently available. Methods for outlier detection are well developed, but outliers and classification noise differ enough that outlier detection algorithms are unsuitable for detecting classification noise. In this paper, a new, specialized approach to detecting classification noise is proposed, named relative density based classification noise detection (RDBCND). Computational experiments on artificial data sets show that RDBCND has a time complexity of O(n log n), making it more efficient than traditional approaches, whose time complexity is at least O(n^2). The paper also describes how classification noise detection can be used to improve the generalization ability of common classifier algorithms. In particular, a new unified approach based on RDBCND is compared with a cross-validation approach applied to a BP (back-propagation) neural network. Trials on both artificial and real-life data sets show that the RDBCND-based approach can greatly accelerate the process of identifying the best decision function. The method can also eliminate underfitting, since the algorithm simply searches for the highest training accuracy. The experiments further show that the RDBCND-based method achieves higher accuracy and lower CPU time in reaching global solutions than the cross-validation method. Because relative density is a local concept, the new approach can be applied directly to nonlinear data sets without data transformation. This is a significant advantage over linear classifier algorithms, which need kernel functions or other transformations to handle nonlinear data sets, at the cost of increased complexity.
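The abstract does not define relative density, so the following is only a minimal sketch of the general idea, assuming one plausible definition: local density is approximated as the inverse of the mean distance to a sample's k nearest neighbours, and a sample's score is its density among other-class samples divided by its density among same-class samples. A score above 1 means the sample sits closer to another class than to its own, which is what a mislabelled point looks like. The function names `relative_density` and `flag_noise`, the parameter `k`, and the threshold of 1.0 are all hypothetical and not taken from RDBCND. The neighbour search here is brute force and therefore O(n^2), not the O(n log n) reported in the paper.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _mean_knn_distance(x: Point, others: List[Point], k: int) -> float:
    """Mean Euclidean distance from x to its k nearest points in `others`."""
    nearest = sorted(math.dist(x, o) for o in others)[:k]
    return sum(nearest) / len(nearest)


def relative_density(points: List[Point], labels: List[int], k: int = 3) -> List[float]:
    """Hypothetical relative-density score for every sample (brute force, O(n^2)).

    Local density is approximated as 1 / (mean distance to the k nearest neighbours).
    The score is density among other-class samples divided by density among
    same-class samples, i.e. mean_same_dist / mean_other_dist.
    A score above 1 means the sample sits closer to another class than to its own.
    """
    n = len(points)
    scores: List[float] = []
    for i in range(n):
        same = [points[j] for j in range(n) if j != i and labels[j] == labels[i]]
        other = [points[j] for j in range(n) if labels[j] != labels[i]]
        if not same or not other:
            # Score is undefined without both neighbour sets; treat as clean.
            scores.append(0.0)
            continue
        d_same = _mean_knn_distance(points[i], same, k)
        d_other = _mean_knn_distance(points[i], other, k)
        # A sample that coincides with other-class points is maximally suspicious.
        scores.append(d_same / d_other if d_other > 0 else math.inf)
    return scores


def flag_noise(points: List[Point], labels: List[int], k: int = 3,
               threshold: float = 1.0) -> List[int]:
    """Indices of samples whose score exceeds the (hypothetical) threshold."""
    scores = relative_density(points, labels, k)
    return [i for i, s in enumerate(scores) if s > threshold]


# Two well-separated clusters; the last point lies inside cluster 0 but is labelled 1.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (0.5, 0.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(flag_noise(points, labels, k=3))  # [8]
```

With k = 3 the mislabelled point scores roughly 9.7, while every correctly labelled point scores below 0.3. Because each score depends only on a sample's nearest neighbours, no global linear separation is required, which is in the spirit of the abstract's remark about nonlinear data sets. Replacing the brute-force search with a spatial index such as a k-d tree is one common way to reduce neighbour-query cost; whether RDBCND reaches O(n log n) this way is not stated in the abstract.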