Abstract

Outlier detection identifies and removes data that differ significantly from the more correct and consistent data in a data set. Outliers can degrade classification and clustering performance, so they should be identified and removed to improve classification efficiency. Regardless of whether a classifier labels an outlier correctly, identifying a sample as an outlier is significant in its own right. In this paper, a new approach is proposed for detecting outliers within a test data set, combined with unsupervised training set selection. The selected training set is used for two-step classification. After unsupervised clustering of the training set, the cluster closest to a test sample is selected using the Euclidean distance. The test sample is then checked for being an outlier using the mean and standard deviation. The results, obtained by evaluating the distance of each test sample to the newly selected data set, show that the accuracy of the classifiers improves after outlier data are detected and eliminated.
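The two steps described above (select the closest cluster, then apply a mean and standard deviation test) can be illustrated with a minimal sketch. This is not the paper's implementation. The abstract does not name the clustering algorithm or the exact decision rule, so the sketch assumes plain k-means for the clustering step and flags a test sample when its distance to the closest centroid exceeds the mean plus `n_std` standard deviations of that cluster's member distances. The function names `kmeans` and `is_outlier` and the parameter `n_std` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed stand-in for the unspecified clustering step)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct training points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Euclidean distance from every point to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment, consistent with the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)


def is_outlier(x, X_train, centroids, labels, n_std=3.0):
    """Flag x using the closest cluster's distance statistics (assumed rule)."""
    x = np.asarray(x, dtype=float)
    # Step 1: pick the cluster whose centroid is closest to the test sample.
    j = np.linalg.norm(centroids - x, axis=1).argmin()
    # Step 2: compare the sample's distance with mean + n_std * std of the
    # distances of that cluster's own members to its centroid.
    member_d = np.linalg.norm(X_train[labels == j] - centroids[j], axis=1)
    threshold = member_d.mean() + n_std * member_d.std()
    return bool(np.linalg.norm(x - centroids[j]) > threshold)
```

With two well-separated training clusters, a test sample lying between them would be flagged, while a sample near either cluster centre would not. The multiplier `n_std` controls how strict the test is and would need tuning for a real data set.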
