Abstract

The efficiency of the k-Nearest Neighbour classifier depends on the size of the training set as well as on the level of noise in it. Large datasets with high levels of noise lead to less accurate classifiers with high computational cost and storage requirements. The goal of editing is to improve accuracy by improving the quality of the training dataset. To obtain such a dataset, editing removes noise and mislabeled data and smooths the decision boundaries between the discrete classes. Prototype abstraction, on the other hand, aims to reduce the computational cost and storage requirements of classifiers by condensing the training data. This paper proposes an editing algorithm called Editing through Homogeneous Clusters (EHC). It then extends the idea by introducing a prototype abstraction algorithm that integrates the EHC mechanism and is capable of creating a small, noise-free representative set of the initial training data. This algorithm is called Editing and Reduction through Homogeneous Clusters (ERHC). Both algorithms are based on a fast, parameter-free, iterative execution of k-means clustering that forms homogeneous clusters, that is, clusters whose items all belong to the same class. Both treat clusters consisting of a single item as noise and remove them. In addition, ERHC summarizes the items of each remaining cluster by storing its mean item in the representative set. EHC and ERHC are tested on several datasets. The results show that both run very fast and achieve high accuracy. In addition, ERHC achieves high reduction rates.
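The mechanism described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how each k-means run is initialised, so the sketch assumes the seeds are the class means of the cluster being split, which keeps the procedure parameter-free. The fallback that splits a cluster by class when k-means fails to separate it is a further assumption, added only to guarantee termination. All function names (`homogeneous_clusters`, `ehc`, `erhc`) are hypothetical.

```python
from collections import defaultdict

def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means from given initial centers; returns a cluster index per point."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    return labels

def homogeneous_clusters(data):
    """Split a list of (vector, label) pairs until every cluster holds one class."""
    pending, done = [list(data)], []
    while pending:
        cluster = pending.pop()
        by_class = defaultdict(list)
        for x, c in cluster:
            by_class[c].append(x)
        if len(by_class) == 1:  # homogeneous: nothing left to split
            done.append(cluster)
            continue
        # Assumption: seed k-means with one mean per class present in the cluster.
        seeds = [mean(xs) for xs in by_class.values()]
        labels = kmeans([x for x, _ in cluster], seeds)
        parts = defaultdict(list)
        for item, l in zip(cluster, labels):
            parts[l].append(item)
        if len(parts) == 1:
            # Assumption: if k-means cannot separate the items, split by class
            # so that the loop always terminates.
            parts = {c: [(x, c) for x in xs] for c, xs in by_class.items()}
        pending.extend(parts.values())
    return done

def ehc(data):
    """Editing: drop single-item clusters as noise, keep every other item."""
    return [item for cl in homogeneous_clusters(data) if len(cl) > 1 for item in cl]

def erhc(data):
    """Editing and reduction: one (mean, label) prototype per multi-item cluster."""
    return [(mean([x for x, _ in cl]), cl[0][1])
            for cl in homogeneous_clusters(data) if len(cl) > 1]
```

Both functions take a list of `(vector, label)` pairs. `ehc(data)` returns the retained training items, while `erhc(data)` returns one prototype per retained cluster, which is where the reduction in storage comes from. Either output can then serve as the training set of a k-NN classifier.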
