Prototype reduction (PR), an important data pre-processing task, can improve instance-based classifiers by removing noisy and/or redundant samples. Recently, a series of PR methods based on different heuristic strategies have been developed. Among them, clustering-based PR methods have shown competitive performance. However, they still suffer from the following issues: (a) most rely heavily on parameters; (b) most fail to remove suspicious noisy samples from the training set; (c) almost all fail to handle manifold data with nonspherical distributions effectively; (d) some have relatively high time complexity. To advance the state of the art of clustering-based PR by overcoming these issues, a novel heuristic PR method based on supervised local density peaks clustering (PRLDPC) is proposed. The main ideas of PRLDPC are summarized as follows: (a) a supervised local density peaks clustering (SLDPC) is first proposed to divide the training set into homogeneous and heterogeneous sub-clusters; (b) an SLDPC-based editing method is then proposed to identify and remove noisy samples from the heterogeneous sub-clusters; (c) finally, an SLDPC-based condensing method is proposed to obtain reduced samples from the homogeneous sub-clusters and the pruned heterogeneous sub-clusters. Extensive experiments show that (a) PRLDPC can outperform six state-of-the-art PR methods on numerous UCI and Kaggle data sets when the reduction rate and the classification accuracy of three instance-based classifiers are weighed together; (b) PRLDPC is relatively fast and has a relatively low time complexity [Formula: see text].
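The abstract describes PRLDPC only at the level of its three stages (clustering, editing, condensing). As a rough illustration of how such a pipeline fits together, the Python sketch below is one possible realization under explicit assumptions, not the authors' SLDPC definitions: local density is taken as the inverse mean distance to the k nearest neighbours; each sample is linked to its nearest higher-density sample among those neighbours, so the resulting trees serve as sub-clusters (a density-peaks-style forest); editing inside heterogeneous sub-clusters uses a Wilson-style kNN majority vote (a swapped-in rule); and condensing keeps the densest sample of each class per sub-cluster. All function names and the parameter `k` are hypothetical.

```python
import math
from collections import Counter


def knn(X, i, k):
    """Indices of the k nearest neighbours of X[i] (Euclidean), excluding i, nearest first."""
    d = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in d[:k]]


def local_density(X, k):
    """Assumed density: inverse of the mean distance to the k nearest neighbours."""
    rho = []
    for i in range(len(X)):
        mean_d = sum(math.dist(X[i], X[j]) for j in knn(X, i, k)) / k
        rho.append(1.0 / (mean_d + 1e-12))  # small guard against duplicate points
    return rho


def density_forest(X, rho, k):
    """Link each point to its nearest strictly denser kNN neighbour; unlinked points are roots."""
    parent = []
    for i in range(len(X)):
        higher = [j for j in knn(X, i, k) if rho[j] > rho[i]]
        parent.append(higher[0] if higher else i)  # knn() is sorted by distance
    return parent


def root(parent, i):
    """Follow parent links to the sub-cluster root (no cycles: density strictly increases)."""
    while parent[i] != i:
        i = parent[i]
    return i


def reduce_prototypes(X, y, k=3):
    """Hypothetical cluster -> edit -> condense pipeline; returns indices of kept samples."""
    rho = local_density(X, k)
    parent = density_forest(X, rho, k)
    clusters = {}
    for i in range(len(X)):
        clusters.setdefault(root(parent, i), []).append(i)

    kept = []
    for members in clusters.values():
        if len({y[i] for i in members}) > 1:  # heterogeneous sub-cluster: edit it
            # Assumed editing rule: drop samples whose label disagrees with their kNN majority.
            members = [i for i in members
                       if Counter(y[j] for j in knn(X, i, k)).most_common(1)[0][0] == y[i]]
        # Assumed condensing rule: keep the densest remaining sample of each class.
        best = {}
        for i in members:
            if y[i] not in best or rho[i] > rho[best[y[i]]]:
                best[y[i]] = i
        kept.extend(best.values())
    return sorted(kept)
```

For example, with two square clusters of five points each (classes 0 and 1) plus one class-1 point placed at (0.5, 0.6) inside the class-0 square, the mislabeled point becomes the root of the class-0 sub-cluster, making it heterogeneous; editing removes it, and condensing keeps one prototype per cluster:

```python
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
     (0.5, 0.6)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(reduce_prototypes(X, y, k=3))  # [4, 9]
```

Keeping one densest sample per class and sub-cluster is the most aggressive condensing choice; a real method would likely retain more boundary samples to preserve accuracy.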