The distance-based instance selection algorithm for feed-forward neural networks is a numerosity reduction technique. Using the Euclidean distance function, it reduces the original training set by selecting only the instances at the decision boundary between consecutive classes. This paper studies how nine distance functions affect the data reduction performance and the classification performance of the distance-based instance selection algorithm. The evaluation was conducted on real-world data sets from the UCI Machine Learning Repository and the ELENA project. The data reduction results indicate that the Chebyshev, Cosine, and Minkowski distance functions are recommended for the integer data type, the Minkowski distance function for the categorical data type, and the Jaccard distance function for the real data type. Selecting the initial distance function according to the data type can enable the distance-based instance selection algorithm to produce its best classification performance.
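
As a rough illustration of how the choice of distance function plugs into boundary-based instance selection, the following minimal Python sketch defines four of the distance functions named above and a simplified selector. The selector is an assumption made for illustration only, not the algorithm evaluated in the paper: it keeps every instance that is the nearest different-class neighbour of some other instance. The names `select_boundary`, `X`, `y`, and `dist` are hypothetical, as is the default Minkowski order `p=3`.

```python
import math


def minkowski(a, b, p=3):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a, b):
    """Euclidean distance, the Minkowski distance with p=2."""
    return minkowski(a, b, p=2)


def chebyshev(a, b):
    """Chebyshev distance: the largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """Cosine distance, 1 minus the cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: a zero vector is maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def select_boundary(X, y, dist):
    """Simplified stand-in for boundary selection (NOT the paper's algorithm).

    For each instance, find its nearest neighbour with a different class
    label under `dist`, and keep that neighbour. Returns sorted indices.
    Requires at least two distinct labels in y.
    """
    keep = set()
    for i, xi in enumerate(X):
        others = [k for k in range(len(X)) if y[k] != y[i]]
        nearest = min(others, key=lambda k: dist(xi, X[k]))
        keep.add(nearest)
    return sorted(keep)


if __name__ == "__main__":
    X = [(0, 0), (1, 0), (4, 0), (5, 0)]
    y = [0, 0, 1, 1]
    for name, fn in [("euclidean", euclidean), ("chebyshev", chebyshev)]:
        print(name, select_boundary(X, y, fn))
```

Because the distance function is passed in as a parameter, swapping Euclidean for Chebyshev or Minkowski changes which instances count as "nearest" without touching the selection logic, which is the kind of comparison the study performs across data types.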