Abstract

Instance selection is an important problem in medical data mining. It selects representative data samples from a given training set while filtering out unrepresentative (or noisy) samples. This reduces the size of the training set, which then requires less storage space. In addition, when the instance selection algorithm is carefully chosen, a reduced training set containing less noisy data can usually make classifiers perform better than classifiers trained without instance selection. Many instance selection algorithms have been proposed in the literature. However, different algorithms tend to use different criteria to identify noisy data, which makes it difficult to find the best algorithm for datasets from different domains. In other words, an algorithm may outperform the others on some domain datasets but underperform them on others. Instead of developing a novel algorithm that performs better than most existing ones, this paper introduces a divide-and-conquer based instance selection (DCIS) framework that aims to improve the performance of each specific instance selection algorithm per se. Two well-known algorithms, DROP3 and IB3, are used as the baselines, and various small- and large-scale medical datasets are used in the experiments. Our results show that when DROP3 and IB3 perform instance selection within the DCIS framework, the k-NN and SVM classifiers perform better than they do with the DROP3 and IB3 baselines, respectively.
