Imbalanced data classification is one of the challenging problems in machine learning. Oversampling is a promising technique that generates synthetic minority instances to balance the dataset. However, inappropriately generated minority instances may degrade the performance of the classifier. Most oversampling algorithms create new minority instances by choosing nearest neighbors for random interpolation. These methods do not provide new information to the dataset, and standard classifiers therefore do not perform well on such datasets. It is thus necessary to generate diverse minority class instances to improve classifier performance. Since every feature of each minority class instance contributes valuable information, generating synthetic instances from the features of all minority instances would produce diverse minority instances, thereby improving the performance of the classifier. This paper proposes a Hierarchical Heterogeneous Ant Colony Optimization based oversampling algorithm using Feature Similarity (HHACO-FSOTe) for the generation of synthetic minority instances. Instead of choosing a few neighbors for interpolation, the proposal considers all minority instances when generating synthetic instances. HHACO-FSOTe generates new feature values by computing the minimum absolute difference between the features of a given minority instance and the corresponding features of the remaining minority instances. The features in the dataset are distributed among the ant agents, enabling parallelism and thereby reducing the time taken for oversampling. HHACO-FSOTe does not require parameter tuning or training. The proposal is evaluated on 41 low-dimensional, 11 high-dimensional and 8 noisy datasets. Experiments reveal that HHACO-FSOTe is competitive with state-of-the-art oversampling techniques. Results were validated using non-parametric statistical tests.
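The feature-similarity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the ant colony hierarchy and the parallel distribution of features are omitted, the function names `nearest_feature_value` and `synthesize` are hypothetical, and the way the closest value is combined with the original one (a midpoint) is an assumption, since the abstract does not state the exact rule. For each feature of a given minority instance, the sketch finds the value with the minimum absolute difference among all remaining minority instances.

```python
from typing import List, Sequence


def nearest_feature_value(minority: Sequence[Sequence[float]], i: int, j: int) -> float:
    """Value of feature j, taken from the other minority instances,
    that has the minimum absolute difference to minority[i][j]."""
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    target = minority[i][j]
    others = [row[j] for k, row in enumerate(minority) if k != i]
    return min(others, key=lambda v: abs(v - target))


def synthesize(minority: Sequence[Sequence[float]], i: int) -> List[float]:
    """Build one synthetic instance from minority[i], feature by feature.

    Assumption (not from the abstract): the new value is the midpoint between
    the instance's own value and the closest value found for that feature.
    """
    n_features = len(minority[i])
    return [
        (minority[i][j] + nearest_feature_value(minority, i, j)) / 2.0
        for j in range(n_features)
    ]


minority = [
    [1.0, 10.0],
    [2.0, 40.0],
    [5.0, 12.0],
]
print(synthesize(minority, 0))  # [1.5, 11.0]
```

In this toy example the closest value for the first feature comes from the second instance and the closest value for the second feature comes from the third, so the synthetic instance mixes information from several minority instances rather than from a single neighbor. Each feature is also handled independently of the others, which is what would allow the per-feature work to be split among separate agents.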