Imbalanced datasets pose challenges to standard classification algorithms. Although oversampling techniques can balance the number of samples across classes, the difficulty of imbalanced classification lies not solely in the imbalanced data itself but also in other factors, such as small disjuncts and overlapping regions, especially in the presence of noise. Traditional oversampling techniques do not effectively address these intricacies. To this end, we propose a novel oversampling method called Newton’s Cooling Law-Based Weighted Oversampling (NCLWO). The proposed method first calculates the weights of minority class samples based on density and closeness factors to identify hard-to-learn samples, assigning them higher heat. Subsequently, Newton’s Cooling Law is applied to each minority class sample by using it as the center and expanding the sampling region outward, gradually decreasing the heat until a balanced state is reached. Finally, majority class samples within the sampling region are translated to eliminate overlapping areas, and a weighted oversampling approach is employed to synthesize informative minority class samples. The experimental study, carried out on a set of benchmark datasets, confirms that the proposed method not only outperforms state-of-the-art oversampling approaches but also shows greater robustness in the presence of feature noise.
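For context, Newton’s Cooling Law in its standard form states that an object’s temperature $T(t)$ decays exponentially toward the ambient temperature: $T(t) = T_{\mathrm{env}} + (T_0 - T_{\mathrm{env}})\, e^{-kt}$, where $T_0$ is the initial temperature and $k$ the cooling rate. The abstract does not specify how the heat assigned to minority samples, the notion of time, and the expanding sampling radius are mapped onto these quantities; that correspondence is detailed in the full paper.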