Class imbalance is pervasive in real-world applications and biases machine learning classifiers towards the majority classes. Algorithm-level balancing approaches adapt the learning algorithm itself so that it can learn from imbalanced datasets while preserving the data’s original distribution. The Gaussian process classifier is a powerful classification algorithm; however, as with other standard classifiers, its performance can be degraded by class imbalance. In this work, we propose the Class Imbalance Resilient Adaptive Gaussian process classifier (CIRA), an algorithm-level adaptation of the binary Gaussian process classifier that alleviates the effect of class imbalance. To the best of our knowledge, CIRA is the first adaptive method for the Gaussian process classifier that handles imbalanced data. The algorithm consists of two balancing modifications to the original classifier. The first modification balances the posterior mean approximation to learn a more balanced decision boundary between the majority and minority classes. The second modification adopts an asymmetric conditional prediction model to give more emphasis to the minority points during the training process. We conduct extensive experiments and statistical significance tests on forty-two real-world imbalanced datasets. In these experiments, CIRA outperforms six popular data sampling methods by an average of 2.29%, 3.25%, 3.67%, and 1.81% in terms of the geometric mean, F1-measure, Matthews correlation coefficient, and area under the receiver operating characteristic curve, respectively.
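As an illustration of the evaluation metrics named above, the following minimal sketch computes the geometric mean, F1-measure, and Matthews correlation coefficient from binary predictions using their standard textbook definitions. It is not taken from the CIRA implementation: the function name `imbalance_metrics`, the toy labels, and the convention of returning 0.0 when a denominator vanishes are assumptions made for this example. The area under the ROC curve is omitted because it requires continuous scores rather than hard labels.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: G-mean, F1, and MCC for binary labels.

    `positive` marks the minority class. Undefined ratios (zero
    denominators) are reported as 0.0, which is an assumed convention.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # minority recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # majority recall
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # Geometric mean of the two per-class recalls.
    g_mean = math.sqrt(sensitivity * specificity)

    # Harmonic mean of precision and minority recall.
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0

    # Matthews correlation coefficient from the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"g_mean": g_mean, "f1": f1, "mcc": mcc}


# Toy data (assumed): 8 majority points, 2 minority points.
y_true = [0] * 8 + [1] * 2
always_majority = [0] * 10  # a classifier that ignores the minority class

print(imbalance_metrics(y_true, always_majority))
print(imbalance_metrics(y_true, y_true))
```

On the toy data, a classifier that always predicts the majority class is 80% accurate yet scores zero on all three metrics, which is why such metrics are preferred over plain accuracy when classes are imbalanced.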