Abstract
Imbalanced tabular datasets adversely impact the predictive performance of most supervised learning algorithms, as the imbalanced class distribution can bias models toward the majority class. To address this problem, we propose combining supervised contrastive representation learning with the tree-structured Parzen estimator technique for imbalanced tabular data. Drawing on the success of contrastive representation learning in computer vision, we extend its application to the tabular domain. By introducing supervised contrastive learning, we address the limitations of data augmentation methods for tabular data by incorporating label information. This approach enables us to extract hidden information from tabular data and obtain discriminative representations, which enhances the performance of supervised learning algorithms. Additionally, the temperature hyper-parameter τ of supervised contrastive learning has a decisive influence on performance and is difficult to tune. We introduce the tree-structured Parzen estimator, a Bayesian optimization technique, to automatically select the best τ. We evaluate our approach on fifteen real-world public tabular datasets from diverse domains. The results reveal the superiority of the tree-structured Parzen estimator over other hyper-parameter optimization methods in effectively searching for the optimal value of τ. More importantly, the proposed method outperforms baseline approaches for imbalanced learning, achieving average improvements of 5.1%, 6.0%, 9.0%, and 8.7% across four main evaluation metrics, which validates that the proposed method is well-suited to imbalanced problems in real-world applications.
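To make the two components named in the abstract concrete, the following is a minimal sketch, not the authors' published implementation: a supervised contrastive loss with temperature τ (following Khosla et al., 2020) and a tree-structured Parzen estimator search over τ using Optuna's TPESampler as one available TPE implementation. The train_and_validate helper and the search range for τ are illustrative assumptions.

```python
# Minimal sketch (assumptions: PyTorch embeddings, Optuna's TPESampler as the
# TPE implementation, and a hypothetical train_and_validate helper).
import optuna
import torch
import torch.nn.functional as F


def supcon_loss(features: torch.Tensor, labels: torch.Tensor,
                temperature: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss for one batch.

    features: (N, D) embeddings from the encoder; labels: (N,) class ids.
    """
    n = features.shape[0]
    features = F.normalize(features, dim=1)

    # Pairwise cosine similarities scaled by the temperature tau.
    logits = torch.matmul(features, features.T) / temperature
    logits = logits - logits.max(dim=1, keepdim=True).values.detach()  # stability

    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    # Positives: samples sharing the anchor's label, excluding the anchor itself.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # Denominator sums over every sample except the anchor itself.
    exp_logits = torch.exp(logits).masked_fill(self_mask, 0.0)
    log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True))

    # Average log-probability over each anchor's positives, then over the batch.
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    return -((log_prob * pos_mask).sum(dim=1) / pos_count).mean()


def objective(trial: optuna.Trial) -> float:
    # TPE proposes a candidate temperature on a log scale (assumed search range).
    tau = trial.suggest_float("temperature", 0.01, 1.0, log=True)
    # Hypothetical helper: pre-train the encoder with supcon_loss at this tau,
    # fit the downstream classifier, and return a validation metric to maximize.
    return train_and_validate(temperature=tau)


study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print("best temperature:", study.best_params["temperature"])
```

In this sketch, each TPE trial pre-trains the encoder with the supervised contrastive loss at the proposed τ and scores the resulting representation on a validation set, so the Bayesian search concentrates trials near the temperatures that yield the most discriminative embeddings.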