Abstract
Data imbalance is prevalent in the real world and has received significant theoretical and practical attention in classification tasks. Imbalanced regression, however, remains underexplored: it involves continuous labels, and the absence of distinct boundaries between targets makes it laborious and technically difficult to handle. To address this issue, we integrate the balanced mean squared error (Balanced MSE), motivated from a statistical perspective, into the gradient boosting framework to present the imbalanced regression gradient boosting algorithm (IMr-GB). This algorithm adapts to imbalanced prior distributions and achieves a tradeoff between frequent and rare labels, thus delivering balanced estimates and effectively mitigating the adverse effects of underrepresented data. In addition, we devise a Bayesian variant of IMr-GB, denoted IMr-bay-GB, which eliminates the need to select the number of Gaussian mixture components and attains robustness and strong performance. Our strategies are extensively tested on ten real-world data sets to demonstrate their superior performance.
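To give a concrete sense of the loss the abstract refers to, the following is a minimal NumPy sketch of a batch-based Monte Carlo form of Balanced MSE, in which each prediction is contrasted against all targets in the batch so that rare labels are re-weighted upward. The function name, the `noise_var` parameter, and the batch-contrastive formulation are assumptions for illustration, not the paper's IMr-GB implementation.

```python
import numpy as np

def balanced_mse(preds, targets, noise_var=1.0):
    """Batch-based Monte Carlo Balanced MSE (illustrative sketch).

    logits[i, j] = -(pred_i - target_j)^2 / (2 * noise_var); the loss is
    the cross-entropy that asks each prediction i to "pick out" its own
    target j = i among all targets in the batch.
    """
    preds = np.asarray(preds, dtype=float).reshape(-1, 1)
    targets = np.asarray(targets, dtype=float).reshape(1, -1)
    logits = -(preds - targets) ** 2 / (2.0 * noise_var)
    # numerically stable log-softmax over each row
    m = logits.max(axis=1, keepdims=True)
    log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    n = log_probs.shape[0]
    # negative log-likelihood of the matching (diagonal) targets
    return -log_probs[np.arange(n), np.arange(n)].mean()
```

Unlike plain MSE, this loss stays positive even for perfect predictions and grows as predictions drift toward the dense region of the label distribution, which is what yields the balanced estimate described above.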