Density-based weighting for imbalanced regression

Michael Steininger, Andreas Hotho, Konstantin Kobs, Anna Krause, Padraig Davidson

doi:10.1007/s10994-021-06023-5

Abstract

In many real-world settings, imbalanced data impedes the performance of learning algorithms, like neural networks, mostly for rare cases. This is especially problematic for tasks focusing on these rare occurrences. For example, when estimating precipitation, extreme rainfall events are scarce but important considering their potential consequences. While there are numerous well-studied solutions for classification settings, most of them cannot be applied to regression easily. Of the few solutions for regression tasks, barely any have explored cost-sensitive learning, which is known to have advantages compared to sampling-based methods in classification tasks. In this work, we propose a sample weighting approach for imbalanced regression datasets called DenseWeight and a cost-sensitive learning approach for neural network regression with imbalanced data called DenseLoss based on our weighting scheme. DenseWeight weights data points according to their target value rarities through kernel density estimation (KDE). DenseLoss adjusts each data point’s influence on the loss according to DenseWeight, giving rare data points more influence on model training compared to common data points. We show on multiple differently distributed datasets that DenseLoss significantly improves model performance for rare data points through its density-based weighting scheme. Additionally, we compare DenseLoss to the state-of-the-art method SMOGN, finding that our method mostly yields better performance. Our approach provides more control over model training as it enables us to actively decide on the trade-off between focusing on common or rare cases through a single hyperparameter, allowing the training of better models for rare data points.
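The weighting idea in the abstract (rarity of a target value estimated with KDE, with one hyperparameter controlling the emphasis on rare cases) can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything specific here is an assumption made for illustration: the function names `gaussian_kde` and `dense_weights`, the hyperparameter name `alpha`, the clipping constant `eps`, the bandwidth rule, and the form `max(1 - alpha * normalized_density, eps)` rescaled to a mean weight of 1.

```python
import math
from statistics import mean, stdev


def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density of `values` with Gaussian kernels."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(math.exp(-0.5 * ((y - v) / bandwidth) ** 2) for v in values)

    return density


def dense_weights(targets, alpha=1.0, eps=1e-6, bandwidth=None):
    """Hypothetical density-based weights: rarer target values get larger weights.

    Assumed form (not stated in the abstract): w = max(1 - alpha * p_norm, eps),
    rescaled so that the mean weight is 1. With alpha = 0 all weights equal 1.
    Assumes at least two distinct target values.
    """
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption made for this sketch.
        bandwidth = 1.06 * stdev(targets) * len(targets) ** (-1 / 5)
    density = gaussian_kde(targets, bandwidth)
    p = [density(y) for y in targets]
    p_min, p_max = min(p), max(p)
    span = (p_max - p_min) or 1.0  # guard against a constant density
    p_norm = [(v - p_min) / span for v in p]  # min-max normalize to [0, 1]
    raw = [max(1.0 - alpha * v, eps) for v in p_norm]
    scale = mean(raw)
    return [w / scale for w in raw]


# Toy targets: many small values and one rare extreme value.
example_targets = [0.0, 0.1, 0.2, 0.1, 0.0, 0.15, 0.05, 5.0]
example_weights = dense_weights(example_targets, alpha=1.0)
```

In this toy example the single extreme target receives the largest weight, and setting `alpha=0.0` falls back to uniform weighting, which mirrors the trade-off between common and rare cases described above.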

Highlights

Many machine learning algorithms, like neural networks, typically expect roughly uniform target distributions (Cui et al 2019; Krawczyk 2016; Sun et al 2009).
Our contributions are as follows: (i) We propose DenseWeight, a sample weighting approach for regression with imbalanced data. (ii) We propose DenseLoss, a cost-sensitive learning approach based on DenseWeight for neural network regression models with imbalanced data. (iii) We analyze DenseLoss’s influence on performance for common and rare data points using synthetic data. (iv) We compare DenseLoss to the state-of-the-art imbalanced regression method SMOGN, finding that our method typically provides better performance. (v) We apply DenseLoss to the heavily imbalanced
For the rarest bins, the results show that DenseLoss provides the best performance on 8 datasets, while SMOGN performs best on only 3 datasets and applying no method is best on only 2 datasets.
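DenseLoss is described above as a cost-sensitive approach in which each data point's influence on the loss follows its weight. A minimal sketch of that idea, assuming mean squared error as the base loss (the highlights do not name one) and using the hypothetical function name `dense_loss`:

```python
def dense_loss(predictions, targets, weights):
    """Weighted mean squared error: each squared error is scaled by its sample weight.

    The choice of MSE as the base loss is an assumption made for this sketch.
    """
    if not (len(predictions) == len(targets) == len(weights)):
        raise ValueError("predictions, targets and weights must have equal length")
    total = sum(w * (p - t) ** 2 for p, t, w in zip(predictions, targets, weights))
    return total / len(targets)


# With uniform weights this reduces to plain MSE.
uniform = dense_loss([1.0, 2.0], [0.0, 0.0], [1.0, 1.0])
# Up-weighting the second sample makes its error count more.
weighted = dense_loss([1.0, 2.0], [0.0, 0.0], [0.5, 1.5])
```

Because the weights only rescale per-sample errors, the same pattern carries over to any framework whose loss can be computed per sample before averaging.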

Summary

Introduction

Many machine learning algorithms, like neural networks, typically expect roughly uniform target distributions (Cui et al 2019; Krawczyk 2016; Sun et al 2009). For regression, this means there should be a similar density of samples across the complete target value range. However, many datasets exhibit skewed target distributions, with target values in certain ranges occurring less frequently than others. As a result, models can become biased, leading to better performance for common cases than for rare cases (Cui et al 2019; Krawczyk 2016). This is problematic for tasks where these rare occurrences are of special interest. Examples include precipitation estimation, where extreme rainfall is rare but can have dramatic consequences, and fraud detection, where rare fraudulent events are supposed to be detected.



Journal: Machine Learning
Publication Date: Jul 7, 2021
Citations: 66
License type: open-access


