Overly optimistic prediction results on imbalanced data: a case study of flaws and benefits when applying over-sampling

Gilles Vandewiele, Isabelle Dehaene, György Kovács, Lucas Sterckx, Olivier Janssens, Femke Ongenae, Femke De Backere, Filip De Turck, Kristien Roelens, Johan Decruyenaere, Sofie Van Hoecke, Thomas Demeester

doi:10.1016/j.artmed.2020.101987

Open Access

Abstract

Information extracted from electrohysterography recordings could potentially be a useful additional source of information for estimating the risk of preterm birth. Recently, a large number of studies have reported near-perfect results in distinguishing between recordings of patients who will deliver at term and those who will deliver preterm, using a public resource called the Term/Preterm Electrohysterogram database. However, we argue that these results are overly optimistic due to a methodological flaw. In this work, we focus on one specific type of methodological flaw: applying over-sampling before partitioning the data into mutually exclusive training and testing sets. Using two artificial datasets, we show how this biases the results, and we reproduce the results of studies in which this flaw was identified. Moreover, we evaluate the actual impact of over-sampling on predictive performance, when applied prior to data partitioning, using the same methodologies as related studies, to provide a realistic view of these methodologies' generalization capabilities. We make our research reproducible by providing all the code under an open license.
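The flaw described in the abstract can be illustrated with a short, self-contained sketch in plain Python. This is a hypothetical illustration, not the authors' code: it assumes random duplication of minority rows as the over-sampler (the simplest form of over-sampling, chosen here only for clarity), a 1-nearest-neighbour classifier, and features that are pure noise, so any minority recall well above chance is an artefact of the evaluation protocol. All function names (`make_noise_data`, `random_oversample`, `train_test_split`, `predict_1nn`, `minority_recall`) are illustrative.

```python
import math
import random


def make_noise_data(n_major, n_minor, n_features, rng):
    """Random features with no relation to the label (pure noise)."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_major + n_minor)]
    y = [0] * n_major + [1] * n_minor
    return X, y


def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority rows (label 1) until classes are balanced."""
    minority = [x for x, label in zip(X, y) if label == 1]
    n_extra = y.count(0) - len(minority)
    extra = [list(rng.choice(minority)) for _ in range(n_extra)]
    return X + extra, y + [1] * n_extra


def train_test_split(X, y, test_frac, rng):
    """Shuffle indices and cut them into mutually exclusive train/test parts."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def predict_1nn(X_train, y_train, x):
    """Label of the closest training row (Euclidean distance)."""
    nearest = min(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return y_train[nearest]


def minority_recall(oversample_first, seed=0):
    """Fraction of minority test rows that the 1-NN model labels as minority."""
    rng = random.Random(seed)
    X, y = make_noise_data(n_major=500, n_minor=50, n_features=10, rng=rng)
    if oversample_first:
        # Flawed: duplicates are created before splitting, so copies of the
        # same minority row can end up in both the training and testing set.
        X, y = random_oversample(X, y, rng)
        X_tr, y_tr, X_te, y_te = train_test_split(X, y, 0.3, rng)
    else:
        # Correct: split first, then over-sample the training part only,
        # leaving the test set untouched.
        X_tr, y_tr, X_te, y_te = train_test_split(X, y, 0.3, rng)
        X_tr, y_tr = random_oversample(X_tr, y_tr, rng)
    hits = [predict_1nn(X_tr, y_tr, x) == 1
            for x, label in zip(X_te, y_te) if label == 1]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    print("over-sample, then split:", minority_recall(oversample_first=True))
    print("split, then over-sample:", minority_recall(oversample_first=False))
```

With the flawed ordering, copies of a minority row typically land in both the training and the testing set, so the nearest neighbour of a minority test row is usually its own exact duplicate and recall comes out near perfect, even though the labels carry no information. Splitting first keeps every test row unseen during training, and recall should drop to a low value consistent with noise.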

Journal: Artificial Intelligence in Medicine
Publication Date: Nov 20, 2020
Citations: 55

Similar Papers

Splitting chemical structure data sets for federated privacy-preserving machine learning
Jaak Simm ... Lina Humbeck
Journal of Cheminformatics | VOL. 13 | 01 Dec 2021

Flaws in evaluation schemes for pair-input computational predictions
Yungki Park ... Edward M Marcotte
Nature Methods | VOL. 9 | 01 Dec 2012

A deep learning approach for imbalanced crash data in predicting highway-rail grade crossings accidents
Lu Gao ... Yihao Ren
Reliability Engineering & System Safety | VOL. 216 | 31 Aug 2021

Robust classification of imbalanced data using one-class and two-class SVM-based multiclassifiers
Sebastián Maldonado ... Claudio Montecinos
Intelligent Data Analysis | VOL. 18 | 01 Jan 2014
