Overcoming the ordinal imbalanced data problem by combining data processing and stacked generalizations

Marine Desprez,Kyle Zawada,Daniel Ramp

doi:10.1016/j.mlwa.2021.100241

Open Access

https://doi.org/10.1016/j.mlwa.2021.100241

Abstract

Ordinal imbalanced datasets are pervasive in real-world applications but remain challenging to analyse because they require specific methods that account for both the ordering information and the class imbalance. Failure to account for both characteristics can substantially affect a model's predictive performance. However, existing methods tend to focus on either ordinality or imbalance rather than addressing both simultaneously. The few approaches that do account for both are not always easy for non-expert analysts to implement, so simpler approaches are needed to facilitate appropriate data processing. Here, we developed a general approach that uses some of the most popular machine learning algorithms to ensure appropriate processing of ordinal imbalanced datasets and to optimize the predictions of all classes. After transforming the multi-class ordinal problem into a well-known binary problem, we implemented several different resampling methods in a decision-tree classifier. We then used a stacked generalization algorithm to combine the classifiers and improve predictive performance. To test our approach, we used two ordinal imbalanced datasets, one on student performance and one on wine quality. Individual resampling techniques tended to improve the accuracy of minority classes while simultaneously increasing the number of false positives in those classes. This resulted in a decrease, sometimes substantial, in the accuracy of other classes. The stacking model offered a good compromise between improving the accuracy of minority classes and mitigating the reduced accuracy of other classes. Our approach provides useful insights into which modelling strategies should be favoured for production implementations involving these common datasets, depending on the end user's interests.
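
The first two steps described in the abstract, turning the ordinal problem into binary problems and resampling each one, can be illustrated with a minimal sketch. The abstract does not name the specific transformation or resampling methods, so everything here is an assumption: the sketch uses the common "is the label greater than k?" decomposition and plain random oversampling, and the function names are hypothetical. The decision-tree and stacking steps are omitted.

```python
import random
from typing import List, Sequence, Tuple


def ordinal_to_binary(y: Sequence[int], n_classes: int) -> List[List[int]]:
    """Return K-1 binary target vectors; the k-th one encodes 'label > k'.

    Assumed decomposition (not confirmed by the abstract): labels are
    integers 0..K-1 and each binary problem splits the ordered scale in two.
    """
    return [[1 if label > k else 0 for label in y] for k in range(n_classes - 1)]


def combine_probabilities(p_greater: Sequence[float]) -> List[float]:
    """Turn estimates of P(y > k), k = 0..K-2, into per-class probabilities.

    P(y = 0)   = 1 - P(y > 0)
    P(y = k)   = P(y > k-1) - P(y > k)
    P(y = K-1) = P(y > K-2)
    Differences are clipped at 0 because independently fitted binary
    models need not produce monotone estimates.
    """
    probs = [1.0 - p_greater[0]]
    for k in range(1, len(p_greater)):
        probs.append(max(0.0, p_greater[k - 1] - p_greater[k]))
    probs.append(p_greater[-1])
    return probs


def random_oversample(
    X: Sequence, y_bin: Sequence[int], seed: int = 0
) -> Tuple[list, List[int]]:
    """Duplicate minority rows at random until both binary classes match in size."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y_bin) if t == 1]
    neg = [i for i, t in enumerate(y_bin) if t == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:  # only one class present: nothing to resample from
        return list(X), list(y_bin)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y_bin))) + extra
    return [X[i] for i in idx], [y_bin[i] for i in idx]
```

In this sketch, each binary target vector would be resampled separately and passed to its own classifier; the resulting estimates of P(y > k) are then recombined into per-class probabilities. A stacked model, as in the paper, would take the outputs of several such classifiers (one per resampling method) as inputs to a second-level learner.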

Journal: Machine Learning with Applications
Publication Date: Dec 21, 2021
Citations: 4
License type: cc-by

Similar Papers

A new re-sampling method for network traffic classification using SML
Wang Ruoyu ... Liu Zhen
01 Dec 2010

A Classification Algorithm Based on Ensemble Feature Selections for Imbalanced-Class Dataset
Hua Yin ... Keke Gai
01 Apr 2016

Cost-sensitive multi-layer perceptron for binary classification with imbalanced data
Zheng Liu ... Wendong Xiao
01 Jul 2018

Evolutionary-Based Ensemble Under-Sampling for Imbalanced Data
Yongqing Zhang ... Rongzhao Lu
01 Dec 2019
