Abstract

Ordinal imbalanced datasets are pervasive in real-world applications but remain challenging to analyse, as they require specific methods that account for both the ordering information and the imbalanced classes. Failing to account for both characteristics can substantially degrade a model's predictive performance. However, existing methods tend to focus on either ordinality or imbalance rather than addressing both simultaneously. The few approaches that do account for both are not always easy for non-specialist analysts to implement, and simpler approaches are needed to facilitate appropriate data processing. Here, we developed a general approach using some of the most popular machine learning algorithms to ensure appropriate processing of ordinal imbalanced datasets and to optimize the predictions of all classes. After transforming the multi-class ordinal problem into a well-known binary problem, we applied several resampling methods with a decision-tree classifier. We then used a stacked generalization algorithm to combine the resulting classifiers and improve predictive performance. To test our approach, we used two ordinal imbalanced datasets, one on student performance and one on wine quality. Individual resampling techniques tended to improve the accuracy of minority classes while simultaneously increasing the number of false positives in those classes. This resulted in a decrease, sometimes substantial, in the accuracy of other classes. The stacking model offered a good compromise between improving the accuracy of minority classes and limiting the loss of accuracy in other classes. Our approach provides useful insights into which modelling strategies should be favoured when deploying models on these common datasets in production, depending on end-user interests.
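
The first two steps of the pipeline described above can be illustrated with a minimal sketch. The abstract does not name the binary transformation or the resampling methods used, so this sketch assumes a common cumulative decomposition (one binary target per threshold, "is the label above class k?") and plain random oversampling; the function names and toy labels are hypothetical, and the decision-tree fitting and stacking steps are omitted. It is not the authors' implementation.

```python
import random
from collections import Counter


def ordinal_to_binary_targets(labels, classes):
    """For ordered classes c_1 < ... < c_K, build K-1 binary targets.

    Target k is 1 when the label ranks strictly above classes[k], else 0.
    (Assumed decomposition; the abstract does not specify one.)
    """
    rank = {c: i for i, c in enumerate(classes)}
    return [
        [1 if rank[y] > k else 0 for y in labels]
        for k in range(len(classes) - 1)
    ]


def class_probs_from_cumulative(p_greater):
    """Turn estimates of P(y > c_k), k = 1..K-1, into per-class probabilities."""
    bounds = [1.0] + list(p_greater) + [0.0]
    # Clip at zero because independently fitted binary models may be
    # slightly inconsistent with each other.
    return [max(bounds[k] - bounds[k + 1], 0.0) for k in range(len(bounds) - 1)]


def random_oversample(rows, targets, seed=0):
    """Duplicate minority rows at random until both binary classes match in size."""
    counts = Counter(targets)
    if len(counts) < 2:
        return list(rows), list(targets)
    rng = random.Random(seed)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    pool = [i for i, t in enumerate(targets) if t == minority]
    keep = list(range(len(rows))) + [rng.choice(pool) for _ in range(deficit)]
    return [rows[i] for i in keep], [targets[i] for i in keep]


# Toy ordinal labels (made up for illustration), ordered 3 < 5 < 6 < 8.
classes = [3, 5, 6, 8]
labels = [3, 5, 5, 6, 6, 6, 8]
rows = [[float(i)] for i in range(len(labels))]

binary_targets = ordinal_to_binary_targets(labels, classes)
balanced_rows, balanced_targets = random_oversample(rows, binary_targets[2])
```

In a full pipeline, each binary target would get its own resampled training set and classifier, and a stacking step would then combine the classifiers trained under different resampling schemes.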
