The effect of data complexity on classifier performance

Jonas Eberlein, Daniel Rodriguez, Rachel Harrison

doi:10.1007/s10664-024-10554-5

Abstract

The research area of Software Defect Prediction (SDP) is both extensive and popular, and SDP is often treated as a classification problem. Improvements in classification, pre-processing and tuning techniques (together with the many factors that can influence model performance) have encouraged this trend. However, regardless of the effort invested in these areas, there appears to be a ceiling on the performance of the classification models used in SDP. In this paper, the issue of classifier performance is analysed from the perspective of data complexity. Specifically, data complexity metrics are calculated on the Unified Bug Dataset, a collection of well-known SDP datasets, and then checked for correlation with the defect prediction performance of machine learning classifiers (in particular C5.0, Naive Bayes, Artificial Neural Networks, Random Forests and Support Vector Machines). In this work, different domains of competence and incompetence are identified for the classifiers, similarities and differences among the classifiers and the performance metrics are found, and the Unified Bug Dataset itself is analysed from the perspective of data complexity. We found that certain classifiers work best in certain situations and that all of the data complexity metrics can be problematic, although some classifiers did excel in particular situations.
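As a rough illustration of the kind of analysis described above, the following Python sketch computes one widely used data complexity metric, the maximum Fisher's discriminant ratio (F1), for a two-class (defective / not defective) dataset, and then a Spearman rank correlation between per-dataset complexity values and per-dataset classifier scores. The choice of F1 as the example metric, the use of Spearman correlation, and the function names `fisher_ratio_f1`, `ranks` and `spearman` are assumptions made for illustration; this is not the authors' pipeline, and the numbers in the usage section are invented, not results from the paper.

```python
from statistics import mean, pvariance


def fisher_ratio_f1(X, y):
    """Maximum Fisher's discriminant ratio (F1) for a two-class dataset.

    For each feature: (mean_a - mean_b)^2 / (var_a + var_b); F1 is the
    maximum over all features. Larger values mean at least one feature
    separates the two classes well (i.e. lower data complexity).
    Assumes exactly two class labels.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("expected exactly two class labels")
    best = 0.0
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == labels[0]]
        b = [row[j] for row, label in zip(X, y) if label == labels[1]]
        denom = pvariance(a) + pvariance(b)
        if denom == 0:
            # Constant within each class: perfectly separating if the means differ.
            if mean(a) != mean(b):
                return float("inf")
            continue
        best = max(best, (mean(a) - mean(b)) ** 2 / denom)
    return best


def ranks(values):
    """1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy dataset: feature 0 separates the classes, feature 1 does not.
    X = [[0.0, 5.0], [0.2, 1.0], [1.0, 4.0], [1.2, 2.0]]
    y = [0, 0, 1, 1]
    print("F1:", fisher_ratio_f1(X, y))

    # Invented per-dataset values, for illustration only (not from the paper).
    complexity = [0.05, 0.12, 0.30, 0.45]
    auc = [0.61, 0.66, 0.72, 0.80]
    print("Spearman rho:", spearman(complexity, auc))
```

A rank correlation is used here because it only assumes a monotone relationship between complexity and performance rather than a linear one; in practice one would compute many complexity metrics per dataset and correlate each with each classifier's score.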


Journal: Empirical Software Engineering
Publication Date: Oct 31, 2024
License type: CC BY 4.0

Similar Papers

Pre-harvest classification of crop types using a Sentinel-2 time-series and machine learning
Mmamokoma Grace Maponya ... Zama Eric Mashimbye
Computers and Electronics in Agriculture | VOL. 169
15 Jan 2020

Heart Failure prediction on diversified datasets to improve generalizability using 2-Level Stacking
Madhuri Dubey ... Richa Makhijani
Multidisciplinary Science Journal | VOL. 6
31 Aug 2023

Interpretable Software Defect Prediction from Project Effort and Static Code Metrics
Susmita Haldar ... Luiz Fernando Capretz
Computers | VOL. 13
16 Feb 2024

Domains of Competence of Artificial Neural Networks Using Measures of Separability of Classes
Julián Luengo ... Francisco Herrera
01 Jan 2009
