Data Complexity: A New Perspective for Analyzing the Difficulty of Defect Prediction Tasks

Xiaohui Wan, Fangyun Qin, Zheng Zheng, Xuhui Lu

doi:10.1145/3649596

Abstract

Defect prediction is crucial for software quality assurance and has been researched extensively over recent decades. However, prior studies rarely examine data complexity in defect prediction tasks, and fewer still try to understand the difficulty of these tasks from the perspective of data complexity. In this article, we conduct an empirical study that estimates the hardness of over 33,000 instances, using a set of measures that characterize both the inherent difficulty of instances and the characteristics of defect datasets. Our findings indicate that: (1) instance hardness in both classes follows a right-skewed distribution, with the defective class showing a more scattered distribution; (2) class overlap is the primary factor influencing instance hardness and can be characterized through feature overlap, structural overlap, and instance-level overlap; (3) no single preprocessing technique suits all datasets, and preprocessing does not consistently reduce data complexity, but dataset complexity measures can help identify suitable techniques for a specific dataset; (4) integrating data complexity information into the learning process can enhance an algorithm’s learning capacity. In summary, this empirical study highlights the crucial role of data complexity in defect prediction tasks and offers a new perspective for advancing defect prediction research.
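The abstract does not say which instance hardness measures the study uses. Purely to illustrate what "instance hardness" and "instance-level overlap" mean, the sketch below implements k-Disagreeing Neighbors (kDN), a common instance-level overlap measure from the instance-hardness literature: an instance's hardness is the fraction of its k nearest neighbours whose label differs from its own, so a defective module surrounded by clean modules scores high. It assumes numeric software-metric vectors, Euclidean distance, and k = 5 by default; the function name `kdn_hardness` and the toy data are hypothetical and are not the authors' code.

```python
import math
from typing import List, Sequence


def kdn_hardness(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 5) -> List[float]:
    """Illustrative k-Disagreeing Neighbors (kDN) score for every instance.

    Assumption: instances are numeric metric vectors compared with Euclidean
    distance. The score is the fraction of the k nearest other instances
    whose label differs from the instance's own label (0.0 = easy, 1.0 = hard).
    """
    n = len(X)
    if len(y) != n:
        raise ValueError("X and y must have the same length")
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of instances")

    scores = []
    for i in range(n):
        # Brute-force neighbour search: (distance, index) pairs, ties broken by index.
        neighbours = sorted(
            (math.dist(X[i], X[j]), j) for j in range(n) if j != i
        )[:k]
        disagreeing = sum(1 for _, j in neighbours if y[j] != y[i])
        scores.append(disagreeing / k)
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: two metrics per module; label 1 = defective.
    # Module (1, 1) is defective but sits inside the cluster of clean modules.
    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [1, 1]]
    y = [0, 0, 0, 1, 1, 1]
    print(kdn_hardness(X, y, k=3))
```

With k = 3, the defective module at (1, 1) gets a score of 1.0 because all three of its nearest neighbours are clean, while every other module scores 1/3. The brute-force O(n²) search keeps the sketch dependency-free; for larger datasets a KD-tree or a library nearest-neighbour index would be the practical choice.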

Journal: ACM Transactions on Software Engineering and Methodology
Publication Date: Jun 27, 2024
Citations: 1
License type: mit

Similar Papers

An Empirical Study of the Impact of Class Overlap on the Performance and Interpretability of Cross-Version Defect Prediction
Hui Han ... Shengyi Cheng
International Journal of Software Engineering and Knowledge Engineering, 21 Sep 2024

Task complexity and difficulty in music information retrieval
Xiao Hu ... Noriko Kando
Journal of the Association for Information Science and Technology, Vol. 68, 30 May 2017

Object-Oriented Metrics for Defect Prediction
Satwinder Singh ... Rozy Singla
13 Jun 2018

Defect prediction for Cascading Style Sheets
M Serdar Biçer ... Banu Diri
Applied Soft Computing, Vol. 49, 30 May 2016
