Abstract

Estimating thresholds for software metrics is a key step towards assigning a quality index. In defect prediction, two approaches are widely used: those based on statistics and those based on rigorous mathematical models. Although significant insights have been gained, there is still no general consensus on their results, which remain far from generalizable. From this perspective, we examine whether any relationship exists between the two approaches. Using the Apache Click web application, we carry out an empirical investigation of the relationship between threshold values estimated at various risk levels with Bender's approach and measures of central tendency. We also study the effect of these different threshold estimates on the performance of the resulting defect prediction models and validate it across different releases of the dataset. We find that the threshold indicator obtained from representational models, such as Bender's, is closely related to the median of the dataset. This association between the model-based and statistical parameters stems mainly from the characteristics of the dataset itself. Descriptive statistical analysis shows that all Apache Click metric datasets are positively skewed, so the median is the most relevant central measure for threshold estimation. Additionally, we find that as the risk level increases, the threshold value progressively shifts from the median towards the mean of the underlying metric data. Our proposition that the defect prediction model performs best when threshold estimates are closer to the median is also verified through inter-version project comparison.
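To make the comparison concrete, the sketch below computes a Bender-style threshold, using the commonly cited "value of an acceptable risk level" (VARL) formula VARL = (log(p0 / (1 - p0)) - b0) / b1, where b0 and b1 come from a univariate logistic regression of fault-proneness on one metric and p0 is the acceptable risk level. It then sets that threshold beside the sample median, mean and skewness. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`fit_univariate_logistic`, `bender_varl`, `skewness`) and the metric values and fault labels are hypothetical, not Apache Click measurements, and the Newton-Raphson fit assumes the data are not perfectly separable.

```python
import math
import statistics


def fit_univariate_logistic(x, y, iterations=50, tol=1e-10):
    """Fit P(faulty | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Hypothetical helper. x holds one metric value per class and y is 1 if the
    class is faulty, else 0. Assumes x and y are not perfectly separable,
    otherwise the maximum-likelihood estimate does not exist.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient X^T (y - p) and information matrix X^T W X (2x2, symmetric).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X^T W X)^(-1) X^T (y - p), inverting by hand.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def bender_varl(b0, b1, p0):
    """Metric value at which the fitted fault probability equals p0 (VARL)."""
    return (math.log(p0 / (1.0 - p0)) - b0) / b1


def skewness(values):
    """Fisher-Pearson sample skewness g1 = m3 / m2**1.5 (positive = right tail)."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5


# Hypothetical, right-skewed metric values with hypothetical fault labels.
metric = [1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 12, 15, 20, 30]
faulty = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

b0, b1 = fit_univariate_logistic(metric, faulty)
print(f"median={statistics.median(metric)}  mean={statistics.mean(metric):.2f}  "
      f"skewness={skewness(metric):.2f}")
for p0 in (0.1, 0.3, 0.5, 0.7):
    print(f"p0={p0:.1f}  VARL={bender_varl(b0, b1, p0):.2f}")
```

Because VARL is monotone in p0 whenever b1 > 0, raising the acceptable risk level moves the threshold up the metric scale. For right-skewed data the mean lies above the median, which is consistent with the shift from median towards mean reported above; the toy numbers here are only for illustration and say nothing about where the real thresholds fall.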
