Estimating the threshold of software metrics for web applications

Ruchika Malhotra, Anjali Sharma

doi:10.1007/s13198-019-00773-1

Abstract

Estimating thresholds for software metrics is a key step towards assigning a quality index. In defect prediction, two approaches are widely used: those based on statistics and those that use rigorous mathematical models. Although both have yielded significant insights, there is still no general consensus on their results. Against this background, we examine whether any relationship exists between the two approaches. We carry out an empirical investigation, using the Apache Click web application, of the relationship between threshold values estimated at various risk levels with Bender’s approach and measures of central tendency. We also study the effect of these different threshold estimates on the performance of the resulting defect prediction models and validate it using different releases of the dataset. We find that the threshold indicator obtained from representational models such as Bender’s has a close relationship with the median value of the dataset. This association between the model and the statistical parameters stems mainly from the underlying characteristics of the dataset itself. Descriptive statistical analysis shows that all metrics in the Apache Click dataset are positively skewed, and hence the median is the most relevant measure of central tendency for threshold estimation. Additionally, we find that with increasing risk level, the threshold value shifts from the median towards the mean of the underlying metric data. Our proposition that the performance of the defect prediction model is best when threshold estimates are closer to the median is also verified through inter-version project comparison.
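As a rough illustration of the comparison described above, the sketch below computes a Bender-style threshold, commonly written as the Value of an Acceptable Risk Level, VARL = (ln(p0 / (1 - p0)) - alpha) / beta, where alpha and beta are the intercept and slope of a univariate logistic regression of defect-proneness on a metric and p0 is the chosen risk level. The coefficients, the risk levels, and the metric values are all hypothetical placeholders, not figures from the Apache Click dataset, and the function name `varl` is our own.

```python
import math
import statistics


def varl(alpha: float, beta: float, p0: float) -> float:
    """Bender-style threshold (VARL) for a univariate logistic model.

    alpha, beta: intercept and slope of the fitted logistic regression.
    p0: acceptable risk level, strictly between 0 and 1.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if beta == 0:
        raise ValueError("beta must be non-zero")
    return (math.log(p0 / (1.0 - p0)) - alpha) / beta


# Hypothetical, positively skewed metric values (NOT Apache Click data).
metric_values = [1, 2, 2, 3, 3, 4, 5, 8, 13, 30]
median = statistics.median(metric_values)
mean = statistics.mean(metric_values)

# Hypothetical logistic regression coefficients (assumed, not fitted here).
alpha, beta = -3.5, 0.2

# Thresholds at a few assumed risk levels, to set beside median and mean.
risk_levels = (0.05, 0.10, 0.15, 0.20)
thresholds = {p0: varl(alpha, beta, p0) for p0 in risk_levels}

if __name__ == "__main__":
    print(f"median={median}, mean={mean}")
    for p0, t in thresholds.items():
        print(f"p0={p0:.2f} -> VARL={t:.2f}")
```

For a positive slope, the threshold grows with the risk level, so setting the VARL values beside the median and mean of a skewed sample shows where each risk level places the threshold relative to the two central measures.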

Full Text