Abstract
Knowledge of which software metrics serve as defect indicators is vital for the efficient allocation of quality assurance resources. Process metrics, although sometimes difficult to collect, have recently become popular in defect prediction. However, to identify the process metrics that are actually worth collecting, we need evidence that they improve defect prediction models based on product metrics. This paper presents an empirical evaluation in which several process metrics were investigated in order to identify the ones that significantly improve defect prediction models based on product metrics. Data were collected from a wide range of software projects, both industrial and open source. The predictions of models that use only product metrics (simple models) were compared with the predictions of models that use product metrics plus one of the process metrics under scrutiny (advanced models). To decide whether the improvements were significant, statistical tests were performed and effect sizes were calculated. The advanced models trained on a data set containing product metrics and, additionally, the Number of Distinct Committers (NDC) were significantly better than the simple models without NDC; the effect size was medium and the probability of superiority (PS) of the advanced models over the simple ones was high ($p=.016$, $r=-.29$, $\mathrm{PS}=.76$), a substantial finding useful in defect prediction. A similar result, with a slightly smaller PS, was achieved by the advanced models trained on a data set containing product metrics and all of the investigated process metrics ($p=.038$, $r=-.29$, $\mathrm{PS}=.68$).
The advanced models trained on a data set containing product metrics and, additionally, the Number of Modified Lines (NML) were significantly better than the simple models without NML, but the effect size was small ($p=.038$, $r=.06$). Hence, it is reasonable to recommend the NDC process metric for building defect prediction models.
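The paired comparison described above (a significance test, an effect size $r$, and a probability of superiority) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the statistical test, so a Wilcoxon signed-rank test with a normal approximation and no tie correction is used here; $r$ is taken as $Z/\sqrt{n}$ with $n$ the number of non-tied pairs; PS is taken as the share of projects on which the advanced model wins, with ties counted as half; lower scores are assumed to be better; and the scores themselves are invented, not data from the study.

```python
import math


def rank_abs(values):
    """Rank values by absolute size (1-based); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and abs(values[order[j + 1]]) == abs(values[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def paired_comparison(simple, advanced):
    """Compare paired per-project scores of two models (assumption: lower is better)."""
    diffs = [a - s for s, a in zip(simple, advanced)]
    nonzero = [d for d in diffs if d != 0]  # zero differences are dropped for the test
    n = len(nonzero)
    ranks = rank_abs(nonzero)
    w_plus = sum(rank for rank, d in zip(ranks, nonzero) if d > 0)

    # Normal approximation to the signed-rank statistic (no tie correction).
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

    r = z / math.sqrt(n)  # effect size under the convention assumed here

    # Probability of superiority for dependent samples: share of pairs won
    # by the advanced model, with ties counted as half a win.
    wins = sum(1 for d in diffs if d < 0)
    ties = sum(1 for d in diffs if d == 0)
    ps = (wins + 0.5 * ties) / len(diffs)
    return {"z": z, "p": p, "r": r, "ps": ps}


# Hypothetical scores for eight projects; NOT data from the study.
SIMPLE = [50, 48, 60, 55, 42, 58, 47, 53]
ADVANCED = [45, 49, 52, 51, 40, 50, 47, 46]

result = paired_comparison(SIMPLE, ADVANCED)
print(result)
```

With scores where lower is better, a negative $r$ indicates that the advanced model tends to score lower than the simple one. Other conventions for $r$ and PS exist, so values produced by this sketch are not directly comparable with the figures reported above.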
Highlights
Software development companies are seeking ways to improve the quality of software systems without allocating too many resources to quality assurance activities such as testing.
This paper presents an empirical evaluation in which several process metrics were investigated in order to identify the ones that significantly improve defect prediction models based on product metrics.
When the advanced models turned out to be significantly better than the simple ones, we calculated the effect size in order to assess whether the investigated process metric may be useful in the practice of software defect prediction.
Summary
Software development companies are seeking ways to improve the quality of software systems without allocating too many resources to quality assurance activities such as testing. Applying the same testing effort to all modules of a software system is not an optimal approach, since defects are not uniformly distributed among the individual parts of a system. It is possible to test only a small part of a software system and still find most of the defects. Defect prediction models, in turn, may be used to find the defect-prone classes. Except in critical projects, quality assurance efforts should be focused on the most defect-prone classes in order to save valuable time and financial resources and, at the same time, to increase the quality of delivered software products.