Abstract

It is a well-known practice in software engineering to aggregate software metrics to assess software artifacts for various purposes, such as their maintainability or their proneness to contain bugs. For different purposes, different metrics might be relevant. However, weighting these software metrics according to their contribution to the respective purpose is a challenging task. Manual approaches based on expert judgment do not scale with the number of metrics, and experts struggle when the metrics are not independent, which is rarely the case. Automated approaches based on supervised learning require reliable and generalizable training data, a ground truth, which is rarely available. We propose an automated approach to weighted metrics aggregation that is based on unsupervised learning. It derives metric scores and their weights from probability theory and aggregates them. To evaluate its effectiveness, we conducted two empirical studies on defect prediction: one on ca. 200 000 code changes and another on ca. 5 000 software classes. The results show that our approach can serve as an agnostic unsupervised predictor in the absence of a ground truth.
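As a rough illustration of this idea (a sketch of the general approach, not the paper's exact formulation), the Python example below scores each metric via its empirical cumulative distribution, derives weights without any ground truth, and aggregates the weighted scores per artifact. The entropy-based weighting and the toy metric columns are assumptions made for the example.

    import numpy as np

    def empirical_cdf_scores(values):
        # Map raw metric values to (0, 1]: the fraction of artifacts whose
        # value is at most this one (the empirical CDF). Higher raw values,
        # which usually indicate riskier code, score closer to 1.
        values = np.asarray(values, dtype=float)
        return np.searchsorted(np.sort(values), values, side="right") / len(values)

    def entropy_weights(scores):
        # Hypothetical unsupervised weighting: each metric is weighted by the
        # Shannon entropy of its score distribution, so metrics that spread
        # artifacts apart count more. No ground truth is needed.
        weights = []
        for column in scores.T:
            hist, _ = np.histogram(column, bins=10, range=(0.0, 1.0))
            p = hist[hist > 0] / hist.sum()
            weights.append(-(p * np.log(p)).sum())
        weights = np.asarray(weights)
        return weights / weights.sum()

    def aggregate(metric_matrix):
        # Rows are artifacts, columns are metrics; returns a single
        # defect-proneness score per artifact.
        scores = np.column_stack(
            [empirical_cdf_scores(column) for column in metric_matrix.T])
        return scores @ entropy_weights(scores)

    # Toy data: four artifacts, three metrics (e.g. size, complexity, churn).
    metrics = np.array([[120,  4,  2],
                        [950, 31, 17],
                        [300,  9,  5],
                        [610, 22, 11]])
    print(aggregate(metrics))  # higher score = more likely defect-prone

The key property this sketch shares with the paper's approach is that both the per-metric scores and the weights come from the observed distribution of the metrics themselves, so no labeled defect data is required.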

Highlights

  • Quality assurance is usually done with a limited budget

  • Details about the Chidamber and Kemerer (CK)+OO metrics, their churn and entropy, and the variants thereof can be found in the original paper by D'Ambros et al. (2010). They found that the model variant based on weighted churn performed best among the models built from the churn of the CK+OO metrics, and that the model based on linearly decayed entropy performed best among the models built from the entropy of these metrics

  • We conclude that our unsupervised model can outperform simple supervised models based on code change metrics, i.e., aggregating the whole set of metrics with our approach performs better than models built on a single well-chosen metric


Introduction

Quality assurance is usually done with a limited budget, so its activities must be performed as efficiently as possible. Understanding which software artifacts are likely to be problematic helps to prioritize activities and to allocate resources. To gain this knowledge, software quality assessment employs weighted software metrics. Defect prediction can be understood as an instance of multi-criteria decision making in which we decide which software artifacts are prone to contain defects. In either framing, defect prediction inherits problems connected to the dependencies between criteria or metrics and to the subjective setting of weights, as detailed in Sections 2.1 and 2.2, respectively.
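As a minimal sketch of this multi-criteria view, the Python example below ranks artifacts by a weighted sum of normalized metric values; the artifact names, metric values, and weights are invented for illustration and are not taken from the paper.

    # Defect prediction as multi-criteria decision making: rank artifacts
    # by a weighted sum of normalized metric values in [0, 1].
    artifacts = {
        "Parser.java":   {"loc": 0.9, "complexity": 0.8, "churn": 0.3},
        "Utils.java":    {"loc": 0.2, "complexity": 0.1, "churn": 0.7},
        "Renderer.java": {"loc": 0.6, "complexity": 0.7, "churn": 0.9},
    }
    weights = {"loc": 0.3, "complexity": 0.4, "churn": 0.3}  # set subjectively

    def score(metric_values):
        return sum(weights[m] * v for m, v in metric_values.items())

    for name, values in sorted(artifacts.items(),
                               key=lambda kv: score(kv[1]), reverse=True):
        print(f"{name}: {score(values):.2f}")  # most defect-prone first

Because metrics such as lines of code and complexity are typically correlated, a weighted sum like this implicitly double-counts size, and the weights themselves are set subjectively; these are exactly the two problems referred to above.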
