Fault Prediction with Static Software Metrics in Evolving Software: A Case Study in Apache Ant

Xue Han, Gongjun Yan

doi:10.4236/jcc.2022.102003

Abstract

Software testing is an integral part of software development. Testing not only takes place in every software iteration cycle but also consumes a considerable amount of resources. Because resources such as machinery and manpower are often limited, it is crucial to decide where and how much effort to put into testing. One way to address this problem is to identify which components of the system under test are more error-prone and thus demand more testing effort. Recent developments in machine learning show promising potential for predicting faults in different components of a software system. This work conducts an empirical study to explore the feasibility of using static software metrics to predict software faults. We apply four machine learning techniques to construct fault prediction models from the PROMISE data set and evaluate the effectiveness of using static software metrics to build fault prediction models in four consecutive versions of Apache Ant. The empirical results show that the combined software metrics produce the fewest misclassification errors. The fault prediction results vary significantly among different machine learning techniques and data sets. Overall, fault prediction models built with the support vector machine (SVM) have the lowest misclassification errors.

Highlights

Testing is a crucial part of the software development life cycle [1].
This study explores which software metrics [11] are suitable for constructing fault prediction models and examines how well the resulting machine learning models predict faults.
We conduct an empirical study to examine the effectiveness of building fault prediction models with static software metrics.
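
The abstract reports results as misclassification errors for models evaluated across successive versions. As a rough illustration only, the sketch below computes that measure for a model fitted on one version and applied to the next. The single-metric threshold rule (a decision stump) is a stand-in chosen for brevity and is not one of the study's techniques. The function names and the metric values are hypothetical, and the data are synthetic rather than taken from PROMISE or Apache Ant.

```python
from typing import Sequence, Tuple

# One module: (static metric value, e.g. lines of code; whether it was faulty).
Sample = Tuple[float, bool]


def misclassification_error(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Fraction of modules whose predicted label differs from the actual label."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


def fit_threshold(train: Sequence[Sample]) -> float:
    """Pick the metric threshold with the lowest training error.

    A module is predicted faulty when its metric value is >= the threshold.
    This stump is an assumption for illustration, not the paper's model.
    """
    actual = [faulty for _, faulty in train]
    best_threshold, best_error = None, float("inf")
    for threshold in sorted({value for value, _ in train}):
        predicted = [value >= threshold for value, _ in train]
        error = misclassification_error(actual, predicted)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold


def evaluate(threshold: float, test: Sequence[Sample]) -> float:
    """Misclassification error of the threshold rule on unseen modules."""
    actual = [faulty for _, faulty in test]
    predicted = [value >= threshold for value, _ in test]
    return misclassification_error(actual, predicted)


# Synthetic data standing in for two consecutive versions of a project.
older_version = [(120, False), (340, False), (560, True),
                 (900, True), (150, False), (700, True)]
newer_version = [(100, False), (600, True), (450, True),
                 (800, True), (200, False)]

threshold = fit_threshold(older_version)
print(threshold, evaluate(threshold, newer_version))
```

Fitting on the older version and scoring on the newer one mirrors the idea of predicting faults in evolving software, although the study's actual train/test protocol is not described in this excerpt.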

Summary

Introduction

Testing is a crucial part of the software development life cycle [1]. The purpose of testing is to expose all faults in the software system. A solid testing strategy can provide a high level of confidence about the correctness of an application after it has been deployed. Detecting faults in a system randomly may not be feasible [3], especially when dealing with large-scale projects. Practitioners (developers and testers) want to allocate resources in the most effective way to find faults.

Objectives

Results

Conclusion