Abstract

Static analysis tools, which automatically detect potential source code defects at an early phase of the software development process, are widely applied in safety-critical software fields. However, the alarms reported by these tools must be inspected manually by developers, which is unavoidable and costly, and a large proportion of the alarms are found to be false positives. To automatically classify the reported alarms into true defects and false positives, we propose a defect identification model based on machine learning. We design a set of novel variable-level features, called variable characteristics, for building the classification model; these are more fine-grained than existing traditional features. We select 13 base classifiers and two ensemble learning methods for model building based on our proposed approach, and the reported alarms classified as unactionable (false positives) are pruned to reduce the effort of manual inspection. In this paper, we first evaluate the approach on four open-source C projects; the classification results show that the proposed model achieves high performance and reliability in practice. We then conduct a baseline experiment comparing our model against traditional features, which indicates that variable-level features significantly improve defect identification performance. Additionally, we use machine learning techniques to rank the variable characteristics in order to identify each feature's contribution to the proposed model.
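To make the pipeline concrete, here is a minimal sketch of the classify-and-prune workflow described above, assuming scikit-learn. The feature columns are hypothetical stand-ins for the paper's variable characteristics, and a random forest stands in for one of the evaluated learners; this is an illustration, not the authors' implementation.

```python
# Minimal sketch: classify static-analysis alarms as actionable (true
# defect) or unactionable (false positive), then prune the latter.
# Assumes scikit-learn; feature columns are HYPOTHETICAL stand-ins for
# the paper's variable-level "variable characteristics".
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# One row per reported alarm (inspection point). Illustrative columns:
# [def-use distance, #null checks on the variable, dereference depth,
#  variable is a function parameter (0/1)]
X = np.array([
    [3, 1, 2, 0], [12, 0, 4, 1], [5, 2, 1, 0], [9, 0, 3, 1],
    [2, 3, 1, 0], [15, 0, 5, 1], [4, 1, 2, 0], [11, 0, 4, 1],
])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1 = actionable, 0 = unactionable

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42, stratify=y)

# Random forest is one plausible ensemble learner; the paper evaluates
# 13 base classifiers plus two ensemble learning methods.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Prune alarms predicted unactionable; developers inspect only the rest.
pred = clf.predict(X_test)
remaining = X_test[pred == 1]
print(classification_report(y_test, pred,
                            target_names=["FALSE", "TRUE"], zero_division=0))
print(f"{len(remaining)} of {len(X_test)} alarms left for manual inspection")
```

Feature importances from the fitted model (clf.feature_importances_) would support the kind of feature ranking mentioned at the end of the abstract.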

Highlights

  • Software testing based on defect patterns (Quinlan et al. 2007) is a source code static analysis technology developed in this century

  • We calculate a weighted average of the evaluation metrics referred to in Sect. 5.4, except accuracy and the kappa statistic, across both classes (TRUE and FALSE), weighted by the number of null pointer dereference (NPD) inspection points (IPs) in each class

  • The weighted average (WA) is shown in Eq. (9) (see the reconstruction below), where $[M]_T$ denotes the metric value of precision, recall, F-measure, or area under the ROC curve (AUC) for class TRUE, $[M]_F$ denotes the corresponding metric value for class FALSE, $AIP$ denotes the number of actionable NPD IPs, and $UIP$ denotes the number of unactionable NPD IPs
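A plausible form of Eq. (9), reconstructed from the symbol definitions above under the assumption that each class is weighted by its IP count:

$$\mathrm{WA} = \frac{[M]_T \cdot AIP + [M]_F \cdot UIP}{AIP + UIP} \tag{9}$$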


Introduction

Software testing based on defect patterns (Quinlan et al. 2007) is a source code static analysis technology developed in this century. Owing to its high efficiency and accuracy, various static analysis tools, such as Coverity (Bessey et al. 2010), PREfix (Bush et al. 2000), Defect Testing System (DTS) (Yang et al. 2008) and FindBugs (Ayewah and Pugh 2010), have been widely applied to automatically detect potential source code defects at an early software development phase. One of the most crucial challenges is false positives, a common problem of software testing based on defect patterns. A large proportion of the alarms reported by these tools are found to be false positives, which is inevitable (Dillig et al. 2012). Manual inspection of the reported alarms is therefore costly and unavoidable work for developers.
