Predicting Defective Lines Using a Model-Agnostic Technique

Supatsara Wattanakriengkrai, Kenichi Matsumoto, Hideaki Hata, Patanamon Thongtanunam, Chakkrit Tantithamthavorn

doi:10.1109/tse.2020.3023177


Open Access

https://doi.org/10.1109/tse.2020.3023177


Abstract

Defect prediction models are proposed to help a team prioritize the areas of source code files that need Software Quality Assurance (SQA) based on the likelihood of having defects. However, developers may waste unnecessary effort on the whole file while only a small fraction of its source code lines are defective. Indeed, we find that as little as 1-3 percent of lines of a file are defective. Hence, in this work, we propose a novel framework (called Line-DP) to identify defective lines using a model-agnostic technique, i.e., an Explainable AI technique that provides information on why the model makes such a prediction. Broadly speaking, our Line-DP first builds a file-level defect model using code token features. Then, our Line-DP uses a state-of-the-art model-agnostic technique (i.e., LIME) to identify risky tokens, i.e., code tokens that lead the file-level defect model to predict that the file will be defective. Finally, the lines that contain risky tokens are predicted as defective lines. Through a case study of 32 releases of nine Java open source systems, our evaluation results show that our Line-DP achieves an average recall of 0.61, a false alarm rate of 0.47, a top 20% LOC recall of 0.27, and an initial false alarm of 16, which are statistically better than six baseline approaches. Our evaluation also shows that our Line-DP requires an average computation time of 10 seconds, including model construction and defective line identification time.
In addition, we find that 63 percent of defective lines that can be identified by our Line-DP are related to common defects (e.g., argument change, condition change). These results suggest that our Line-DP can effectively identify defective lines that contain common defects while requiring a smaller amount of inspection effort and a manageable computation cost. This paper takes an important step towards line-level defect prediction by leveraging a model-agnostic technique.
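The three steps of Line-DP summarized in the abstract (a file-level model over code tokens, an explainer that picks out risky tokens, and flagging every line that contains one) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regex tokenizer, the toy model `toy_predict_proba` with hypothetical weights, the `top_k` and `threshold` parameters, and the leave-one-token-out attribution are all illustrative choices. In particular, the attribution here is a simpler stand-in for LIME, which instead fits a proximity-weighted linear surrogate model on many randomly perturbed versions of the input.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical tokenizer: identifiers and keywords only (literals/operators ignored).
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(line: str) -> List[str]:
    return TOKEN_RE.findall(line)


def file_tokens(lines: List[str]) -> Set[str]:
    """Bag-of-tokens representation of a whole file (the file-level features)."""
    return {tok for line in lines for tok in tokenize(line)}


def token_scores(tokens: Set[str],
                 predict_proba: Callable[[Set[str]], float]) -> Dict[str, float]:
    """Leave-one-token-out attribution (a simplified stand-in for LIME):
    how much the predicted defect probability drops when a token is removed."""
    base = predict_proba(tokens)
    return {tok: base - predict_proba(tokens - {tok}) for tok in tokens}


def predict_defective_lines(lines: List[str],
                            predict_proba: Callable[[Set[str]], float],
                            top_k: int = 2,
                            threshold: float = 0.5) -> List[int]:
    """Return 1-based numbers of lines containing at least one risky token."""
    tokens = file_tokens(lines)
    if predict_proba(tokens) < threshold:
        return []  # file predicted clean: no lines are flagged
    scores = token_scores(tokens, predict_proba)
    # Highest attribution first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    risky = {tok for tok, score in ranked[:top_k] if score > 0}
    return [no for no, line in enumerate(lines, start=1)
            if risky & set(tokenize(line))]


# Toy stand-in for a trained file-level defect model (hypothetical weights).
TOY_WEIGHTS = {"null": 0.25, "catch": 0.3}


def toy_predict_proba(tokens: Set[str]) -> float:
    return min(1.0, 0.2 + sum(w for tok, w in TOY_WEIGHTS.items() if tok in tokens))


java_file = [
    "public int parse(String s) {",
    "  if (s == null) return 0;",
    "  try { return Integer.parseInt(s); }",
    "  catch (Exception e) { return -1; }",
    "}",
]
print(predict_defective_lines(java_file, toy_predict_proba))  # → [2, 4]
```

On the five-line Java snippet, only `catch` and `null` raise the toy model's prediction, so lines 2 and 4 are flagged. Swapping in a trained classifier and a real LIME explainer would change which tokens are deemed risky, but not the final step of mapping risky tokens back to the lines that contain them.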

Highlights

Software Quality Assurance (SQA) is one of the software engineering practices for ensuring the quality of a software product [26]
We propose a novel line-level defect prediction framework which leverages a model-agnostic technique to predict defective lines, i.e., the source code lines that will be changed by bug-fixing commits to fix post-release defects
This result suggests that, when compared with the traditional approach of predicting defects at the file level, our Line-DP could potentially help developers avoid spending Software Quality Assurance (SQA) effort on 52% of clean lines, while 62% of defective lines will still be examined
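The percentages in the last highlight can be read as two line-level measures: the share of defective lines examined is recall, and the share of clean lines spared is one minus the false alarm rate. The following minimal sketch computes these, together with top-k% LOC recall and initial false alarm (the other two measures reported in the abstract), assuming their common definitions with lines ranked by a hypothetical per-line risk score and ties broken by line order. The function names and the toy data are illustrative, not taken from the paper's artifact.

```python
from typing import Dict, List, Sequence


def rank_lines(scores: Sequence[float]) -> List[int]:
    """Line indices by descending risk score; ties keep file order (stable sort)."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def line_level_metrics(predicted: Sequence[bool],
                       actual: Sequence[bool]) -> Dict[str, float]:
    """Recall and false alarm rate computed over individual lines."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    n_defective = sum(1 for a in actual if a)
    n_clean = len(actual) - n_defective
    recall = tp / n_defective if n_defective else 0.0
    false_alarm_rate = fp / n_clean if n_clean else 0.0
    return {
        "recall": recall,                               # defective lines flagged
        "false_alarm_rate": false_alarm_rate,           # clean lines flagged
        "clean_lines_skipped": 1.0 - false_alarm_rate,  # clean lines not flagged
    }


def top_k_loc_recall(scores: Sequence[float], actual: Sequence[bool],
                     k_percent: int = 20) -> float:
    """Share of defective lines found when inspecting only the top k% of lines."""
    budget = len(scores) * k_percent // 100
    found = sum(1 for i in rank_lines(scores)[:budget] if actual[i])
    n_defective = sum(1 for a in actual if a)
    return found / n_defective if n_defective else 0.0


def initial_false_alarm(scores: Sequence[float], actual: Sequence[bool]) -> int:
    """Clean lines inspected before reaching the first defective line."""
    for inspected, i in enumerate(rank_lines(scores)):
        if actual[i]:
            return inspected
    return len(scores)  # no defective line: every inspected line is a false alarm


# Toy file of 10 lines; lines at indices 2 and 7 are actually defective.
scores = [0.1, 0.9, 0.8, 0.0, 0.0, 0.2, 0.0, 0.3, 0.0, 0.0]
actual = [False, False, True, False, False, False, False, True, False, False]
predicted = [s >= 0.5 for s in scores]

print(line_level_metrics(predicted, actual))
# → {'recall': 0.5, 'false_alarm_rate': 0.125, 'clean_lines_skipped': 0.875}
print(top_k_loc_recall(scores, actual))     # → 0.5
print(initial_false_alarm(scores, actual))  # → 1
```

Here the two lines scored at 0.5 or higher are flagged, catching one of the two defective lines (recall 0.5) while raising one false alarm among eight clean lines, so seven of the eight clean lines never need inspection.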

Summary

Introduction

Software Quality Assurance (SQA) is one of the software engineering practices for ensuring the quality of a software product [26]. When changed files from cutting-edge development branches are to be merged into the release branch, where quality is strictly controlled, an SQA team needs to carefully analyze and identify software defects in those changed files [1]. Defect prediction models are proposed to help SQA teams prioritize their effort by analyzing post-release software defects that occurred in the previous release [16, 26, 55, 59, 77, 80]. Release preparation is embedded as part of a quality culture throughout the life cycle, from planning and development to release preparation, so that teams can follow best practices to prevent software defects.



Journal: IEEE Transactions on Software Engineering
Publication Date: Sep 8, 2020
Citations: 137
License type: CC BY 4.0


Similar Papers

DeepLineDP: Towards a Deep Learning Approach for Line-Level Defect Prediction
Chanathip Pornprasit ... Chakkrit Kla Tantithamthavorn
IEEE Transactions on Software Engineering | VOL. 49 | 01 Jan 2023

Learning Semantic Features for Software Defect Prediction by Code Comments Embedding
Xuan Huo ... Ming Li
01 Nov 2018

Bidirectional Recurrent Neural Network Language Model: Cross Entropy Churn Metrics for Defect Prediction Modeling
Nivetha R ... Kavitha S
International Journal of Data Mining Techniques and Applications | VOL. 9 | 10 Dec 2020

JITLine: A Simpler, Better, Faster, Finer-grained Just-In-Time Defect Prediction
Chanathip Pornprasit ... Chakkrit Kla Tantithamthavorn
01 May 2021
