Abstract

Software defect prediction is applied during software maintenance to identify potential bugs and thereby improve software reliability. Traditional defect prediction methods mainly focus on designing static code metrics, which are fed into machine learning classifiers to predict the defect probability of the code. However, these hand-crafted metrics do not capture the syntactic structure and semantic information of programs. Such information is more significant than manual metrics and can yield a more accurate predictive model. In this paper, we propose a framework called defect prediction via attention-based recurrent neural network (DP-ARNN). Specifically, DP-ARNN first parses the abstract syntax trees (ASTs) of programs and extracts them as vectors. It then encodes these vectors by dictionary mapping and word embedding so that they can serve as inputs to the network, which automatically learns syntactic and semantic features. Finally, it employs an attention mechanism to generate the significant features used for accurate defect prediction. To validate our method, we choose seven open-source Apache Java projects and use the F1-measure and the area under the curve (AUC) as evaluation criteria. The experimental results show that, on average, DP-ARNN improves the F1-measure by 14% and AUC by 7% over state-of-the-art methods.
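The dictionary-mapping step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each file has already been reduced to a sequence of AST node-type names; the token names, the padding index 0, the frequency-based ID order, and the helper names `build_vocab` and `encode` are all illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

PAD = 0  # assumed padding index; real token IDs start at 1


def build_vocab(sequences, min_count=1):
    """Map each token seen at least min_count times to a positive integer ID."""
    counts = Counter(tok for seq in sequences for tok in seq)
    # Sort by descending frequency, then alphabetically, so IDs are deterministic.
    kept = [tok for tok, c in counts.items() if c >= min_count]
    ordered = sorted(kept, key=lambda tok: (-counts[tok], tok))
    return {tok: i + 1 for i, tok in enumerate(ordered)}


def encode(seq, vocab, max_len):
    """Convert tokens to IDs, skip unknown tokens, then truncate or zero-pad."""
    ids = [vocab[tok] for tok in seq if tok in vocab][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical AST node-type sequences for two source files.
files = [
    ["MethodDeclaration", "IfStatement", "MethodInvocation"],
    ["MethodDeclaration", "ForStatement", "MethodInvocation", "MethodInvocation"],
]
vocab = build_vocab(files)
vectors = [encode(f, vocab, max_len=5) for f in files]
```

The resulting fixed-length integer vectors are the kind of input that a word-embedding layer would then turn into dense vectors for a recurrent network.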

Highlights

  • With the continuous expansion of modern software, software reliability has become a key concern. The complex source code of software tends to cause software defects which may lead to software failure

  • Previous research focuses on designing discriminative artificial metrics to achieve higher model accuracy. These manual metrics are mainly divided into Halstead features [7] based on the number of operators and operands, dependency-based McCabe features [8], and CK features [9] based on object-oriented programs

  • We evaluate the performance of our model using the F1-measure and the area under the curve (AUC)
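For readers unfamiliar with the two evaluation criteria, the following is a small self-contained sketch of how they can be computed for a binary defect classifier. The labels, scores, and the 0.5 decision threshold are made-up illustrative values, and the function names are hypothetical; this is not the paper's evaluation code. AUC is computed here through its pairwise-ranking interpretation.

```python
def f1_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the defective class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Chance that a random defective file outscores a random clean one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data: 1 = defective, 0 = clean.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

f1 = f1_measure(y_true, y_pred)
area = auc(y_true, scores)
```

The F1-measure depends on the chosen threshold, whereas AUC depends only on how the scores rank defective files relative to clean ones.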


Summary

Introduction

With the continuous expansion of modern software, software reliability has become a key concern. The complex source code of software tends to cause software defects, which may lead to software failure. Software defect prediction [2, 3] is the process of constructing machine learning classifiers to predict defective code snippets, using historical information in software repositories, such as code complexity and change records, to design software defect metrics [4]. The predicted results can help developers locate and fix potential defects, thereby improving software stability and reliability. According to whether the source data and target data are homogeneous or heterogeneous, software defect prediction can be divided into within-project software defect prediction [5] and cross-project software defect prediction [6]. We focus on within-project software defect prediction. Traditional defect prediction methods mainly consist of two stages: extracting software metrics from historical repositories and constructing a machine learning model for classification. Previous research focuses on designing discriminative artificial metrics to achieve higher model accuracy. These manual metrics are mainly divided into Halstead features [7] based on the number of operators and operands, dependency-based McCabe features [8], and CK features [9] based on object-oriented programs.

