Abstract

Research on software defect prediction has achieved great success at modeling predictors. To build more accurate predictors, a number of hand-crafted features have been proposed, such as static code features, process features, and social network features. Few models, however, consider the semantic and structural features of programs. Understanding the context information of source code files could explain a lot about the cause of defects in software. In this paper, we leverage representation learning to generate semantic and structural features. Specifically, we first extract token vectors of code files based on their Abstract Syntax Trees (ASTs) and then feed the token vectors into a Convolutional Neural Network (CNN) to automatically learn semantic features. Meanwhile, we also construct a complex network model based on the dependencies between code files, namely, a software network (SN). To learn the structural features, we then apply a network embedding method to the resulting SN. Finally, we build a novel software defect prediction model based on the learned semantic and structural features (SDP-S2S). We evaluated our method on six projects collected from public PROMISE repositories. The results suggest that the contribution of structural features extracted from the software network is prominent, and that prediction performance appears to improve further when they are combined with semantic features. In addition, compared with traditional hand-crafted features, SDP-S2S generally achieves higher F-measure values, with a maximum growth rate of 99.5%. We also explore parameter sensitivity in the learning process of semantic and structural features and provide guidance for the optimization of predictors.
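The software network described in the abstract can be pictured with a minimal sketch. It builds a small network whose nodes are code files and whose edges are dependencies, then samples random walks over it, which is the first step of one common family of network embedding methods. The file names, the `DEPENDENCIES` map, both helper functions, and the choice to treat edges as undirected are all assumptions made for illustration; the abstract does not state which embedding method or edge direction the authors use.

```python
import random

# Hypothetical file-level dependencies: each file maps to the files it depends on.
DEPENDENCIES = {
    "Parser.java": ["Lexer.java", "Token.java"],
    "Lexer.java": ["Token.java"],
    "Token.java": [],
    "Compiler.java": ["Parser.java", "Lexer.java"],
}


def build_software_network(dependencies):
    """Return an adjacency map: nodes are files, edges are dependencies.

    Assumption: edges are stored as undirected so that walks can move
    in both directions along a dependency.
    """
    network = {name: set() for name in dependencies}
    for source, targets in dependencies.items():
        for target in targets:
            network.setdefault(target, set())
            network[source].add(target)
            network[target].add(source)
    return network


def random_walk(network, start, length, rng):
    """Sample one uniform random walk of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        neighbors = sorted(network[walk[-1]])  # sorted for reproducibility
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


network = build_software_network(DEPENDENCIES)
rng = random.Random(0)  # fixed seed so repeated runs give the same walks
walks = [random_walk(network, node, 5, rng) for node in sorted(network)]
for walk in walks:
    print(" -> ".join(walk))
```

In a walk-based embedding, sequences like these would be fed to a skip-gram style model so that files appearing in similar network contexts receive similar vectors; that training step is omitted here.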

Highlights

  • A software defect is an error in the code or incorrect behavior in software execution, defined as a failure to meet intended or specified requirements

  • Our contributions to the current state of research are summarized as follows: (i) we further demonstrated that automatically learned semantic features can significantly improve defect prediction compared to traditional features; (ii) in terms of improving defect prediction performance, we validated that the contribution of structural features extracted from the software network by representation learning is comparable, on the whole, to that of semantic features; (iii) interestingly, we found that combining semantic and structural features yields a greater improvement in prediction performance

  • Nam et al. [30] proposed TCA+, which adopted a state-of-the-art technique called Transfer Component Analysis (TCA) and optimized the normalization process. They evaluated TCA+ on eight open-source projects, and the results showed that TCA+ significantly improved cross-project defect prediction (CPDP)


Summary

Introduction

A software defect is an error in the code or incorrect behavior in software execution, defined as a failure to meet intended or specified requirements. If traditional features are used to represent two files that share the same source code characteristics (lines of code, function calls, raw programming tokens, etc.), the files appear identical, yet they can be quite different in their semantic information. To exploit the powerful feature generation ability of deep learning, some researchers [8, 9] have already leveraged algorithms such as Deep Belief Network (DBN) and Convolutional Neural Network (CNN) to learn semantic features from programs’ ASTs, and verified that these features outperform traditional hand-crafted features in defect prediction. Therefore, using representation learning to extract structural information from code files and applying the learned features to defect prediction may effectively improve the performance of existing prediction models.
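The contrast between traditional and semantic features can be illustrated with a minimal sketch. It uses Python's standard `ast` module as a stand-in for whatever parser the original pipeline uses. The two snippets, the `ast_tokens` helper, and the selection of node types are assumptions for illustration, not the authors' extraction rules. Both snippets have the same line count and the same set of tokens, but the order of the AST tokens differs, and only the second one uses `item` before assigning it.

```python
import ast
from collections import Counter

# Two hypothetical files with the same lines and calls in a different order.
FILE_A = """
def process(queue):
    item = queue.pop()
    queue.append(item)
"""

FILE_B = """
def process(queue):
    queue.append(item)
    item = queue.pop()
"""


def ast_tokens(source):
    """Collect a token sequence from the AST in depth-first source order.

    Assumption: we keep function names, method-call names, and a few
    statement node types; a real extractor would use a richer selection.
    """
    tokens = []

    def visit(node):
        if isinstance(node, ast.FunctionDef):
            tokens.append(node.name)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            tokens.append(node.func.attr)
        elif isinstance(node, (ast.If, ast.For, ast.While, ast.Assign)):
            tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


tokens_a = ast_tokens(FILE_A)
tokens_b = ast_tokens(FILE_B)

# Count-based features cannot tell the files apart ...
same_counts = Counter(tokens_a) == Counter(tokens_b)
# ... but the token order, which a sequence model can use, does differ.
same_order = tokens_a == tokens_b

print(tokens_a, tokens_b, same_counts, same_order)
```

A sequence model such as a CNN consumes the ordered tokens (after mapping them to integer or embedding vectors), which is why it can separate cases that count-based features merge.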

Related Studies
Preliminaries
Approach
Experiment Setup
Experimental Results
Threats to Validity
