Abstract

Software defect prediction (SDP), which predicts defective code regions, can help developers reasonably allocate limited resources for locating bugs and prioritizing their testing efforts. Previous work on defect prediction has used machine learning and artificial software metrics. However, traditional defect prediction features extracted from artificial software metrics often fail to capture the syntactic and semantic information of defective modules. This work on defect prediction mostly focuses on abstract syntax tree (AST). Moreover, because current research on AST technology is relatively mature, it is difficult to further improve the accuracy of defect prediction when only using AST to characterize codes. In this paper, in order to capture more semantic features, we extract semantic information both from the sequences of AST tokens and code change tokens. In addition, to leverage the traditional features extracted from statistical metrics, we also combine the semantic features with traditional defect prediction features to perform SDP, and use the gated fusion mechanism to determine the combination ratio of the two kinds of features. In our empirical studies, 10 open-source Java projects from the PROMISE repository are chosen as our empirical subjects. Experimental results show that our proposed approach can perform better than several state-of-the-art baseline SDP methods.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call