Simplified abstract syntax tree based semantic features learning for software change prediction

Xinyue Yang, Xiaofang Zhang, Yao Tong

doi:10.1002/smr.2445

Abstract

Software change prediction aims to identify the change‐prone parts of source code, which can help software practitioners allocate resources more efficiently, increase the quality of software products, and reduce maintenance costs. In recent years, researchers have built many change prediction models based on product and process metrics using traditional classification algorithms. However, source code contains rich semantic and structural information that such traditional features usually cannot capture. Extracting semantic features from the code can therefore help improve the performance of existing models. To bridge the gap between semantic features and change prediction, we introduce a novel change prediction approach based on a simplified abstract syntax tree (AST). Specifically, we first extract semantic features from a subset of AST nodes, those that carry the syntax and semantics of the code, rather than from all AST nodes. Then, a bidirectional recurrent neural network is used to model the deep semantic information of the code for change prediction. We also propose a new dataset that to some extent alleviates the data‐imbalance problem, which has become an active research topic. We conducted extensive experiments on the proposed dataset. The results show the effectiveness of semantic features for change prediction. Further, our model outperformed a state‐of‐the‐art code representation method.
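The idea of keeping only part of the AST can be illustrated with a minimal sketch. It is not the authors' implementation: it uses Python's standard `ast` module as a stand-in parser, and the set of retained node types (`KEPT_NODE_TYPES`), the helper names `simplified_ast_tokens` and `encode`, and the sample function are all assumptions made for illustration, since the abstract does not state which nodes are kept or which programming language is analyzed.

```python
import ast

# Hypothetical selection of node types to keep; the abstract does not list them.
KEPT_NODE_TYPES = (
    ast.FunctionDef, ast.ClassDef, ast.If, ast.For, ast.While,
    ast.Try, ast.Return, ast.Call,
)

def _label(node):
    """Map a kept node to a token: a name where one exists, else the node type."""
    if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
        return node.name
    if isinstance(node, ast.Call):
        if isinstance(node.func, ast.Name):
            return node.func.id
        if isinstance(node.func, ast.Attribute):
            return node.func.attr
    return type(node).__name__

def simplified_ast_tokens(source):
    """Walk the AST depth-first (pre-order) and keep tokens for selected nodes only."""
    tokens = []

    def visit(node):
        if isinstance(node, KEPT_NODE_TYPES):
            tokens.append(_label(node))
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens

def encode(tokens, vocab):
    """Assign each distinct token a positive integer id (0 is left free for padding)."""
    return [vocab.setdefault(tok, len(vocab) + 1) for tok in tokens]

SAMPLE = """
def total(xs):
    s = 0
    for x in xs:
        if x > 0:
            s = s + abs(x)
    return s
"""

tokens = simplified_ast_tokens(SAMPLE)
ids = encode(tokens, {})
print(tokens)
print(ids)
```

In a pipeline of the kind the abstract describes, integer sequences like `ids` would typically be padded to a common length and passed through an embedding layer into a bidirectional recurrent network; that stage is omitted here because the abstract gives no architectural details.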

Full Text