Abstract

Software quality plays an important role in software engineering. To improve software reliability, software defect prediction, which predicts whether a code region is buggy, aims to help developers find bugs and allocate their testing resources sensibly. Traditional defect prediction studies rely on handcrafted features and train machine learning classifiers to identify defective code regions. Because source code contains rich semantic and structural information, and defective code is closely related to its context, many researchers now try to exploit the semantic information of programs to build more accurate defect prediction models. Existing methods mainly parse source code into an Abstract Syntax Tree (AST) and transform the AST into token sequences. However, these methods fail to capture the rich structural and contextual information of code. At the same time, some traditional features (e.g., the number of methods in a class (WMC) and lines of code (LOC)) remain valuable for defect prediction. Therefore, in this paper we propose a method that builds a graph by connecting parent and child nodes of the AST and connecting leaf nodes from left to right, preserving the context information contained in the source code. We then use a Graph Convolutional Network (GCN) to automatically generate semantic and structural features from source code. In addition, we combine the GCN-learned features with handcrafted features to obtain a more accurate prediction model. Our evaluation on 12 data sets shows that the proposed method outperforms the state-of-the-art software defect prediction method by 10.37% in recall, 1.62% in precision, 5.15% in F-measure, and 8.11% in Matthews correlation coefficient (MCC).
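To make the graph construction and feature combination more concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. The abstract does not say which parser or language the data sets use, so this sketch uses Python's built-in `ast` module purely for illustration. The helper names (`build_ast_graph`, `normalized_adjacency`, `gcn_layer`, `code_features`), the one-hot node-type features, the single untrained GCN layer with random weights, mean pooling, and the LOC/WMC values are all hypothetical choices. The sketch adds parent-child edges, links consecutive leaves left to right, applies one GCN step of the form ReLU(Â·H·W), and concatenates the pooled embedding with handcrafted metrics.

```python
import ast
import math
import random


def build_ast_graph(source):
    """Hypothetical helper: AST parent-child edges plus left-to-right leaf edges."""
    tree = ast.parse(source)
    labels, edges, leaves = [], [], []

    def visit(node, parent_id):
        node_id = len(labels)
        labels.append(type(node).__name__)
        if parent_id is not None:
            edges.append((parent_id, node_id))  # parent-child edge
        # Skip Load/Store/Del context markers; they are not code tokens.
        children = [c for c in ast.iter_child_nodes(node)
                    if not isinstance(c, ast.expr_context)]
        if not children:
            leaves.append(node_id)  # depth-first order visits leaves left to right
        for child in children:
            visit(child, node_id)

    visit(tree, None)
    edges.extend(zip(leaves, leaves[1:]))  # link each leaf to the next leaf
    return labels, edges


def normalized_adjacency(n, edges):
    """A_hat = D^-1/2 (A + I) D^-1/2, treating every edge as undirected."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain-list matrix product x @ y."""
    return [[sum(row[k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for row in x]


def gcn_layer(a_hat, h, w):
    """One graph-convolution step: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def code_features(source, vocab, w, handcrafted):
    """Pooled GCN embedding of the code graph, concatenated with handcrafted metrics."""
    labels, edges = build_ast_graph(source)
    n = len(labels)
    # One-hot node-type features; node types missing from vocab get an all-zero row.
    h = [[1.0 if lab == t else 0.0 for t in vocab] for lab in labels]
    h = gcn_layer(normalized_adjacency(n, edges), h, w)
    pooled = [sum(row[j] for row in h) / n for j in range(len(w[0]))]  # mean pooling
    return pooled + list(handcrafted)


src = "def f(x):\n    return x + 1\n"
vocab = ["Module", "FunctionDef", "arguments", "arg", "Return",
         "BinOp", "Name", "Add", "Constant"]
rng = random.Random(0)
w = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in vocab]  # untrained weights
loc, wmc = 2, 1  # illustrative metric values for this snippet
vec = code_features(src, vocab, w, [loc, wmc])
print(len(vec))  # 6: 4 learned dimensions + 2 handcrafted metrics
```

For the snippet above, the leaves `arg`, `Name`, `Add`, and `Constant` are chained in source order, so the token sequence `x`, `x`, `+`, `1` stays connected even though those nodes sit in different subtrees. A real model would train `W`, likely stack several layers, and feed the combined vector to a classifier; those parts are omitted here.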
