An Abstract Syntax Tree Encoding Method for Cross-Project Defect Prediction

Ziyi Cai, Lu Lu, Shaojian Qiu

doi:10.1109/access.2019.2953696

Open Access

https://doi.org/10.1109/access.2019.2953696

Journal: IEEE Access
Publication Date: Jan 1, 2019
Citations: 25
License type: CC BY 4.0

Affiliation: South China University of Technology

Abstract

In the last few years, as deep learning theory has developed, researchers have tried to introduce artificial intelligence methods into the field of software defect prediction (SDP) to improve prediction performance. To be fed into a neural network, sample code is represented as an abstract syntax tree (AST), and the AST is encoded as real numbers. However, in most cross-project defect prediction (CPDP) tasks, the method used to convert the AST into real numbers cannot effectively estimate the semantic distance between ASTs, which significantly degrades training. To solve this problem, we present a new encoding framework, tree-based embedding (TBE), which converts ASTs into real-valued vectors and makes the semantic gap between ASTs measurable. To evaluate this encoding method, we propose a tree-based-embedding convolutional neural network with transferable hybrid feature learning (TBCNN-THFL) to perform CPDP tasks. TBCNN-THFL is fed data encoded with the TBE method to learn transferable joint features between different projects; it also incorporates a transfer component analysis (TCA) algorithm. Furthermore, the model combines handcrafted and deep-learning-generated features and feeds them into a classifier to train a defect prediction model. Extensive experiments demonstrate that TBCNN-THFL outperforms the reference models on 72 pairs of CPDP tasks formed from 9 open-source projects.
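The encoding problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' TBE method: it only shows the naive baseline in which each AST node type is mapped to an arbitrary integer, so the numeric gap between two codes says nothing about semantic distance. Python's standard `ast` module and the two toy snippets are assumptions chosen for illustration; the excerpt does not state which language or parser the paper uses, and the helper names `node_type_sequence`, `build_vocab`, and `encode` are hypothetical.

```python
import ast

def node_type_sequence(source):
    """Return the AST node type names of `source`, in the order ast.walk yields them."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]

def build_vocab(sequences):
    """Assign each distinct node type an integer id, starting at 1 (0 is left free for padding)."""
    vocab = {}
    for seq in sequences:
        for name in seq:
            if name not in vocab:
                vocab[name] = len(vocab) + 1
    return vocab

def encode(seq, vocab):
    """Replace every node type name by its integer id."""
    return [vocab[name] for name in seq]

# Two toy snippets (hypothetical sample code) that differ only in the operator.
snippet_a = "def add(a, b):\n    return a + b\n"
snippet_b = "def sub(a, b):\n    return a - b\n"

seq_a = node_type_sequence(snippet_a)
seq_b = node_type_sequence(snippet_b)
vocab = build_vocab([seq_a, seq_b])
ids_a = encode(seq_a, vocab)
ids_b = encode(seq_b, vocab)

# The ids depend only on the order in which node types were first seen,
# so |id(Add) - id(Sub)| carries no semantic information.
gap_add_sub = abs(vocab["Add"] - vocab["Sub"])
gap_add_def = abs(vocab["Add"] - vocab["FunctionDef"])
print(ids_a, ids_b, gap_add_sub, gap_add_def)
```

An embedding-based encoding would instead map each node type to a learned vector, so that distances between vectors can reflect how similar two node types are in context.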

Highlights

In the process of developing and maintaining software, the scale and complexity of the software will increase, making the task of debugging more difficult
To improve the transferability of hybrid features in cross-project defect prediction (CPDP) tasks, we introduce transfer component analysis (TCA), which reduces the distance between the data distributions of different projects and learns transfer components among projects in a reproducing kernel Hilbert space (RKHS)
The performance of the tree-based embedding method (answer for RQ1): to demonstrate that our tree-based embedding method can improve the performance of deep learning models in CPDP, we compare the area under the curve (AUC) of TBCNN-TCA/TBCNN-THFL with that of models without TBE
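The AUC comparison mentioned in the highlights can be sketched as follows. This is a minimal, standard-library illustration rather than the authors' evaluation code: `auc_score` computes AUC as the probability that a randomly chosen defective sample is scored above a randomly chosen clean one (ties count as one half). The labels, the two score lists, and the project names `p1` to `p9` are made-up placeholders, not data from the paper. The sketch also enumerates the ordered source/target pairs, which is how 9 projects yield 72 CPDP tasks.

```python
from itertools import permutations

def auc_score(labels, scores):
    """AUC via pairwise comparison of positive (1) and negative (0) samples."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # defective sample ranked above clean sample
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical predictions for one target project (1 = defective, 0 = clean).
labels = [1, 0, 1, 0, 0]
scores_with_tbe = [0.9, 0.2, 0.7, 0.4, 0.1]     # placeholder model outputs
scores_without_tbe = [0.6, 0.5, 0.3, 0.4, 0.1]  # placeholder model outputs

auc_with = auc_score(labels, scores_with_tbe)
auc_without = auc_score(labels, scores_without_tbe)

# Every ordered (source, target) pair of distinct projects is one CPDP task.
projects = ["p%d" % i for i in range(1, 10)]    # placeholder project names
cpdp_pairs = list(permutations(projects, 2))

print(auc_with, auc_without, len(cpdp_pairs))
```

In a full comparison, the AUC would be computed once per source/target pair for each model, and the per-pair values compared across models.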

Summary

Introduction

In the process of developing and maintaining software, its scale and complexity increase, making debugging more difficult. Features for determining whether software is defective are divided into manually extracted features and deep-learning-generated features. Manually extracted features are designed by researchers to distinguish defect-prone code from bug-free code (e.g., MOOD features [5] built on polymorphism and coupling factors, CK features [6] developed from function and inheritance counts, Halstead features [7] based on operator and operand counts, and McCabe features [8] based on dependencies). Machine learning models such as naive Bayes (NB) [9], decision tree (DT) [10], [11], and support vector machine (SVM) [12] are fed the features described above and trained to determine whether the code is defective. As deep learning has rapidly developed, many researchers [13]–[15] have begun to introduce deep learning into SDP, leveraging its powerful feature



Similar Papers

Software defect prediction with semantic and structural information of codes based on Graph Neural Networks
Chunying Zhou ... Peng He
Information and Software Technology | VOL. 152
01 Dec 2022

A Suitable AST Node Granularity and Multi-Kernel Transfer Convolutional Neural Network for Cross-Project Defect Prediction
Jiehan Deng ... Shaojian Qiu
IEEE Access | VOL. 8
01 Jan 2020

Use of Deep Learning Model with Attention Mechanism for Software Fault Prediction
Ting-Yan Yu ... Neil C Fang
01 Aug 2021

Software defect prediction via transfer learning based neural network
Qimeng Cao ... Qing Sun
01 Oct 2015
