Abstract

Software defect prediction can assist developers in finding potential bugs and reducing maintenance costs. Traditional approaches usually utilize software metrics (Lines of Code, Cyclomatic Complexity, etc.) as features to build classifiers and identify defective software modules. However, software metrics often fail to capture programs’ syntax and semantic information. In this paper, we propose Seml, a novel framework that combines word embedding and deep learning methods for defect prediction. Specifically, for each program source file, we first extract a token sequence from its abstract syntax tree. Then, we map each token in the sequence to a real-valued vector using a mapping table, which is trained with an unsupervised word embedding model. Finally, we use the vector sequences and their labels (defective or non-defective) to build a Long Short-Term Memory (LSTM) network. The LSTM model can automatically learn the semantic information of programs and perform defect prediction. The evaluation results on eight open source projects show that Seml outperforms three state-of-the-art defect prediction approaches on most of the datasets for both within-project defect prediction and cross-project defect prediction.
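The first stage of the pipeline described above extracts a token sequence from a source file's abstract syntax tree. The paper targets Java projects; the sketch below uses Python's standard `ast` module purely to illustrate the mechanics (which node types to keep, and how declarations versus control-flow nodes are tokenized, are assumptions, not the paper's exact rules):

```python
import ast

def extract_token_sequence(source: str) -> list[str]:
    """Walk the AST and keep the kinds of nodes AST-based defect
    prediction approaches typically retain: declarations (kept by
    name) and control-flow / invocation nodes (kept by node type)."""
    tokens = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            tokens.append(node.name)            # declarations keep their name
        elif isinstance(node, (ast.If, ast.For, ast.While,
                               ast.Try, ast.Return, ast.Call)):
            tokens.append(type(node).__name__)  # control flow kept by type
    return tokens

code = """
def read_config(path):
    for line in open(path):
        if line.startswith('#'):
            continue
    return path
"""
seq = extract_token_sequence(code)
print(seq)
```

The resulting token sequence is what the embedding stage consumes; the choice of which node types to keep is a design decision that directly shapes the vocabulary.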

Highlights

  • Software defect prediction techniques are proposed to improve software reliability and reduce software development cost

  • Several machine learning models have been adopted as defect prediction classifiers, such as Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), Neural Network (NN), etc

  • We present a preprocessing method for tokens extracted from programs’ Abstract Syntax Trees (ASTs) and train a word embedding model in an unsupervised way to map tokens to real-valued vectors, in order to capture semantic similarities of tokens for both within-project defect prediction (WPDP) and cross-project defect prediction (CPDP)
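The mapping table the highlight refers to is, mechanically, a lookup from token to vector. The sketch below illustrates only that lookup, including a shared vector for out-of-vocabulary tokens; in Seml the vectors come from an unsupervised word-embedding model, whereas here they are random stand-ins (the `<UNK>` convention and the dimension are assumptions for illustration):

```python
import random

def build_mapping_table(vocab, dim=8, seed=0):
    """Stand-in for a trained embedding table: token -> real-valued vector.
    Random vectors here only illustrate the lookup mechanics; a real table
    would be trained with a word2vec-style unsupervised model."""
    rng = random.Random(seed)
    table = {tok: [rng.uniform(-1, 1) for _ in range(dim)] for tok in vocab}
    table["<UNK>"] = [0.0] * dim   # shared vector for out-of-vocabulary tokens
    return table

def embed_sequence(tokens, table):
    """Map a token sequence to the vector sequence fed to the LSTM."""
    return [table.get(tok, table["<UNK>"]) for tok in tokens]

table = build_mapping_table(["If", "For", "Call", "Return"])
vectors = embed_sequence(["If", "Call", "unseenToken"], table)
print(len(vectors), len(vectors[0]))
```

Handling unseen tokens matters especially for cross-project prediction, where the target project's vocabulary may differ from the training projects'.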


Summary

INTRODUCTION

Software defect prediction techniques are proposed to improve software reliability and reduce software development cost. Most previous studies leverage manually designed software metrics to build classifiers, and such traditional approaches have made progress in both within-project and cross-project defect prediction. However, they face a challenge: manually designed metrics fail to capture programs’ rich syntax and semantic information, which may limit the performance of defect prediction. For example, two source files with different semantics can share exactly the same metric values (Lines of Code, Cyclomatic Complexity, etc.), so traditional defect prediction approaches cannot tell the difference between them. To capture programs’ syntax and semantic information, Wang et al. [11] proposed a deep learning approach, which leverages a Deep Belief Network (DBN) [12] to learn semantic features from token sequences extracted from programs’ ASTs.
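The limitation of metric-based features can be illustrated with a hypothetical pair of functions (not taken from the paper): both have the same line count and the same cyclomatic complexity, so a metrics-based classifier sees identical feature vectors, yet only one is defective.

```python
# Both functions have identical static metrics (same line count, one
# branch each, cyclomatic complexity 2), so metric-based features
# cannot distinguish them -- but only the second one is defective.

def close_safe(handle):
    if handle is not None:     # guard before use
        handle.close()
    return True

def close_buggy(handle):
    if handle is None:         # inverted check: calls close() on None
        handle.close()
    return True
```

The token sequences of the two functions differ (the condition and the guarded call are swapped), which is exactly the kind of syntactic and semantic signal an AST-token-based model can exploit.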

DEFECT PREDICTION
WORD EMBEDDING
APPROACH
PARSING SOURCE CODE AND EXTRACTING FEATURES
TOKEN EMBEDDING
BUILDING LSTM MODEL AND PERFORMING DEFECT PREDICTION
EVALUATION
DATASETS
EVALUATION METRICS
BASELINES
PARAMETER TUNING
SOFTWARE DEFECT PREDICTION
DEEP LEARNING AND SOFTWARE ENGINEERING
CONCLUSION