Convolutional Neural Networks-Based Locating Relevant Buggy Code Files for Bug Reports Affected by Data Imbalance

Guangliang Liu, Yang Lu, Xing Wei, Ke Shi, Jingfei Chang

doi:10.1109/access.2019.2940557

Open Access

https://doi.org/10.1109/access.2019.2940557

Abstract

Software bug localization is an important but complicated and time-consuming task in software engineering. To improve developer efficiency, researchers have developed a variety of traditional and machine-learning-based bug localization methods. In this paper, we propose a novel method that improves bug localization performance. First, a deep neural network extracts features from surface lexical correlation matching between bug reports and source code files. Second, to bridge the lexical gap between bug reports and source code files, a deep neural network extracts features from semantic correlation matching between them, based on word embeddings and sentence embeddings. The features obtained from surface lexical and semantic correlation matching are then fused into a unified feature representation for bug reports and source code files. In addition, because our experimental datasets are imbalanced, we use a focal loss function to mitigate the impact of data imbalance. Finally, our method achieves relatively high bug localization performance compared with other classic methods.
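The focal loss mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard binary form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); the function names and the values gamma = 2.0 and alpha = 0.25 are assumptions chosen for illustration (they are the commonly cited defaults), not settings reported on this page.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Plain binary cross-entropy for one example.

    p is the predicted probability of the positive class, y is 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one example (illustrative defaults, see lead-in).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    examples, so the many easy negatives in an imbalanced dataset
    contribute less to the total than the few hard positives.
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy negative (file is not buggy, model agrees) versus a
    # hard positive (file is buggy, model assigns low probability).
    for label, p, y in [("easy negative", 0.1, 0), ("hard positive", 0.1, 1)]:
        print(label, cross_entropy(p, y), binary_focal_loss(p, y))
```

With gamma set to 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which is a convenient sanity check when tuning.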

Highlights

Fixing software defects has always been a very important part of the software lifecycle.
Xiao et al. [13] presented the DeepLoc method, which obtains vectors of bug reports and source code files with word2vec and sent2vec and uses convolutional neural networks to locate relevant buggy files. Among these deep-learning-based bug localization methods, some consider only surface lexical correlation matching between bug reports and source code files and ignore their semantic correlation matching.
The main contributions of our work are as follows: 1. We propose a novel method called joint surface lexical and semantic correlation matching based on convolutional neural networks (SLS-CNN) for bug localization.

Summary

INTRODUCTION

Fixing software defects has always been a very important part of the software lifecycle. Software defects are generally reported to the development team through bug reports, and the team fixes the bugs based on those reports. Wang and Lo [7] proposed the AmaLgam+ method, which uses similar reports, structure, and other information to locate relevant buggy files. These traditional methods use feature attributes of bug reports and source code files for bug localization. Xiao et al. [13] presented the DeepLoc method, which obtains vectors of bug reports and source code files with word2vec and sent2vec and uses convolutional neural networks to locate relevant buggy files. Among these deep-learning-based bug localization methods, some consider only surface lexical correlation matching between bug reports and source code files and ignore their semantic correlation matching.
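The difference between surface lexical matching and semantic matching can be shown with a toy sketch. It is not the SLS-CNN or DeepLoc pipeline: token overlap (Jaccard) stands in for lexical matching, and cosine similarity between averaged word vectors stands in for embedding-based matching. The three-dimensional vectors in TOY_EMBEDDINGS are hand-made and purely hypothetical; a real system would load trained word2vec vectors instead.

```python
import math


def jaccard(a_tokens, b_tokens):
    """Surface lexical similarity: shared tokens over all distinct tokens."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical hand-made vectors: related words point in similar directions.
TOY_EMBEDDINGS = {
    "crash": [0.9, 0.1, 0.0],
    "exception": [0.8, 0.2, 0.1],
    "save": [0.1, 0.9, 0.1],
    "persist": [0.2, 0.7, 0.1],
}

report_tokens = ["crash", "save"]         # words a user might write
source_tokens = ["exception", "persist"]  # words a developer might use in code

lexical_score = jaccard(report_tokens, source_tokens)
semantic_score = cosine(
    mean_vector(report_tokens, TOY_EMBEDDINGS),
    mean_vector(source_tokens, TOY_EMBEDDINGS),
)

if __name__ == "__main__":
    print("lexical:", lexical_score)
    print("semantic:", semantic_score)
```

The report and the source file share no tokens, so the lexical score is zero, while the embedding-based score stays high. This is the lexical gap that motivates combining both kinds of matching.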

BACKGROUND

WORD2VEC AND DOC2VEC

FOCAL LOSS

MODULE 3 – FEATURES FUSION LAYER

BENCHMARK DATASETS

THREATS TO VALIDITY

Findings

CONCLUSION

Journal: IEEE Access	Publication Date: Jan 1, 2019
Citations: 10	License type: CC BY 4.0
