Abstract

Given a bug report, bug localization techniques can help developers automatically locate potentially buggy source files. Information retrieval and deep learning approaches have been applied to bug localization by extracting lexical features from bug reports and syntactic features from source code files, but they fail to exploit the structural and semantic information of source code. In this paper, we present CAST, a bug localization system that combines deep learning with customized abstract syntax trees (ASTs) of programs to locate potentially buggy source files automatically and effectively. Specifically, CAST extracts both lexical semantics from bug reports (e.g., words) and source files (e.g., method names), and program semantics from source files (e.g., ASTs). Moreover, CAST enhances the tree-based convolutional neural network (TBCNN) model with customized ASTs, which distinguish user-defined methods from system-provided ones to reflect their different contributions to defects. Furthermore, the customized ASTs group syntactic entities with similar semantics and prune those with little or redundant semantics in order to improve learning performance. Experimental results on four widely used software projects show that CAST significantly outperforms state-of-the-art methods in locating buggy source files.
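To make the three AST customizations concrete, here is a minimal sketch in Python. It is not CAST's implementation: it uses Python's standard `ast` module on Python source purely as a stand-in, and the grouping table `GROUPS`, the pruning set `PRUNE`, and the labels `UserCall`/`SystemCall` are hypothetical choices made for illustration. A call is labeled `UserCall` when its callee name matches a function defined in the same file and `SystemCall` otherwise; pruned nodes are dropped and their children are promoted to the parent.

```python
import ast

# Hypothetical grouping table (illustrative only, not CAST's actual rules):
# several concrete node types with similar semantics collapse into one label.
GROUPS = {
    "For": "Loop", "While": "Loop", "AsyncFor": "Loop",
    "If": "Branch", "IfExp": "Branch",
    "BinOp": "Operation", "BoolOp": "Operation",
    "Compare": "Operation", "UnaryOp": "Operation",
}

# Hypothetical pruning set: node types assumed to carry little semantic value.
PRUNE = {"Load", "Store", "Del", "Pass", "Expr"}


def customize(source):
    """Return a customized tree as nested (label, [children]) tuples."""
    module = ast.parse(source)
    # Names of functions defined in this file count as "user-defined".
    user_defined = {
        node.name
        for node in ast.walk(module)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }

    def label(node):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            return "UserCall" if name in user_defined else "SystemCall"
        kind = type(node).__name__
        return GROUPS.get(kind, kind)  # group similar node types

    def build(node):
        children = []
        for child in ast.iter_child_nodes(node):
            children.extend(build(child))
        if type(node).__name__ in PRUNE:
            return children  # prune this node but keep its descendants
        return [(label(node), children)]

    return build(module)[0]


def flatten(tree):
    """Pre-order list of labels, handy for inspecting the result."""
    label, children = tree
    out = [label]
    for child in children:
        out.extend(flatten(child))
    return out


SRC = '''
def area(r):
    return 3.14 * r * r

def total(xs):
    s = 0
    for x in xs:
        s += area(x)
    print(len(xs))
    return s
'''

print(flatten(customize(SRC)))
```

Here `area(x)` becomes a `UserCall`, while `print` and `len` become `SystemCall` nodes; the `For` node is relabeled `Loop`, and context nodes such as `Load` and `Store` disappear. Promoting the children of a pruned node, rather than deleting its whole subtree, keeps meaningful descendants (for example, the call wrapped by an expression statement) in the tree.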

Highlights

  • For large and evolving software, developers may receive a large number of bug reports, and it is difficult and costly to manually locate the potentially buggy source files based on these reports.

  • To evaluate the performance of CAST, we focus on four research questions (RQs). RQ1: What effect do different model settings have on CAST? When building CAST, we need to determine suitable values for the hyper-parameters.

  • RQ2: Can CAST outperform other bug localization methods? To evaluate the capability of CAST, we compare it with four state-of-the-art bug localization tools: BugLocator [28], DNNLOC [29], DeepLocator [35], and NP-CNN [22].

Summary

Liang et al.: Deep Learning With Customized Abstract Syntax Tree for Bug Localization

INTRODUCTION

For large and evolving software, developers may receive a large number of bug reports, and it is difficult and costly to manually locate the potentially buggy source files based on these reports. Although existing tools can model natural and programming languages for bug localization, there is still room to improve their accuracy and performance. CAST uses a CNN to extract rich lexical semantic features, which capture the relationships between syntactic entities (e.g., words or method names) in bug reports and source files, and applies TBCNN [9] to customized ASTs to capture hierarchical structural features, which encode the structural and semantic relations among program statements in source code files. The customized ASTs distinguish user-defined methods from system-provided ones to reflect their different contributions to defects, which helps improve the accuracy of bug localization. They also group syntactic entities with similar semantics and prune those with little or redundant semantics to improve learning performance.
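The tree-based convolution that CAST applies to customized ASTs can be illustrated with a simplified sketch. This is an assumption-laden toy, not TBCNN as published and not CAST's code: each node's feature combines its own embedding (weight `w_top`) with its children's embeddings, where each child interpolates between a left weight `w_left` and a right weight `w_right` according to its position among its siblings (a single child uses both equally). Max pooling over all nodes then yields a fixed-size vector for trees of any size. The dimensions, the random initialization, and the class name `TreeConv` are hypothetical, and no training is shown.

```python
import math
import random

DIM_IN, DIM_OUT = 4, 3  # hypothetical embedding and feature sizes


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class TreeConv:
    """Simplified tree convolution over (label, [children]) trees (illustrative)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.w_top = rand_matrix(self.rng, DIM_OUT, DIM_IN)
        self.w_left = rand_matrix(self.rng, DIM_OUT, DIM_IN)
        self.w_right = rand_matrix(self.rng, DIM_OUT, DIM_IN)
        self.bias = [0.0] * DIM_OUT
        self.embeddings = {}  # lazily created random vector per node label

    def embed(self, label):
        if label not in self.embeddings:
            self.embeddings[label] = [self.rng.uniform(-1, 1) for _ in range(DIM_IN)]
        return self.embeddings[label]

    def node_feature(self, label, children):
        # Parent contribution plus bias.
        acc = [a + b for a, b in zip(matvec(self.w_top, self.embed(label)), self.bias)]
        n = len(children)
        for i, (child_label, _) in enumerate(children):
            # Leftmost child leans on w_left, rightmost on w_right.
            eta_r = i / (n - 1) if n > 1 else 0.5
            eta_l = 1.0 - eta_r
            left = matvec(self.w_left, self.embed(child_label))
            right = matvec(self.w_right, self.embed(child_label))
            acc = [a + eta_l * l + eta_r * r for a, l, r in zip(acc, left, right)]
        return [math.tanh(a) for a in acc]

    def encode(self, tree):
        feats = []
        stack = [tree]
        while stack:  # visit every node once
            label, children = stack.pop()
            feats.append(self.node_feature(label, children))
            stack.extend(children)
        # Max pooling per dimension gives a fixed-size vector for any tree.
        return [max(col) for col in zip(*feats)]


tree = ("MethodDeclaration", [
    ("Loop", [("UserCall", []), ("SystemCall", [])]),
    ("Return", []),
])
conv = TreeConv(seed=42)
vec = conv.encode(tree)
print(vec)
```

In CAST, a structural vector of this kind sits alongside the lexical features that the CNN extracts from bug reports and source files; how the two are combined belongs to the paper's feature combination step and is not modeled here.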

MOTIVATION
WORD EMBEDDING
FEATURE EXTRACTION
FEATURE COMBINATION
OPTIMIZATION FUNCTION
EVALUATION
EXPERIMENTAL RESULTS AND ANALYSIS
Answer to RQ1
THREATS TO VALIDITY
VIII. CONCLUSION