Locating Faulty Methods with a Mixed RNN and Attention Model

Shouliang Yang, Beijun Shen, Hushuang Zeng, Hao Zhong, Junming Cao

doi:10.1109/icpc52881.2021.00028

Abstract

IR-based fault localization approaches achieve promising results when locating faulty files by comparing a bug report with source code. Unfortunately, they become less effective when locating faulty methods. We conduct a preliminary study to explore the challenges of method-level localization, and identify three problems: the semantic gap problem, the representation sparseness problem, and the single revision problem. To tackle these problems, we propose MRAM, a mixed RNN and attention model, which combines bug-fixing features and method structured features to explore both implicit and explicit relevance between methods and bug reports for the method-level fault localization task. The core ideas of our model are: (1) constructing code revision graphs from code, commits, and past bug reports, which reveal the latent relations among methods to augment short methods and also provide all revisions of code and past fixes to train more accurate models; (2) embedding three method structured features (token sequences, API invocation sequences, and comments) jointly with RNN and soft attention to represent source methods and obtain their implicit relevance with bug reports; and (3) integrating multi-revision bug-fixing features, which provide the explicit relevance between bug reports and methods, to improve performance. We have implemented MRAM and conducted a controlled experiment on five open-source projects. Compared with state-of-the-art approaches, MRAM improves MRR values by 3.8-5.1% (3.7-5.4%) when the dataset contains (does not contain) localized bug reports. Our statistical test shows that the improvements are significant.
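
The abstract reports results in terms of MRR without defining it. As a rough illustration only, the sketch below computes mean reciprocal rank under its standard definition: for each bug report, take the reciprocal of the rank of the first truly faulty method in the ranked list, then average over all reports. The function name, the data layout, and the choice to score a report as 0 when no faulty method is retrieved are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence, Set


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]],
    faulty: Sequence[Set[str]],
) -> float:
    """Average of 1/rank of the first faulty method over all bug reports.

    rankings[i] is the ranked list of method identifiers for bug report i
    (most suspicious first); faulty[i] is the set of methods actually fixed.
    Assumption: a report with no faulty method in its list contributes 0.
    """
    total = 0.0
    count = 0
    for ranked, truth in zip(rankings, faulty):
        count += 1
        for position, method in enumerate(ranked, start=1):
            if method in truth:
                total += 1.0 / position
                break  # only the first hit counts toward the reciprocal rank
    return total / count if count else 0.0


# Hypothetical method identifiers, used purely for illustration.
example_rankings = [
    ["Parser.read", "Parser.close", "Lexer.next"],
    ["Cache.get", "Cache.put", "Cache.evict"],
    ["Util.join", "Util.split"],
]
example_faulty = [{"Parser.close"}, {"Cache.get"}, {"Util.trim"}]
example_mrr = mean_reciprocal_rank(example_rankings, example_faulty)
```

In this toy input the first faulty method appears at rank 2, at rank 1, and not at all, so the three reports contribute 1/2, 1, and 0 respectively.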

Full Text