Abstract

Automated localization of buggy files can improve developers' efficiency in software maintenance and thereby the quality of software products. State-of-the-art approaches for bug localization are based on neural networks, e.g., RNNs or CNNs, and can learn semantic features from a given bug report. However, these simple neural architectures struggle to learn deep contextual features from bug reports, which hurts the semantic mapping between bug reports and their corresponding buggy files. To resolve this problem, in this paper we propose CoLoc, a bug localization approach that combines pre-trained language models and contrastive learning. Specifically, CoLoc is first pre-trained on a large-scale bug report corpus in an unsupervised way, so that it learns deep contextual features of each token in a bug report according to its context. Afterward, CoLoc is further trained with a contrastive learning objective to learn contrastive representations of both bug reports and buggy files. Contrastive learning helps CoLoc capture the semantic differences among different bug reports and buggy files. To evaluate the effectiveness of CoLoc, we compare it with five baseline approaches on a public dataset. The experimental results show that CoLoc outperforms all baseline approaches by up to 76.00% in terms of MRR, achieving new state-of-the-art results for bug localization.
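The abstract does not specify the exact form of the contrastive objective. As an illustration only, a common choice for this kind of bi-encoder setup is an InfoNCE-style loss with in-batch negatives, where each bug report is pulled toward its matching buggy file and pushed away from the other files in the batch. The sketch below is a hypothetical example of that idea; the function name, temperature, and random stand-in embeddings are assumptions, not CoLoc's published implementation.

```python
# Hypothetical sketch of a contrastive objective of the kind the abstract
# describes; NOT CoLoc's actual implementation.
import torch
import torch.nn.functional as F

def info_nce_loss(report_emb: torch.Tensor,
                  file_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE loss with in-batch negatives.

    report_emb: (B, D) embeddings of bug reports.
    file_emb:   (B, D) embeddings of the matching buggy files; row i of
                file_emb is the positive for row i of report_emb, and all
                other rows in the batch serve as negatives.
    """
    # Cosine similarity between every report and every file in the batch.
    report_emb = F.normalize(report_emb, dim=-1)
    file_emb = F.normalize(file_emb, dim=-1)
    logits = report_emb @ file_emb.T / temperature  # shape (B, B)

    # Each matching (report, file) pair sits on the diagonal of the
    # similarity matrix, so the target class for row i is i.
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    # Random tensors stand in for the encoder outputs of 8 report/file pairs.
    reports = torch.randn(8, 256)
    files = torch.randn(8, 256)
    print(info_nce_loss(reports, files))
```

Training with such a loss pushes the encoders to place each bug report close to its buggy file and far from unrelated files, which is the semantic-separation effect the abstract attributes to the contrastive stage.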
