Cross-language bug localization

Xin Xia, Xinyu Wang, Xingen Wang, David Lo, Chenyi Zhang

doi:10.1145/2597008.2597788

Abstract

Bug localization refers to the process of identifying source code files that contain defects from the textual descriptions in bug reports. Existing bug localization techniques assume that bug reports, and the identifiers and comments in source code files, are written in the same language (i.e., English). However, software users from non-English speaking countries (e.g., China) often write bug reports in their native languages (e.g., Chinese). In this setting, existing bug localization techniques do not work, because the terms that appear in the bug reports do not appear in the source code. We refer to this problem as cross-language bug localization. In this paper, we propose a cross-language bug localization algorithm named CrosLocator, which is based on language translation. Since different online translators (e.g., Google and Microsoft translators) have different translation accuracies for various texts, CrosLocator uses multiple translators to convert the non-English textual description of a bug report into English, so each bug report has multiple translated versions. For each translated version, CrosLocator applies a bug localization technique to rank source code files. Finally, CrosLocator combines the multiple ranked lists of source code files into a single list. Our preliminary experiment on Ruby-China shows that CrosLocator achieves mean reciprocal rank (MRR) and mean average precision (MAP) scores of up to 0.146 and 0.116, outperforming a baseline approach by an average of 10% and 12%, respectively.
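The final two steps described in the abstract, combining several ranked lists and scoring the result, can be illustrated with a minimal Python sketch. The abstract does not say how CrosLocator combines the lists, so the reciprocal-rank sum used here is an assumption chosen for illustration, not the authors' method. The function names and the file paths in the usage note are likewise hypothetical. The metric functions follow the standard definitions of reciprocal rank and average precision; MRR and MAP are their means over all bug reports.

```python
from collections import defaultdict


def combine_rankings(ranked_lists):
    """Fuse ranked lists of file paths into one list.

    Assumption: each file's fused score is the sum of its reciprocal
    ranks across the lists. This is an illustrative scheme only.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for position, path in enumerate(ranking, start=1):
            scores[path] += 1.0 / position
    # Highest fused score first; ties broken by path for determinism.
    return sorted(scores, key=lambda path: (-scores[path], path))


def reciprocal_rank(ranking, buggy_files):
    """Return 1 / rank of the first buggy file, or 0.0 if none is ranked."""
    for position, path in enumerate(ranking, start=1):
        if path in buggy_files:
            return 1.0 / position
    return 0.0


def average_precision(ranking, buggy_files):
    """Average the precision at each rank where a buggy file appears."""
    if not buggy_files:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, path in enumerate(ranking, start=1):
        if path in buggy_files:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(buggy_files)


def mean(values):
    """Mean of a list of per-report scores (gives MRR or MAP)."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0
```

As a usage example, suppose one translated version of a report yields the ranking `["a.rb", "b.rb", "c.rb"]` and another yields `["b.rb", "a.rb", "d.rb"]`. Passing both to `combine_rankings` gives one fused list, which `reciprocal_rank` and `average_precision` can then score against the set of files actually changed to fix the bug.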

Full Text