Exploiting spatial code proximity and order for improved source code retrieval for bug localization

Bunyamin Sisman, Shayan A. Akbar, Avinash C. Kak

doi:10.1002/smr.1805

Abstract

Practically all information retrieval (IR) based approaches developed to date for automatic bug localization rely on the bag-of-words assumption, which ignores any positional and ordering relationships between the terms in a query. In this paper, we argue that bug reports are ill-served by this assumption because they frequently contain various types of structural information whose terms must obey certain positional and ordering constraints. It therefore stands to reason that retrieval quality for bug localization would improve if these constraints were taken into account when searching for the most relevant files, and we demonstrate that this is indeed the case. We show how the well-known Markov Random Field (MRF) based retrieval framework can account for the term-term proximity and ordering relationships in a query vis-à-vis the same relationships in the files of a source-code library, and how doing so greatly improves the retrieval of the most relevant source files. We carried out our experimental evaluations on popular large software projects using more than 4,000 bug reports. The results show that the proposed approach substantially outperforms the widely used bag-of-words based approaches. Copyright © 2016 John Wiley & Sons, Ltd.
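The abstract names the MRF retrieval framework without detail. As a rough illustration only, the sketch below scores a tokenized source file against a query using the sequential-dependence variant of the MRF model (Metzler and Croft), which combines unigram matches with ordered (adjacent) and unordered (within a window of 8 tokens) matches of consecutive query terms. The weights 0.8/0.1/0.1, the Dirichlet parameter `MU = 2000`, the simplified window counting, and all function names here are assumptions chosen for illustration; this is not the authors' implementation.

```python
import math
from typing import Sequence

# Assumed weights for the unigram, ordered-pair, and unordered-pair features
# (commonly used sequential-dependence defaults; the paper may tune these).
LAMBDA_T, LAMBDA_O, LAMBDA_U = 0.8, 0.1, 0.1
MU = 2000.0  # assumed Dirichlet smoothing parameter


def term_count(doc: Sequence[str], t: str) -> int:
    """Number of occurrences of term t in the document."""
    return sum(1 for w in doc if w == t)


def ordered_count(doc: Sequence[str], a: str, b: str) -> int:
    """Occurrences of `a` immediately followed by `b` (ordered window of 1)."""
    return sum(1 for i in range(len(doc) - 1) if doc[i] == a and doc[i + 1] == b)


def unordered_count(doc: Sequence[str], a: str, b: str, width: int = 8) -> int:
    """Simplified unordered-window count: position pairs holding `a` and `b`
    (in either order) that fit inside a span of `width` tokens."""
    pos_a = [i for i, w in enumerate(doc) if w == a]
    pos_b = [j for j, w in enumerate(doc) if w == b]
    return sum(1 for i in pos_a for j in pos_b if i != j and abs(i - j) < width)


def dirichlet_log(tf: int, doc_len: int, cf: int, coll_len: int, mu: float = MU):
    """Dirichlet-smoothed log probability; None if the feature never occurs
    anywhere in the collection (so it is skipped for every document)."""
    if cf == 0:
        return None
    return math.log((tf + mu * cf / coll_len) / (doc_len + mu))


def sdm_score(query: Sequence[str], doc: Sequence[str],
              collection: Sequence[Sequence[str]]) -> float:
    """Rank-equivalent sequential-dependence score of `doc` for `query`."""
    coll_len = sum(len(d) for d in collection)
    score = 0.0
    # Unigram (bag-of-words) component.
    for t in query:
        cf = sum(term_count(d, t) for d in collection)
        f = dirichlet_log(term_count(doc, t), len(doc), cf, coll_len)
        if f is not None:
            score += LAMBDA_T * f
    # Proximity and order components over adjacent query-term pairs.
    for a, b in zip(query, query[1:]):
        for lam, counter in ((LAMBDA_O, ordered_count), (LAMBDA_U, unordered_count)):
            cf = sum(counter(d, a, b) for d in collection)
            f = dirichlet_log(counter(doc, a, b), len(doc), cf, coll_len)
            if f is not None:
                score += lam * f
    return score
```

For example, two files containing exactly the same tokens receive identical bag-of-words scores for the query `null pointer exception`, but the file in which those terms appear adjacent and in query order receives a higher score from `sdm_score`, which is the kind of positional and ordering signal the abstract argues bug reports carry.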
