On the use of positional proximity in IR-based feature location

Emily Hill,Avinash Kak,Bunyamin Sisman

doi:10.1109/csmr-wcre.2014.6747185

Abstract

As software systems continue to grow and evolve, locating code for software maintenance tasks becomes increasingly difficult. Recently proposed approaches to bug localization and feature location have suggested using the positional proximity of words in the source code files and the bug reports to determine the relevance of a file to a query. Two different types of approaches have emerged for incorporating word proximity and order in retrieval: those based on ad-hoc considerations and those based on Markov Random Field (MRF) modeling. In this paper, we explore using both these types of approaches to identify over 200 features in five open source Java systems. In addition, we use positional proximity of query words within natural language (NL) phrases in order to capture the NL semantics of positional proximity. As expected, our results indicate that the power of these approaches varies from one dataset to another. However, the variations are larger for the ad-hoc positional-proximity based approaches than with the approach based on MRF. In other words, the feature location results are more consistent across the datasets with MRF based modeling of the features.

Full Text