On the influence of program constructs on bug localization effectiveness

Marcelo Garnier, Isabella Ferreira, Alessandro Garcia

doi:10.1186/s40411-017-0040-2


Open Access

https://doi.org/10.1186/s40411-017-0040-2


Abstract

Software projects often reach hundreds or thousands of files. Therefore, manually searching for code elements that should be changed to fix a failure is a difficult task. Static bug localization techniques provide cost-effective means of finding files related to the failure described in a bug report. Structured information retrieval (IR) has been successfully applied by techniques such as BLUiR, BLUiR+, and AmaLgam. However, there are significant shortcomings in how these techniques were evaluated. First, virtually all evaluations have been limited to very few projects written in only one object-oriented programming language, particularly Java. Second, it might be that particular constructs of different programming languages, such as C#, play a role in the effectiveness of bug localization techniques. However, little is known about this phenomenon. Third, the experimental setup of most bug localization studies makes simplistic assumptions that do not hold in real-world scenarios, thereby raising doubts about the reported effectiveness of existing techniques. In this article, we evaluate BLUiR, BLUiR+, and AmaLgam on 20 C# projects, addressing the aforementioned shortcomings of previous studies. Then, we extend AmaLgam’s algorithm to understand whether structured information retrieval can benefit from the use of a wider range of program constructs, including C# constructs that do not exist in Java. We also analyze the influence of program constructs on bug localization effectiveness using Principal Component Analysis (PCA). Our analysis points to Methods and Classes as the constructs that contribute the most to the effectiveness of bug localization. It also reveals a significant contribution from Properties and String literals, constructs not considered in previous studies. Finally, we evaluate the effects of changing the emphasis on particular constructs by making another extension to AmaLgam’s algorithm, enabling the specification of different weights for each construct. Our results show that fine-tuning these weights may increase the effectiveness of bug localization in projects written in a specific programming language, such as C#.

Highlights

Software defects are a serious concern for developers and maintainers
Structured information retrieval has been successfully applied to the bug localization problem
Considering the multi-language nature of most modern software (Karus and Gall 2011), it is important to have effective bug localization models for the different kinds of languages and technologies used in software projects

Summary

Introduction

Garnier et al. Journal of Software Engineering Research and Development (2017) 5:6

Software defects (bugs) are a serious concern for developers and maintainers. It is widely known that the later a failure is detected, the higher the cost to fix it. The activity of finding the defective source code elements that led to a failure is called bug localization (Lukins et al. 2010). Effective methods for automatically locating bugs from bug reports are highly desirable (Saha et al. 2013), as they would shorten bug-fixing time and thereby reduce software maintenance costs (Zhou et al. 2012).

When applying IR to bug localization, source code files become the collection of documents, and the bug report represents the query. Text normalization extracts a list of terms that represents the documents and the query by removing punctuation marks, performing case folding, and splitting identifiers. Stemming converts each term to a common root form, which improves term matching by representing similar words with the same term.
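The normalization and stemming steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used by BLUiR or AmaLgam: the function names (`split_identifier`, `normalize`, `stem`, `to_terms`) are hypothetical, and the suffix-stripping `stem` is a crude stand-in for a real stemmer such as Porter's.

```python
import re

# Assumed, deliberately crude suffix list; a real system would use a
# proper stemming algorithm (for example, Porter's) instead.
_SUFFIXES = ("ing", "ed", "es", "s", "e")


def split_identifier(token):
    """Split camelCase, PascalCase, and snake_case identifiers into words."""
    parts = []
    for chunk in token.split("_"):
        # Acronym runs (XML), capitalized or lowercase words, digit runs.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return parts


def normalize(text):
    """Remove punctuation, split identifiers, and fold case."""
    tokens = re.sub(r"[^A-Za-z0-9_]+", " ", text).split()
    terms = []
    for token in tokens:
        terms.extend(part.lower() for part in split_identifier(token))
    return terms


def stem(term):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in _SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def to_terms(text):
    """Turn a bug report or a source file into a list of stemmed terms."""
    return [stem(term) for term in normalize(text)]


if __name__ == "__main__":
    report = "NullPointerException thrown when parsing config_file."
    print(to_terms(report))
    # ['null', 'pointer', 'exception', 'thrown', 'when', 'pars', 'config', 'fil']
```

Applying the same `to_terms` function to both the bug report (the query) and each source file (a document) is what lets "parsing" in a report match `parseFile` in the code: both reduce to the term `pars`.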

Results

Discussion

Conclusion