Abstract

Feature Location (FL) aims to locate observable functionalities in source code. Considering its key role in software maintenance, a vast array of automated and semi-automated Feature Location Techniques (FLTs) have been proposed. To compare FLTs, an open, standard set of non-subjective, reproducible “compare-to” techniques (baseline techniques) should be used for evaluation. To relate the performance of FLTs that were compared against different baseline techniques, these baseline techniques should themselves be evaluated against each other. However, cross-comparison of FLTs is confounded by empirical designs that incorporate different FL goals and evaluation criteria. This paper moves towards standardizing FLT comparability by assessing eight baseline techniques in an empirical design that addresses these confounding factors. The baseline techniques are assessed in twelve case studies to rank their performance. The results suggest that the baseline techniques differ in performance and that VSM-Lucene and LSI-Matlab performed better than the other implementations. By presenting the relative performance of baseline techniques, this paper facilitates empirical cross-comparison of existing and future FLTs. Finally, the results suggest that the performance of an FLT depends partly on the characteristics of the system or benchmark, not only on the FLT itself.
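The abstract ranks baseline techniques by their performance across case studies but does not name the measure at this point. As a hedged illustration only, the sketch below assumes a rank-based measure sometimes used in FL studies: the position of the first relevant method in a technique's ranked output (lower is better), averaged over the features of a case study. The function names, technique names, feature IDs and gold sets are hypothetical, and scoring a miss as one past the end of the list is an assumption, not the paper's protocol.

```python
from statistics import mean


def first_relevant_rank(ranking, relevant):
    """1-based position of the first relevant method in a ranked list."""
    for position, method in enumerate(ranking, start=1):
        if method in relevant:
            return position
    # Assumption: a feature with no relevant method retrieved is scored
    # one past the end of the list, so misses are penalized rather than ignored.
    return len(ranking) + 1


def mean_effectiveness(results, gold):
    """Average first-relevant rank of one technique over all gold features.

    results: feature id -> ranked list of methods returned by the technique
    gold:    feature id -> set of methods known to implement that feature
    """
    return mean(first_relevant_rank(results.get(f, []), gold[f]) for f in gold)


def rank_techniques(all_results, gold):
    """Order techniques from best (lowest mean rank) to worst; ties by name."""
    scores = {name: mean_effectiveness(res, gold) for name, res in all_results.items()}
    return sorted(scores, key=lambda name: (scores[name], name))


# Hypothetical toy data: two techniques, two features.
gold = {"F1": {"Cart.addItem"}, "F2": {"Login.checkPassword"}}
all_results = {
    "TechniqueA": {"F1": ["Cart.addItem", "Cart.removeItem"],
                   "F2": ["Cart.removeItem", "Login.checkPassword"]},
    "TechniqueB": {"F1": ["Cart.removeItem", "Login.checkPassword", "Cart.addItem"],
                   "F2": ["Login.checkPassword"]},
}
print(rank_techniques(all_results, gold))  # ['TechniqueA', 'TechniqueB']
```

TechniqueA scores a mean rank of 1.5 (ranks 1 and 2) and TechniqueB a mean of 2 (ranks 3 and 1), so A is ranked first. Comparing techniques only under one shared measure and one shared gold set is the kind of common evaluation condition the abstract argues for.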

Highlights

  • A feature is an observable functionality in a software system that can be triggered by the user (Eisenbarth et al 2003)

  • Natural Language Processing (NLP), Information Retrieval (IR) and Pattern Matching (PM) are the main analysis techniques employed in textual analysis (Binkley et al 2015; Diaz et al 2013; Liu et al 2007) with the emphasis on IR, as it is more effective than PM while being less complex than NLP (Wang et al 2011)

  • We argue that only by relative comparison against open, standard baseline techniques, under common evaluation measures, and standard empirical-design conditions, will researchers begin to identify the high-performing Feature Location Techniques (FLTs) in the field


Summary

Introduction

A feature is an observable functionality in a software system that can be triggered by the user (Eisenbarth et al 2003). Influential works on feature location (FL) include Chen and Rajlich (2000), whose technique located features by examining the software’s structure via a dependency graph; Wilde et al (2001), who used program traces gathered during dynamic analysis; and Antoniol et al (2002), who used an Information Retrieval (IR) technique (textual analysis) to support the feature location task. Since these early efforts, the number of structural and textual analysis approaches for FL has expanded dramatically, and many new FLTs have been developed (Chen and Rajlich 2000; Antoniol et al 2002; Lukins et al 2008; Marcus and Maletic 2003; Marcus et al 2004; Starke et al 2009), tailored to different software maintenance activities (Cornelissen et al 2009). Natural Language Processing (NLP), IR and Pattern Matching (PM) are the main analysis techniques employed in textual analysis (Binkley et al 2015; Diaz et al 2013; Liu et al 2007), with the emphasis on IR, as it is more effective than PM while being less complex than NLP (Wang et al 2011).
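IR-based textual analysis, as described above, treats each code unit as a document and ranks units by their similarity to a feature query. The sketch below is a minimal, hedged illustration of Vector Space Model (VSM) ranking, assuming each method is represented by its raw source text, weighted by tf-idf (raw term frequency times log(N/df)) and compared to the query by cosine similarity. The tokenizer, the weighting scheme, the names `tokenize` and `locate_feature`, and the toy methods are hypothetical choices for illustration; this is not the VSM-Lucene implementation assessed in the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic terms, splitting camelCase identifiers (addItem -> add, item)."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (term -> weight) per document, plus the idf table."""
    term_counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for tf in term_counts for term in tf)
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in tf.items()} for tf in term_counts]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def locate_feature(query, methods):
    """Rank method names by cosine similarity between the query and each method's text."""
    names = list(methods)
    vectors, idf = tfidf_vectors([methods[m] for m in names])
    # Query terms unseen in the corpus get weight 0 and do not affect the ranking.
    query_vec = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scored = [(cosine(query_vec, v), name) for name, v in zip(names, vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Hypothetical toy corpus of three methods.
methods = {
    "Cart.addItem": "void addItem(Item item) { items.add(item); total += item.price; }",
    "Cart.removeItem": "void removeItem(Item item) { items.remove(item); total -= item.price; }",
    "Login.checkPassword": "boolean checkPassword(String user, String password) "
                           "{ return hash(password).equals(stored(user)); }",
}
print(locate_feature("add item to shopping cart", methods))
# ['Cart.addItem', 'Cart.removeItem', 'Login.checkPassword']
```

The ranked list, not a single answer, is what an IR-based FLT hands to the developer, which is why rank-based measures are a natural way to compare such techniques. LSI-style techniques differ mainly in projecting these term vectors into a lower-dimensional space (via SVD) before computing similarity.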

