The Impact of Retrieval Direction on IR-Based Traceability Link Recovery

Chris Mills, Sonia Haiduc

doi:10.1109/icse-nier.2017.14

Abstract

The application of Information Retrieval (IR) techniques to software traceability link recovery has been the focus of many studies. These studies have formulated the task of establishing valid trace links between two types of software artifacts as a retrieval problem, where one type of artifact is selected as the set of queries and the other as the corpus. Previous work selected the sets of query and corpus artifacts for a study up front, thereby imposing a single retrieval direction for finding all trace links. This decision was usually made based on intuition or previous work. We argue that the choice of the query and corpus sets (i.e., the retrieval direction) can significantly impact the results of IR-based traceability link recovery and should be made with context in mind, as the best choice may depend on the properties of each dataset. Furthermore, we argue that even within the same system, different traceability links may be best recovered by using different retrieval directions. In this paper, we provide the first evidence to support these claims, showing that retrieval direction can have a significant impact on IR performance for traceability link recovery at both the project and individual link level. Moreover, we propose future research directions aimed at predicting the most efficient retrieval direction, as well as approaches leveraging information from both retrieval directions simultaneously.

Full Text