Automating Change-Level Self-Admitted Technical Debt Determination

Meng Yan,Xiaohu Yang,David Lo,Emad Shihab,Jianwei Yin,Xin Xia

doi:10.1109/tse.2018.2831232

Abstract

Technical debt (TD) is a metaphor to describe the situation where developers introduce suboptimal solutions during software development to achieve short-term goals that may affect the long-term software quality. Prior studies proposed different techniques to identify TD, such as identifying TD through code smells or by analyzing source code comments. Technical debt identified using comments is known as Self-Admitted Technical Debt (SATD) and refers to TD that is introduced intentionally. Compared with TD identified by code metrics or code smells, SATD is more reliable since it is admitted by developers using comments. Thus far, all of the state-of-the-art approaches identify SATD at the file-level. In essence, they identify whether a file has SATD or not. However, all of the SATD is introduced through software changes. Previous studies that identify SATD at the file-level in isolation cannot describe the TD context related to multiple files. Therefore, it is beneficial to identify the SATD once a change is being made. We refer to this type of TD identification as “Change-level SATD Determination”, which determines whether or not a change introduces SATD. Identifying SATD at the change-level can help to manage and control TD by understanding the TD context through tracing the introducing changes. To build a change-level SATD Determination model, we first identify TD from source code comments in source code files of all versions. Second, we label the changes that first introduce the SATD comments as TD-introducing changes. Third, we build the determination model by extracting 25 features from software changes that are divided into three dimensions, namely diffusion, history and message, respectively. To evaluate the effectiveness of our proposed model, we perform an empirical study on 7 open source projects containing a total of 100,011 software changes. The experimental results show that our model achieves a promising and better performance than four baselines in terms of AUC and cost-effectiveness (i.e., percentage of TD-introducing changes identified when inspecting 20 percent of changed LOC). On average across the 7 experimental projects, our model achieves AUC of 0.82, cost-effectiveness of 0.80, which is a significant improvement over the comparison baselines used. In addition, we found that “Diffusion” is the most discriminative dimension among the three dimensions of features for determining TD-introducing changes.

Full Text