Abstract

Self-admitted technical debt (SATD) is annotated by developers in source code comments and has been recognized as a valuable source for discovering flawed software. To reduce manual effort, several recent studies have focused on detecting SATD automatically with text classification methods. To train their classifiers, these methods need labeled samples, which are themselves costly to obtain. We developed a new SATD identification method that takes advantage of a large corpus of unlabeled code comments, extracted from open source projects, to train a word embedding model. After feature selection, the pre-trained word embedding is used to discover semantically similar features in source code comments, which are added to the original feature set. By classifying with this enhanced feature set, our goal was to improve SATD identification compared to existing methods. The proposed feature enhancement method was evaluated with the three most common feature selection methods, namely chi-square (CHI), information gain (IG), and mutual information (MI), combined with three well-known text classification algorithms, namely naive Bayes (NB), support vector machine (SVM), and maximum entropy (ME), on ten open source projects. The experimental results show a significant improvement in SATD identification over the compared methods. With 82% correct SATD predictions, the proposed method appears to be a good candidate for adoption in practice.
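The feature enhancement step described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation: it assumes binary term-presence features, uses chi-square (CHI) for selection, and substitutes a tiny hand-written embedding table (`EMBEDDING`) for a model trained on unlabeled comments. The helper names and the expansion rule (`top_n` nearest neighbours above a `min_sim` cosine threshold) are assumptions made for illustration.

```python
import math
import re

# Hypothetical toy embedding; a real one would be trained on a large corpus
# of unlabeled code comments (e.g. with word2vec).
EMBEDDING = {
    "hack": [0.9, 0.1, 0.0],
    "kludge": [0.85, 0.15, 0.05],
    "workaround": [0.8, 0.2, 0.1],
    "stream": [0.0, 0.1, 0.9],
}


def tokenize(comment: str) -> set[str]:
    """Lowercase a comment and split it into a set of alphanumeric tokens."""
    return {t for t in re.split(r"[^a-z0-9]+", comment.lower()) if t}


def chi_square(docs: list[set[str]], labels: list[int], term: str) -> float:
    """Chi-square statistic of term presence vs. class (1 = SATD, 0 = not SATD)."""
    a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == 1)
    b = sum(1 for doc, y in zip(docs, labels) if term in doc and y == 0)
    c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == 1)
    d = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == 0)
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t), t))
    return ranked[:k]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


def enhance(features, embedding, top_n=2, min_sim=0.9):
    """Append up to top_n embedding neighbours of each selected feature whose
    cosine similarity is at least min_sim (an assumed expansion rule)."""
    enhanced = list(features)
    for f in features:
        if f not in embedding:
            continue
        neighbours = sorted(
            ((cosine(embedding[f], vec), w) for w, vec in embedding.items() if w != f),
            reverse=True,
        )
        for sim, w in neighbours[:top_n]:
            if sim >= min_sim and w not in enhanced:
                enhanced.append(w)
    return enhanced


def vectorize(doc, features):
    """Binary feature vector: 1 if the comment contains the feature term."""
    return [1 if f in doc else 0 for f in features]


comments = ["TODO fix this hack", "ugly workaround hack",
            "returns the user name", "closes open stream"]
labels = [1, 1, 0, 0]
docs = [tokenize(c) for c in comments]

selected = select_features(docs, labels, k=1)  # ['hack']
enhanced = enhance(selected, EMBEDDING)         # ['hack', 'kludge', 'workaround']
new_doc = tokenize("temporary kludge until upstream fix")
print(vectorize(new_doc, selected))  # [0]
print(vectorize(new_doc, enhanced))  # [0, 1, 0]
```

In this toy run, the comment "temporary kludge until upstream fix" shares no term with the original feature set `["hack"]`, but after enhancement it activates the `kludge` feature. A classifier such as NB, SVM, or ME would then be trained on vectors of this kind.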

Highlights

  • In software development, there has always been a trade-off between software code quality and timely software release [2]

  • Encouraged by the promising results, in this paper we present a new, expanded method for Self-admitted technical debt (SATD) detection from source code comments, which is based on enhanced feature selection using word embedding, and discuss the obtained results in detail

  • SATD is technical debt that is described in source code comments, and is introduced intentionally by developers


Summary

Introduction

There has always been a trade-off between software code quality and timely software release [2]. Self-admitted technical debt (SATD) is technical debt that developers introduce intentionally and describe in source code comments. It was first described by Potdar and Shihab [29], who manually examined four open source projects and extracted textual patterns likely to indicate SATD comments. A follow-up study [20] identified different types of SATD, namely defect debt, design debt, documentation debt, requirement debt, and test debt. The authors found that the most common types of SATD comments are design debt (ranging between 42% and 84%) and requirement debt (ranging between 5% and 45%).
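Pattern-based detection of the kind Potdar and Shihab used can be sketched as a simple phrase matcher. The Python below is a hypothetical illustration: the names `SATD_PATTERNS`, `matched_patterns`, and `is_satd` are made up here, and the five phrases are an assumed sample rather than their published pattern set.

```python
import re

# Illustrative sample of SATD-indicating phrases; an assumption for this
# sketch, not the full list published by Potdar and Shihab.
SATD_PATTERNS = ("hack", "fixme", "workaround", "temporary solution", "probably a bug")


def matched_patterns(comment: str, patterns=SATD_PATTERNS) -> list[str]:
    """Return the patterns that occur in the comment as whole words (case-insensitive)."""
    text = comment.lower()
    return [p for p in patterns if re.search(r"\b" + re.escape(p) + r"\b", text)]


def is_satd(comment: str) -> bool:
    """Flag a comment as SATD if it matches at least one pattern."""
    return bool(matched_patterns(comment))


print(matched_patterns("// FIXME: ugly hack, remove later"))  # ['hack', 'fixme']
print(is_satd("Returns the cached value"))                   # False
```

Word-boundary matching avoids flagging words such as "hackathon". The limitation that motivates learned classifiers is that any comment phrased differently from the fixed list is missed.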
