Bad Smell Detection Using Machine Learning Techniques: A Systematic Literature Review

Ahmed Al-Shaaby, Hamoud Aljamaan, Mohammad Alshayeb

doi:10.1007/s13369-019-04311-w

Abstract

Code smells are indicators of potential problems in software. They tend to have a negative impact on software quality. Several studies use machine learning techniques to detect bad smells. The objective of this study is to systematically review and analyze the machine learning techniques used to detect code smells, in order to provide the interested research community with knowledge about the techniques and practices adopted for code smell detection. We use a systematic literature review approach to review studies that use machine learning techniques to detect code smells. Seventeen primary studies were identified. We found that 27 code smells were used in the identified studies; God Class, Long Method, Feature Envy, and Data Class are the most frequently detected code smells. In addition, we found that 16 machine learning algorithms were employed to detect code smells with acceptable prediction accuracy. The results also indicate that support vector machine techniques were investigated the most. Moreover, we observed that the J48 and Random Forest algorithms outperform the other algorithms. We also noticed that applying boosting techniques to the models does not always enhance their performance. More studies are needed that consider ensemble learning techniques, multiclassification, and feature selection techniques for code smell detection. Thus, the application of machine learning algorithms to detect code smells in systems is still in its infancy, and more research is needed to facilitate their use for this purpose.

Full Text