Code smell detection based on supervised learning models: A survey

Yang Zhang, Chuyan Ge, Haiyang Liu, Kun Zheng

doi:10.1016/j.neucom.2023.127014

Abstract

Supervised learning-based code smell detection has become one of the dominant approaches to identifying code smells. Existing works optimize the detection process from multiple aspects, such as dataset quality, feature selection, and model choice. Although accuracy has improved continuously, researchers remain uncertain about which models are most suitable for detecting code smells once dataset construction and feature selection are taken into account. Furthermore, existing surveys on code smells mainly analyze their impact, categorize the concerns around them, and discuss their repair; a systematic analysis and classification of code smell detection based on supervised learning is lacking. To this end, we collect 86 papers on code smell detection based on supervised learning, published between January 2010 and April 2023. We empirically evaluate a total of 7 research questions covering different aspects, such as dataset construction, data pre-processing, feature selection, and model training. We conclude that existing works suffer from issues such as sample imbalance, uneven attention across types of code smell, and limited feature selection. Finally, we suggest possible future research directions.

Full Text