Abstract

Code smells are poorly designed parts of code whose removal is essential for sustainable software development. However, recognizing code smells in practice is challenging. Machine Learning (ML)-based code smell detectors could solve this problem. Current ML-based code smell detection approaches are based on supervised learning (SL) that requires a large and diverse dataset for training. Unfortunately, the existing code smell datasets are small, which hinders the performance of the trained SL models. This paper aims to improve the performance of ML-based code smell detectors by employing semi-supervised learning (SSL). SSL models are trained by combining a manually labeled code smell dataset with unlabeled code snippets collected from open-source repositories. Two major SSL techniques are employed: self-training and co-training. Experiments were performed for two code smell types: God Class and Long Method. SSL classifiers significantly outperformed SL classifiers for God Class detection (by 6% F-measure). For Long Method detection, SSL classifiers slightly outperformed SL classifiers (by 1%F-measure). This paper is the first to consider applying SSL for code smell detection. SSL models outperforming SL models in all experiments suggest that SSL holds the great potential to improve current code smell detectors, which is essential for their adoption in practice.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call