Python Code Smell Detection Using Machine Learning

Natthida Vatanapakorn,Pusadee Seresangtakul,Chitsutha Soomlek

doi:10.1109/icsec56337.2022.10049330

Abstract

Python is an increasingly popular programming language used in various software projects and domains. Code smells in Python significantly influences the maintainability, understandability, testability issues. This paper proposes a machine learning-based code smell detection for Python programs. We trained eight machine learning models with a dataset based on 115 open-source Python projects, 39 class-level software metrics, and 22 function-level software metrics. We intended to identify five code smell types in both class and function levels, i.e., long method, long parameter list, large class long scope chaining, and long based class list. Correlation-based feature selection (CFS) and logistic regression-forward stepwise (conditional) selection were employed to improve the performance of the model. This research concluded with an empirical evaluation of the performance of the machine learning approaches against the tuning machine method. The results show that the machine learning method achieved 99.72% accuracy when identifying long method and long base class list. The machine learning-based code smell detection also outperformed the tuning machine method. Moreover, we also found a set of high-impact features that contributed most when identifying each type of code smell.

Full Text