Application of machine learning algorithms for code smell prediction using object-oriented software metrics

Mansi Agnihotri, Anuradha Chug

doi:10.1080/09720510.2020.1799576

Abstract

Code smells are generally not considered bugs; instead, they point to shortcomings in the software design or code. Identifying code smells is a necessary step toward improving software quality and reducing maintenance effort. In this study, we introduce a bad smell prediction technique based on object-oriented software metrics that uses the Decision Tree (DT) and Random Forest (RF) machine learning algorithms. An open-source project, JHOTDRAW, served as our dataset, and object-oriented software metrics were calculated for it. Two feature selection methods, Random Forest Importance (RFI) and Information Gain (IG), were applied to extract the attributes most relevant to predicting four code smells: Feature Envy, Dispersed Coupling, Refused Parent Bequest, and God Class. Random search was used to tune the parameters of the Random Forest and Decision Tree classifiers. With Decision Tree, the best classification accuracy, 99.13%, was obtained for the Refused Parent Bequest smell. With Random Forest, Refused Parent Bequest was again predicted with the best accuracy, 99.14%. Finally, a set of code smell prediction rules was extracted from the Decision Tree.
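Information Gain, one of the two feature selection methods named above, scores each metric by how much knowing its value reduces uncertainty about whether a class exhibits a smell: IG(Y; X) = H(Y) − Σ_v P(X = v) · H(Y | X = v). The sketch below is a minimal illustration, not the authors' implementation. It assumes each numeric metric is binarized at a fixed threshold (the abstract does not state how metrics were discretized), and the metric values, thresholds, and God Class labels are hypothetical toy data rather than measurements from JHOTDRAW.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum over v of P(X=v) * H(Y | X=v), for a discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def discretize(values, threshold):
    """Assumption: binarize a numeric metric as 'high' (> threshold) or 'low'."""
    return ["high" if v > threshold else "low" for v in values]


def rank_features(metrics, labels, thresholds):
    """Return (metric name, IG) pairs sorted from most to least informative."""
    scores = {
        name: information_gain(discretize(values, thresholds[name]), labels)
        for name, values in metrics.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: 8 classes, label 1 = God Class, 0 = clean.
metrics = {
    "WMC": [5, 8, 40, 55, 7, 60, 6, 45],  # weighted methods per class
    "NOC": [0, 1, 0, 2, 1, 0, 2, 1],      # number of children
}
thresholds = {"WMC": 20, "NOC": 0}        # hypothetical cut points
labels = [0, 0, 1, 1, 0, 1, 0, 1]

ranking = rank_features(metrics, labels, thresholds)
for name, ig in ranking:
    print(f"{name}: IG = {ig:.3f}")
```

On this toy data WMC separates the labels perfectly (IG = 1 bit), while NOC carries almost no information, so a selector keeping the top-ranked metrics would retain WMC. For the other techniques named in the abstract, Random Forest Importance is commonly read from scikit-learn's `RandomForestClassifier.feature_importances_`, and random-search tuning corresponds to `RandomizedSearchCV`; the abstract does not say which toolkit the authors used.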
