Application of LSSVM and SMOTE on Seven Open Source Projects for Predicting Refactoring at Class Level

Lov Kumar, Ashish Sureka

doi:10.1109/apsec.2017.15

Abstract

Source code refactoring consists of modifying the structure of source code without changing its functionality or external behavior. We present a method to predict refactoring candidates at the class level, which can help developers improve the design and structure of their source code while preserving its behavior. We propose a technique to predict refactoring candidates based on a machine learning framework. We use Least Squares Support Vector Machines (LS-SVM) as the learning algorithm, Principal Component Analysis (PCA) as a feature extraction technique, and Synthetic Minority Over-sampling Technique (SMOTE) as a technique for handling imbalanced data. We start with 102 source code metrics as input features, which are reduced to 31 features after removing irrelevant and redundant features through statistical tests. We conduct a series of experiments on a publicly available software engineering dataset consisting of seven open-source software systems in which the refactored classes are manually validated. We apply LS-SVM with three different kernel functions: linear, polynomial, and Radial Basis Function (RBF). Statistical significance tests demonstrate that the RBF kernel outperforms the linear and polynomial kernels, but there is no statistically significant difference between the performance of the linear and polynomial kernels. The tests also reveal that training with SMOTE outperforms training without SMOTE, and that using all metrics outperforms using PCA-based metrics. The mean value of Area Under Curve (AUC) for LS-SVM with the RBF kernel is 0.96.
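To make the combination of SMOTE and an RBF-kernel LS-SVM concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard LS-SVM classifier formulation (one linear system solved for the bias and the dual weights) and a simplified SMOTE that interpolates between a minority point and one of its nearest minority neighbors. The synthetic two-cluster data, the function names, and the hyperparameters `gamma`, `sigma` and `k` are all illustrative assumptions; the PCA and feature selection steps are omitted.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Squared Euclidean distance between every row of A and every row of B.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma**2))

def smote(X_min, n_new, k=3, rng=None):
    # Simplified SMOTE: each synthetic point lies on the segment between a
    # minority sample and one of its k nearest minority neighbors.
    rng = np.random.default_rng(rng)
    d2 = ((X_min[:, None, :] - X_min[None, :, :])**2).sum(-1)
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), n_new)
    nb = neighbors[base, rng.integers(0, k, n_new)]
    gap = rng.random((n_new, 1))              # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    # LS-SVM classifier: solve
    #   [0   y^T             ] [b    ]   [0]
    #   [y   Omega + I/gamma ] [alpha] = [1]
    # with Omega_ij = y_i * y_j * K(x_i, x_j) and labels y in {-1, +1}.
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                    # bias b, dual weights alpha

def lssvm_decision(X_train, y_train, b, alpha, X_new, sigma=1.0):
    # f(x) = sum_i alpha_i * y_i * K(x, x_i) + b; the sign is the predicted class.
    return rbf_kernel(X_new, X_train, sigma) @ (alpha * y_train) + b

# Hypothetical imbalanced data: 40 "not refactored" vs 6 "refactored" classes.
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 0.5, size=(40, 2))
X_min = rng.normal(3.0, 0.5, size=(6, 2))

# Oversample the minority class until both classes have 40 samples.
X_syn = smote(X_min, n_new=34, k=3, rng=1)
X = np.vstack([X_maj, X_min, X_syn])
y = np.concatenate([-np.ones(40), np.ones(40)])

b, alpha = lssvm_fit(X, y)
scores = lssvm_decision(X, y, b, alpha, np.array([[3.0, 3.0], [0.0, 0.0]]))
print(np.sign(scores))
```

In a real evaluation, oversampling would be applied to the training folds only, so that synthetic samples never leak into the data used to compute AUC.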

Full Text