Weighted knn using grey relational analysis for cross-project defect prediction

Di Ulumi, D Siahaan

doi:10.1088/1742-6596/1230/1/012062

Abstract

Defect prediction plays an important role in detecting vulnerable components within software. Researchers have tried to improve the accuracy of software defect prediction so that it helps developers manage resources (human, cost, and time) better. However, they build defect prediction models for a single, specific domain only. To our knowledge, defect prediction for cross-project domains has not been investigated before. This research developed a method to predict software defects for cross-project domains. In this setting, the datasets have different numbers of features, so to supply the features missing from a dataset, the method estimates the missing values. This research employed weighted KNN to fill in the missing values. The completed datasets were then classified using naïve Bayes and random forest. This research also performed feature selection to choose the features relevant for detecting defects, comparing several feature selection methods. The experiments used seven public NASA MDP datasets. The results show that for imbalanced data, naïve Bayes combined with information gain (IG) or symmetric uncertainty (SU) feature selection produced the best balance, 0.4975. For balanced data, random forest combined with gain ratio (GR) produced the best balance, 0.7795. In general, the developed method performed comparably to the previous method, which classifies only a specific domain (0.4975). It even outperformed the previous method on dataset PC2 (0.4033).
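The abstract names only "weighted KNN" for filling in missing values, while the title indicates that the weights come from grey relational analysis (GRA). The sketch below shows one common form of GRA-weighted KNN imputation, not necessarily the authors' exact procedure. It assumes min-max normalisation over the complete rows, a distinguishing coefficient `zeta = 0.5`, neighbours ranked by grey relational grade, and missing values estimated as the grade-weighted average of the neighbours' values. In the cross-project reading, the rows of datasets that do have a feature act as the complete donors for a dataset that lacks it. The names `grey_relational_grades`, `gra_knn_impute`, `k` and `zeta` are hypothetical.

```python
from typing import List, Optional, Sequence


def grey_relational_grades(target: Sequence[float],
                           candidates: Sequence[Sequence[float]],
                           zeta: float = 0.5) -> List[float]:
    """Grey relational grade of each candidate sequence w.r.t. the reference (target).

    Values are assumed to be on a common scale (e.g. min-max normalised).
    """
    # Absolute differences |x0(k) - xi(k)| for every candidate i and feature k
    deltas = [[abs(t - c) for t, c in zip(target, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:  # every candidate equals the target on all features
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        # Grey relational coefficient per feature, averaged into one grade
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


def gra_knn_impute(data: List[List[Optional[float]]], k: int = 3,
                   zeta: float = 0.5) -> List[List[float]]:
    """Fill None entries with a GRA-weighted average over the k most related complete rows."""
    n_cols = len(data[0])
    complete = [r for r in data if all(v is not None for v in r)]
    if not complete:
        raise ValueError("need at least one complete row to borrow values from")
    # Column ranges taken from the complete rows, used for min-max normalisation
    lo = [min(r[j] for r in complete) for j in range(n_cols)]
    hi = [max(r[j] for r in complete) for j in range(n_cols)]

    def norm(v: float, j: int) -> float:
        span = hi[j] - lo[j]
        return 0.0 if span == 0 else (v - lo[j]) / span

    result = []
    for row in data:
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            result.append([float(v) for v in row])
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        if not observed:
            raise ValueError("cannot impute a row with no observed features")
        # Compare rows only on the features this row actually has
        target = [norm(row[j], j) for j in observed]
        cands = [[norm(c[j], j) for j in observed] for c in complete]
        grades = grey_relational_grades(target, cands, zeta)
        nearest = sorted(range(len(complete)), key=lambda i: grades[i], reverse=True)[:k]
        total = sum(grades[i] for i in nearest)  # grades are > 0, so total > 0
        filled = [float(v) if v is not None else 0.0 for v in row]
        for j in missing:
            filled[j] = sum(grades[i] * complete[i][j] for i in nearest) / total
        result.append(filled)
    return result
```

For example, with `data = [[1.0, 2.0, 10.0], [2.0, 3.0, 20.0], [3.0, 4.0, 30.0], [2.0, 3.0, None]]` and `k=1`, the last row takes its third value from the second row, whose observed features are identical, giving 20.0. Weighting by grade rather than averaging uniformly lets closely related neighbours dominate the estimate.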
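The abstract compares information gain (IG), gain ratio (GR) and symmetric uncertainty (SU) as feature selection criteria. For a discretised feature X and the defect label Y, these are usually defined as IG = H(Y) - H(Y|X), GR = IG / H(X) and SU = 2 IG / (H(X) + H(Y)), with H the Shannon entropy in bits. The sketch below computes these scores in plain Python and ranks features by them. It assumes the software metrics have already been discretised, which the abstract does not describe, and the helper names (`entropy`, `info_gain`, `rank_features`, etc.) are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy H(V) in bits."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(y: Sequence[Hashable], x: Sequence[Hashable]) -> float:
    """H(Y | X): entropy of the label within each value of X, weighted by frequency."""
    n = len(y)
    groups: Dict[Hashable, List[Hashable]] = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def info_gain(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    return entropy(y) - conditional_entropy(y, x)


def gain_ratio(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    # Normalises IG by the feature's own entropy, penalising many-valued features
    hx = entropy(x)
    return 0.0 if hx == 0 else info_gain(x, y) / hx


def symmetric_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    # Symmetric in X and Y, scaled to [0, 1]
    denom = entropy(x) + entropy(y)
    return 0.0 if denom == 0 else 2 * info_gain(x, y) / denom


def rank_features(features: Dict[str, Sequence[Hashable]], y: Sequence[Hashable],
                  score=info_gain) -> List[str]:
    """Feature names sorted from most to least relevant under the chosen score."""
    return sorted(features, key=lambda name: score(features[name], y), reverse=True)
```

A feature that separates defective from clean modules perfectly scores 1.0 under all three criteria, while a feature independent of the label scores 0.0; the criteria differ mainly in how they treat features with many distinct values.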
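The results are reported as "balance", but the abstract does not give its formula. In the defect prediction literature, balance is commonly defined from the probability of detection pd (recall on defective modules) and the probability of false alarm pf as balance = 1 - sqrt(pf^2 + (1 - pd)^2) / sqrt(2), i.e. one minus the normalised distance from the ideal point (pf = 0, pd = 1). Assuming the paper uses this definition, it can be computed from a confusion matrix as sketched below; the function name `balance` and its argument order are illustrative.

```python
from math import sqrt


def balance(tp: int, fn: int, fp: int, tn: int) -> float:
    """Balance = 1 - normalised Euclidean distance from (pf, pd) to the ideal point (0, 1)."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need both defective and non-defective instances")
    pd = tp / (tp + fn)  # probability of detection (recall on defective modules)
    pf = fp / (fp + tn)  # probability of false alarm
    return 1 - sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / sqrt(2)
```

A classifier that flags every module as defective has pd = 1 and pf = 1, so its balance is only about 0.29; the measure thus penalises trivial predictors that recall alone would reward, which is why it suits imbalanced defect data.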
