Weighted k Nearest Neighbor Using Grey Relational Analysis To Solve Missing Value

Desepta Isna Ulumi, Daniel Siahaan

doi:10.12962/j20882033.v29i3.5011

Abstract

A software defect prediction model plays an important role in identifying the software components most likely to contain errors. Several studies have sought to improve the accuracy of software defect prediction in order to better manage human resources, cost, and time. However, previous studies built their prediction models on specific datasets, and no generic approach to dataset handling for software defect prediction exists yet. This research proposes improving software defect prediction results on a merged dataset, called the generic dataset, whose source datasets have different numbers of features. To equalize the number of features, features absent from a given dataset are treated as missing values, which are filled using the Weighted k Nearest Neighbor (WkNN) method. Because a relevant set of features is needed, a feature selection method is then applied. Finally, Naive Bayes is used to classify the data using the selected features. The results on seven NASA public MDP datasets show that Naive Bayes with Information Gain (IG) or Symmetric Uncertainty (SU) feature selection gives the best balance value.

Keywords: Software defect, NASA public MDP, weighted KNN, Naive Bayes
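
The abstract does not spell out how the WkNN weights are computed; the title indicates that grey relational analysis supplies the similarity measure. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that features are already scaled to [0, 1], that the donor rows are complete, and that the distinguishing coefficient rho is 0.5 (a common default). The names grey_relational_grades and impute_wknn are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def grey_relational_grades(target: Row, donors: List[List[float]],
                           rho: float = 0.5) -> List[float]:
    """Grey relational grade of each donor row with respect to the target,
    computed only over the features the target actually has."""
    observed = [j for j, v in enumerate(target) if v is not None]
    if not observed or not donors:
        raise ValueError("need at least one observed feature and one donor row")
    # Absolute differences between the target and every donor, per feature.
    deltas = [[abs(target[j] - d[j]) for j in observed] for d in donors]
    flat = [x for row in deltas for x in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:  # every donor matches the target on the observed features
        return [1.0] * len(donors)
    # Grey relational coefficient per feature, averaged into a grade per donor.
    return [
        sum((d_min + rho * d_max) / (x + rho * d_max) for x in row) / len(row)
        for row in deltas
    ]


def impute_wknn(target: Row, donors: List[List[float]],
                k: int = 3, rho: float = 0.5) -> List[float]:
    """Fill each missing value with the grade-weighted mean of the k donors
    that have the highest grey relational grade."""
    grades = grey_relational_grades(target, donors, rho)
    nearest = sorted(range(len(donors)), key=lambda i: grades[i],
                     reverse=True)[:k]
    total = sum(grades[i] for i in nearest)
    filled = list(target)
    for j, v in enumerate(target):
        if v is None:
            filled[j] = sum(grades[i] * donors[i][j] for i in nearest) / total
    return filled


# Illustrative data only (assumption): three complete rows, one incomplete row.
donors = [[0.2, 0.5, 0.4], [0.9, 0.1, 0.9], [0.2, 0.5, 0.4]]
filled = impute_wknn([0.2, None, 0.4], donors, k=2)
```

With k=2 the two donors identical to the target on its observed features are selected, so the missing value is filled from them alone. A larger k lets the dissimilar donor contribute, but with a smaller weight.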
