Cross-validation based K nearest neighbor imputation for software quality datasets: An empirical study

Jianglin Huang, W.K. Chan, Yan-Fu Li, Federica Sarro, Jacky Wai Keung, Hongyi Sun, Y.T. Yu

doi:10.1016/j.jss.2017.07.012

Abstract

Predicting software quality is essential, but it also poses significant challenges in software engineering. Historical software project datasets are often used together with various machine learning algorithms for fault-proneness classification. Unfortunately, missing values in these datasets degrade estimation accuracy and can therefore lead to inconsistent results. As a method for handling missing data, K nearest neighbor (KNN) imputation has gradually gained acceptance in empirical studies because of its strong performance and simplicity. To date, researchers still call for optimized parameter settings for KNN imputation to further improve its performance. In this work, we develop a novel incomplete-instance based KNN imputation technique, which uses a cross-validation scheme to optimize the parameters for each missing value. An experimental assessment is conducted on eight quality datasets under various missingness scenarios. The study also compares the proposed imputation approach with mean imputation and three other KNN imputation approaches. The results show that the proposed approach is generally superior to the others. The relatively optimal fixed parameter settings for KNN imputation on software quality data are also determined. We observe that classification accuracy is improved, or at least maintained, when our approach is used for missing data imputation.
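
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: KNN imputation in which incomplete rows may still act as donors, with k chosen by cross-validation. Everything here is an assumption made for illustration, not the authors' method: the function names (`distance`, `knn_estimate`, `choose_k`, `cv_knn_impute`), the leave-one-out scheme, the mean-of-neighbors estimate, the toy data, and the choice of one k per feature rather than per missing value.

```python
import math

def distance(a, b, skip):
    """RMS difference over features observed in both rows, ignoring column `skip`."""
    diffs = [(x - y) ** 2 for idx, (x, y) in enumerate(zip(a, b))
             if idx != skip and x is not None and y is not None]
    if not diffs:
        return math.inf  # no shared observed features: treat as infinitely far
    return math.sqrt(sum(diffs) / len(diffs))

def knn_estimate(rows, target, col, k):
    """Mean of column `col` over the k nearest donors to rows[target].

    A donor only needs an observed value in `col`; it may be incomplete elsewhere.
    """
    donors = [(distance(rows[target], r, col), r[col])
              for i, r in enumerate(rows)
              if i != target and r[col] is not None]
    donors.sort(key=lambda d: d[0])
    nearest = donors[:k]
    if not nearest:
        return None  # nothing to borrow from
    return sum(v for _, v in nearest) / len(nearest)

def choose_k(rows, col, candidate_ks):
    """Leave-one-out CV: hide each observed value in `col`, re-estimate it,
    and keep the k with the lowest total squared error."""
    observed = [i for i, r in enumerate(rows) if r[col] is not None]
    if len(observed) < 2:
        return candidate_ks[0]
    best_k, best_err = candidate_ks[0], math.inf
    for k in candidate_ks:
        err = sum((knn_estimate(rows, i, col, k) - rows[i][col]) ** 2
                  for i in observed)
        if err < best_err:
            best_k, best_err = k, err
    return best_k

def cv_knn_impute(rows, candidate_ks=(1, 2, 3)):
    """Return a copy of `rows` with each None replaced by a KNN estimate."""
    filled = [list(r) for r in rows]
    for col in range(len(rows[0])):
        missing = [i for i, r in enumerate(rows) if r[col] is None]
        if not missing:
            continue
        k = choose_k(rows, col, candidate_ks)
        for i in missing:
            filled[i][col] = knn_estimate(rows, i, col, k)
    return filled

# Hypothetical toy data: two clusters, one missing value (None) in the last row.
example = [
    [1.0, 10.0],
    [1.1, 11.0],
    [5.0, 50.0],
    [5.1, 51.0],
    [1.02, None],
]
filled = cv_knn_impute(example)
```

On this toy data the cross-validation step prefers k = 1, because averaging over more neighbors mixes the two clusters. A per-missing-value variant, closer to what the abstract describes, would rerun the parameter search for each missing cell rather than once per feature.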

Journal: Journal of Systems and Software
Publication Date: Jul 13, 2017
Citations: 65

Similar Papers

Evolutionary k-nearest neighbor imputation algorithm for gene expression data
Hiroshi De Silva ... A Shehan Perera
International Journal on Advances in ICT for Emerging Regions (ICTer) | VOL. 10
04 May 2017

K-Means Clustering with KNN and Mean Imputation on CPU Benchmark Compilation Data
Irma Santikarama ... Puspita Nurul Sabrina
Journal of Applied Informatics and Computing | VOL. 7
05 Dec 2023

A Comparative Study on Imputation Techniques: Introducing a Transformer Model for Robust and Efficient Handling of Missing EEG Amplitude Data.
Murad Ali Khan
Bioengineering (Basel, Switzerland) | VOL. 11
23 Jul 2024

KNN imputation to missing values of regression-based rain duration prediction on BMKG data
Ikke Dian Oktaviani ... Aji Gautama Putrada
JURNAL INFOTEL | VOL. 14
01 Nov 2022