CBRL and CBRC: Novel Algorithms for Improving Missing Value Imputation Accuracy Based on Bayesian Ridge Regression

Samih M. Mostafa, Abdelrahman S. Eladimy, Safwat Hamad, Hirofumi Amano

doi:10.3390/sym12101594

Open Access

https://doi.org/10.3390/sym12101594

Abstract

In most scientific studies, such as data analysis, missing data is a critical problem, and selecting an appropriate approach to handle it is a challenge. In this paper, the authors perform a fair comparative study of some practical imputation methods for handling missing values against two proposed imputation algorithms. The proposed algorithms are based on the Bayesian Ridge technique under two different feature selection conditions. They differ from existing approaches in that they accumulate the imputed features: each imputed feature is incorporated into the Bayesian Ridge equation used to predict the missing values in the next selected incomplete feature. The authors applied the proposed algorithms to eight datasets with different amounts of missing values created from different missingness mechanisms. Performance was measured in terms of imputation time, root-mean-square error (RMSE), coefficient of determination (R²), and mean absolute error (MAE). The results showed that performance varies depending on the percentage of missing values, the size of the dataset, and the missingness mechanism. In addition, the proposed methods performed slightly better than the compared methods.
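The cumulative idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions: the function names `ridge_fit_predict` and `cumulative_impute`, the parameter `alpha`, and the choice to process incomplete columns in order of fewest missing values (the abstract only says that two feature selection conditions exist). A closed-form ridge regression with a fixed penalty stands in for Bayesian Ridge, which would additionally estimate the regularisation strength from the data. The sketch also assumes that at least one column is fully observed.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_new, alpha=1.0):
    """Fit ridge regression with an intercept and predict for X_new.

    Stand-in for Bayesian Ridge (assumption): the penalty alpha is fixed
    here instead of being estimated from the data.
    """
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    weights = np.linalg.solve(gram, Xc.T @ (y_train - y_mean))
    return (X_new - x_mean) @ weights + y_mean


def cumulative_impute(X, alpha=1.0):
    """Impute NaNs column by column, reusing each imputed column as a predictor."""
    X = np.array(X, dtype=float)  # work on a copy of the input
    missing = np.isnan(X)
    n_missing = missing.sum(axis=0)

    # Start with the fully observed columns as predictors.
    predictors = [j for j in range(X.shape[1]) if n_missing[j] == 0]
    if not predictors:
        raise ValueError("need at least one fully observed column")

    # Assumed ordering: incomplete columns with the fewest NaNs come first.
    incomplete = [j for j in range(X.shape[1]) if n_missing[j] > 0]
    incomplete.sort(key=lambda j: n_missing[j])

    for j in incomplete:
        rows = missing[:, j]
        if rows.all():
            raise ValueError(f"column {j} has no observed values")
        P = X[:, predictors]
        # Train on rows where column j is observed, predict the rest.
        X[rows, j] = ridge_fit_predict(P[~rows], X[~rows, j], P[rows], alpha)
        # The newly completed column joins the predictor set.
        predictors.append(j)
    return X


if __name__ == "__main__":
    x0 = np.linspace(-2.0, 2.0, 40)
    data = np.column_stack([x0, 2 * x0 + 1, x0 - 1])
    holed = data.copy()
    holed[0:5, 1] = np.nan
    holed[3:10, 2] = np.nan
    filled = cumulative_impute(holed)
    print("largest absolute imputation error:", np.abs(filled - data).max())
```

The key step is `predictors.append(j)`: once a column has been filled in, it is used alongside the originally complete columns when the next incomplete column is predicted.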

Highlights

Data containing missing values are considered one of the main problems that prevent building an efficient model
A log scale is used in the root-mean-square error (RMSE), mean absolute error (MAE), and imputation time comparisons because each metric has a different range of values
For RMSE, MAE, and imputation time, a lower value is better, so these metrics are gathered in the same figure
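For reference, the three accuracy metrics named above have standard definitions, sketched below in plain Python. The function names are illustrative, and the sketch assumes the metrics are computed by comparing imputed values with the true values that were removed; the paper's exact evaluation protocol is not shown in this summary.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error: lower is better."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error: lower is better."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: higher is better, 1.0 is a perfect fit."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    true_values = [1.0, 2.0, 3.0, 4.0]
    imputed_values = [1.0, 2.0, 3.0, 6.0]
    print(rmse(true_values, imputed_values))
    print(mae(true_values, imputed_values))
    print(r2(true_values, imputed_values))
```

RMSE and MAE are both error magnitudes in the units of the data, while R² is unitless and points in the opposite direction, which is consistent with the first two being grouped in one figure.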

Summary

Introduction

Data containing missing values are considered one of the main problems that prevent building an efficient model. The amount of missing data affects model performance and produces biased predictions, leading to unacceptable results [1]. The subsections discuss the types of missingness in data and the handling methods. Detecting the source of “missingness” is vital, as it affects the selection of the imputation method. Missing data occur in the medical field when: (i) the variable was measured, but for an unknown reason the values were not recorded electronically, e.g., loss of sensors, errors in connecting with the database server, unintentional human forgetfulness, loss of electricity, and others; (ii) the variable was unmeasured over a period of time because of a ...

Symmetry 2020, 12, 1594; doi:10.3390/sym12101594; www.mdpi.com/journal/symmetry

Journal: Symmetry	Publication Date: Sep 25, 2020
Citations: 17	License type: CC BY 4.0


