Abstract

This article evaluates a new methodology, imputation by feature importance (IBFI), by examining how well its imputed datasets serve subsequent regression tasks on soil radon gas concentration (SRGC) time-series data. The time series used for experimentation were collected over a fourteen-month period that included four seismic events. IBFI was applied to data missing not at random (MNAR), missing completely at random (MCAR), and missing at random (MAR), with missingness percentages ranging from 10% to 30%, and it imputed the missing patterns in the investigated time series more efficiently than traditionally used imputation methods, namely mean, median, mode, predictive mean matching (PMM), and hot-deck imputation. The imputed datasets, nine for each imputation method, were then used to predict the attribute of interest, radon concentration (RN), with the thoron, temperature, relative humidity, and pressure time series as independent attributes. A support vector machine (SVM) with a linear kernel was used as the learning algorithm, and its predictive performance served as a measure of how accurate and unbiased the imputed values were. Performance was assessed with the root mean squared log error (RMSLE), root mean square error (RMSE), mean squared error (MSE), and mean absolute percentage error (MAPE). The findings show that the IBFI-imputed dataset yields a better-fitted model: models built on IBFI-imputed time series produce more accurate predictions than those built on mean-, median-, mode-, PMM-, or hot-deck-imputed time series. The PMM- and median-imputed time series also perform close to the IBFI-imputed time series.
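The experimental setup described above deletes values under three missingness mechanisms at rates from 10% to 30% and then fills the gaps with baseline imputers. The sketch below illustrates that step with the Python standard library only; it is not the authors' code and implements neither IBFI nor PMM. The function names `inject_missing` and `impute` are hypothetical, and MAR and MNAR are modeled in a simplified, deterministic way (an assumption): MAR removes the entries whose observed covariate (for example, a temperature series) is largest, and MNAR removes the largest values of the series itself. Hot-deck here is the simple random variant that draws donors from the observed values.

```python
import random
import statistics


def inject_missing(values, rate, mechanism, driver=None, seed=0):
    """Return a copy of `values` with round(rate * len(values)) entries set to None.

    Simplified, deterministic stand-ins for the three mechanisms (assumption):
      MCAR: positions drawn uniformly at random.
      MAR:  positions where the observed covariate `driver` is largest are removed.
      MNAR: positions where the series' own values are largest are removed.
    """
    n = len(values)
    k = int(round(rate * n))
    if mechanism == "MCAR":
        idx = random.Random(seed).sample(range(n), k)
    elif mechanism == "MAR":
        if driver is None or len(driver) != n:
            raise ValueError("MAR needs a covariate `driver` of the same length")
        idx = sorted(range(n), key=lambda i: driver[i], reverse=True)[:k]
    elif mechanism == "MNAR":
        idx = sorted(range(n), key=lambda i: values[i], reverse=True)[:k]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    out = list(values)
    for i in idx:
        out[i] = None
    return out


def impute(series, method, seed=0):
    """Fill None entries with a baseline imputer: mean, median, mode, or hot-deck."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("nothing observed to impute from")
    if method == "hot-deck":
        # Random hot-deck: each gap receives a randomly chosen observed donor value.
        rng = random.Random(seed)
        return [v if v is not None else rng.choice(observed) for v in series]
    fillers = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,  # returns the first mode on ties (Python 3.8+)
    }
    if method not in fillers:
        raise ValueError(f"unknown method: {method}")
    fill = fillers[method](observed)
    return [fill if v is None else v for v in series]
```

Looping `inject_missing` over the three mechanisms and the chosen missingness rates gives one incomplete series per combination, each of which can then be passed to every imputer in turn.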

Highlights

  • To analyze the performance of the imputation methods, this work uses the datasets imputed by IBFI and by the other imputation methods in a subsequent regression task.


Summary

Introduction

Radon gas (²²²Rn), an immediate decay product of radium (²²⁶Ra), poses a threat to human health [1]. This study extends previous work on imputation by feature importance (IBFI), a method developed to reconstruct missing patterns in soil radon gas concentration (SRGC) data and published elsewhere [43]. Because imputed values play an important role in any further analysis and experimentation on a dataset, this study evaluates how well IBFI-imputed data serve subsequent regression scenarios, specifically predicting radon concentration from the other recorded attributes (thoron, temperature, relative humidity, and pressure). To evaluate the prediction model's performance, the mean absolute percentage error (MAPE), root mean square error (RMSE), mean squared error (MSE), and root mean squared log error (RMSLE) are calculated.
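The prediction step fits a linear-kernel SVM to the imputed series and scores it with MAPE, RMSE, MSE, and RMSLE. As a dependency-free sketch (not the authors' implementation), the code below fits a linear ε-insensitive regressor, the primal form of linear-kernel support vector regression, by full-batch subgradient descent, and defines the four metrics. The helper names and hyperparameters (`epsilon`, `lam`, `lr`, `epochs`) are illustrative assumptions; in practice a library solver such as scikit-learn's `SVR(kernel="linear")` would normally be used instead.

```python
import math


def fit_linear_svr(X, y, epsilon=0.01, lam=1e-4, lr=0.5, epochs=2000):
    """Fit f(x) = w.x + b by full-batch subgradient descent on the primal
    linear SVR objective: lam/2 * ||w||^2 + mean(max(0, |y - f(x)| - epsilon)).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            r = yi - (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if abs(r) <= epsilon:
                continue  # inside the epsilon tube: no loss, no gradient
            s = -1.0 if r > 0 else 1.0  # derivative of |y - f| w.r.t. f is -sign(r)
            for j in range(d):
                gw[j] += s * xi[j] / n
            gb += s / n
        step = lr / math.sqrt(t)  # diminishing step size
        w = [wj - step * g for wj, g in zip(w, gw)]
        b -= step * gb
    return w, b


def predict(w, b, X):
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))


def rmsle(y_true, y_pred):
    # Requires all values > -1 (log1p); radon concentrations are non-negative.
    sq = sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def mape(y_true, y_pred):
    # Mean absolute percentage error; undefined if any observed value is zero.
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

RMSLE requires values greater than -1 and MAPE requires non-zero observed values; measured radon concentrations are positive, but predictions should still be checked before computing RMSLE.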
