Abstract
Analysis without adequate handling of missing values may lead to inconsistent and biased estimates. Although multiple imputation has become a widely used approach for handling missing data, researchers still routinely encounter missing data in their studies. In high-dimensional data, penalized regression is a popular technique for performing feature selection and coefficient estimation simultaneously. However, one of the most important issues with high-dimensional data is that it often contains large quantities of missing values, for which common multiple imputation approaches may not work correctly. Therefore, this study uses imputation penalized regression models, an extension of the penalized methods, to improve performance and impute missing values in high-dimensional data. The method was applied to real-life high-dimensional datasets with different numbers of features, sample sizes, and missing-data rates to evaluate its efficiency. The method was also compared with other existing imputation penalized methods for high-dimensional data. The comparative experimental results indicate that the proposed method outperforms its competitors by achieving higher sensitivity, specificity, and classification accuracy values.
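The three evaluation measures named in the abstract can be computed from a binary confusion matrix. The sketch below is illustrative only: the function name `classification_metrics` and the toy label vectors are assumptions made for this example, not the study's code or data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return sensitivity, specificity, and accuracy for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    return {
        # Share of true positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of true negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        # Share of all observations classified correctly.
        "accuracy": (tp + tn) / len(y_true) if len(y_true) else float("nan"),
    }


# Hypothetical labels, made up purely to exercise the function.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(classification_metrics(toy_true, toy_pred))
```

In an imputation study, such measures would typically be computed on held-out data after the missing values have been imputed and the penalized model has been fitted.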
Highlights
Missing data exist in almost all areas of biomedical, epidemiological, and social research
Although there has been significant progress in the methods and tools for variable selection, missing data often occur in extensive, complicated research, which can make data analysis challenging
This study focuses mainly on improving the performance of penalized logistic regression models and on handling missing values in high-dimensional data through the imputations adaptive penalized logistic regression (IAPLR) method
Summary
Missing data exist in almost all areas of biomedical, epidemiological, and social research. Many statistical techniques require complete cases without any missing data. This matters because inaccurate estimates and conclusions may result from an analysis that does not properly handle missing values [2]. Deleting a large number of observations with missing values, on the other hand, results in a considerable loss of data [3], [4]. This loss reduces the statistical power and efficiency of the analysis [5]. To overcome missing values in high-dimensional data, reliable imputation approaches are required.
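A short calculation shows why deleting incomplete observations becomes costly as the number of features grows. The sketch below assumes, purely for illustration, that every entry is missing independently with the same probability; the function name `complete_case_fraction` and the 5% rate are hypothetical choices, not values from the study.

```python
def complete_case_fraction(missing_rate: float, n_features: int) -> float:
    """Expected share of rows with no missing entry.

    Assumes each entry is missing independently with probability
    `missing_rate` (a simplifying assumption for this illustration).
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must lie in [0, 1]")
    if n_features < 0:
        raise ValueError("n_features must be non-negative")
    # A row is complete only if all of its n_features entries are observed.
    return (1.0 - missing_rate) ** n_features


# Hypothetical per-entry missing rate of 5% across growing feature counts.
for p in (10, 100, 1000):
    share = complete_case_fraction(0.05, p)
    print(f"{p:>5} features: {share:.2e} of rows expected to be complete")
```

Under these assumptions, roughly 60% of rows remain complete with 10 features, but fewer than 1% remain with 100 features, so complete-case analysis would discard nearly all observations in a high-dimensional setting.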
More From: International Journal of Online and Biomedical Engineering (iJOE)