Missing Data Imputation Techniques Research Articles

Background: Identifying the most important predictors of substance use is crucial for developing effective prevention policies, but traditional statistical methods have limitations in this regard. To address them, the researchers used artificial intelligence (AI) methods to identify the top 10 predictors of cannabis use in Finland.

Objective: The study aimed to determine the 10 most important features related to cannabis use from a dataset of 3229 observations and 313 questionnaire items, of which 48 were selected for preprocessing.

Methods: The researchers applied recursive feature elimination (RFE) to 60 processed variables, obtained after missing data imputation, resampling, and scaling, and narrowed them down to the 10 features most strongly associated with cannabis use.

Results: Models built on the selected features predicted cannabis use in the previous 12 months with 96% accuracy. Social settings played the most significant role in predicting cannabis use in Finland.

Conclusions: AI-based approaches were effective in identifying the most critical predictors of cannabis use in Finland, with social settings having the highest impact. The study also showed the potential of AI methods both for identifying key risk indicators among many factors and for making better use of limited public resources when devising prevention strategies. These findings can help shape targeted and efficient prevention policies to address cannabis use in Finland.
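The preprocessing and selection steps named in the Methods can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the use of scikit-learn, median imputation, logistic regression as the RFE estimator, and the generated data are all assumptions, and the resampling step is omitted for brevity.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the survey: 500 respondents, 60 processed variables.
n_samples, n_features = 500, 60
X = rng.normal(size=(n_samples, n_features))
# Hypothetical ground truth: only the first three variables drive the outcome.
logits = 2.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2]
y = (logits + rng.normal(scale=0.5, size=n_samples) > 0).astype(int)

# Knock out 10% of entries completely at random to mimic item non-response.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.10] = np.nan

# Impute, scale, then recursively drop the weakest feature until 10 remain.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)),
])
pipeline.fit(X_missing, y)

selected = np.flatnonzero(pipeline.named_steps["rfe"].support_)
print("Selected feature indices:", selected)
```

Placing the imputer and scaler inside the pipeline means they are refit whenever the pipeline is refit, which keeps any later cross-validation free of leakage from held-out rows.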

Background: Medical data often contain many missing values, which directly affect the accuracy of clinical decision making. Discharge assessment is an important part of clinical decision making. Taking the discharge assessment of patients with spontaneous supratentorial intracerebral hemorrhage as an example, this study adopted evaluation criteria for missing data processing that are better suited to clinical decision making. It aimed to systematically explore the performance and applicability of single machine learning algorithms and ensemble learning (EL) under different missing data scenarios, and whether they have advantages over traditional methods, so as to provide a basis and reference for selecting a suitable missing data processing method in practical clinical decision making.

Methods: The whole process consisted of four main steps: (1) Based on the original complete data set, missing data were generated by simulation under different missing scenarios (missing mechanisms, missing proportions, and ratios of missing proportions of each group). (2) Machine learning and traditional methods (eight methods in total) were applied to impute missing values. (3) The performance of the imputation techniques was evaluated and compared by estimating the sensitivity, AUC, and Kappa values of prediction models. (4) Statistical tests were used to evaluate whether the observed performance differences were statistically significant.

Results: The performance of the missing data processing methods differed to some extent across missing scenarios. On the whole, machine learning had better imputation performance than traditional methods, especially in scenarios with high missing proportions. Compared with single machine learning algorithms, EL performed best, followed by neural networks. EL was most suitable for imputation under the MAR mechanism (ratio of missing proportions 2:1), where its average sensitivity, AUC, and Kappa values reached 0.908, 0.924, and 0.596, respectively.

Conclusions: In clinical decision making, the characteristics of missing data should be actively explored before formulating missing data processing strategies. The outstanding imputation performance of machine learning methods, especially EL, sheds light on the development of missing data processing technology and provides methodological support for clinical decision making in the presence of incomplete data.
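Steps (1) to (3) of the Methods can be illustrated with a small simulation. The sketch below is an assumption-laden stand-in, not the study's code: the synthetic data, the `make_missing` and `evaluate` helpers, the choice of mean and k-nearest-neighbour imputation, and the logistic regression prediction model are all hypothetical, and the statistical testing of step (4) is left out.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# A complete synthetic data set (stand-in for the original clinical data).
n = 600
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

def make_missing(X, mechanism, proportion, rng):
    """Step 1: return a copy of X with NaNs injected into column 1."""
    X_miss = X.copy()
    if mechanism == "MCAR":
        # Missingness is independent of all values.
        prob = np.full(len(X), proportion)
    elif mechanism == "MAR":
        # Missingness in column 1 depends on the fully observed column 0.
        prob = np.where(X[:, 0] > 0, 1.5 * proportion, 0.5 * proportion)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    X_miss[rng.random(len(X)) < prob, 1] = np.nan
    return X_miss

def evaluate(imputer, X_miss, y):
    """Steps 2-3: impute, fit a prediction model, report the three metrics."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_miss, y, test_size=0.3, random_state=0, stratify=y)
    X_tr_imp = imputer.fit_transform(X_tr)  # fit the imputer on training rows only
    X_te_imp = imputer.transform(X_te)
    model = LogisticRegression(max_iter=1000).fit(X_tr_imp, y_tr)
    pred = model.predict(X_te_imp)
    score = model.predict_proba(X_te_imp)[:, 1]
    return {
        "sensitivity": recall_score(y_te, pred),
        "auc": roc_auc_score(y_te, score),
        "kappa": cohen_kappa_score(y_te, pred),
    }

results = {}
for mechanism in ("MCAR", "MAR"):
    X_miss = make_missing(X, mechanism, proportion=0.3, rng=rng)
    for name, imputer in (("mean", SimpleImputer(strategy="mean")),
                          ("knn", KNNImputer(n_neighbors=5))):
        results[(mechanism, name)] = evaluate(imputer, X_miss, y)

for key, metrics in results.items():
    print(key, {k: round(v, 3) for k, v in metrics.items()})
```

Judging imputers by downstream sensitivity, AUC, and Kappa rather than by reconstruction error mirrors the abstract's point that the evaluation criteria should fit the clinical decision task.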

Articles published on Missing Data Imputation Techniques

Predicting Clinical Outcomes in COVID-19 and Pneumonia Patients: A Machine Learning Approach.

Protocol for the development and validation of a machine-learning based tool for predicting the risk of hypertriglyceridemia in critically-ill patients receiving propofol sedation.

Performance of Different Imputation Methods in Logistic Regression with Multicollinearity

Diabetes Prediction Using Random Forest in Healthcare

A comparative analysis of missing data imputation techniques on sedimentation data

Data Imputation of Soil Pressure on Shield Tunnel Lining Based on Random Forest Model.

The impact of data imputation on air quality prediction problem.

Risk prediction model for cannabis use with artificial intelligence approach

Experience: Differentiating Between Isolated and Sequence Missing Data

Missing data imputation techniques for wireless continuous vital signs monitoring.

A Comparative Study of Missing Data Imputation Methods for Activity Recognition Task

Missing Data Imputation in the Internet of Things Sensor Networks

Systematic Review on Missing Data Imputation Techniques with Machine Learning Algorithms for Healthcare

Application of machine learning missing data imputation techniques in clinical decision making: taking the discharge assessment of patients with spontaneous supratentorial intracerebral hemorrhage as an example

Missing data imputation on biomedical data using deeply learned clustering and L2 regularized regression based on symmetric uncertainty

EFFECTS OF DIFFERENT MULTIPLE IMPUTATION TECHNIQUES ON THE MODEL FIT OF CONFIRMATORY FACTOR ANALYSIS

Forecasts of tropospheric ozone in the Metropolitan Area of Rio de Janeiro based on missing data imputation and multivariate calibration techniques.

Kernel weighted least square approach for imputing missing values of metabolomics data

Investigating the impact of missing data imputation techniques on battery energy management system

Fuzzy C-mean Missing Data Imputation for Analogy-based Effort Estimation
