Abstract
Purpose of the study: A cerebrovascular accident (CVA), or stroke, occurs suddenly
when blood flow to the brain is disrupted, causing neurological deficits. Risk factors
include smoking, diabetes, hyperlipidemia, hypertension, and atrial fibrillation. Many
existing brain stroke prediction systems are computationally demanding, slow, and
unreliable. This study uses spline interpolation to impute missing data and Principal
Component Analysis (PCA) to identify the most important factors for stroke prediction.
The goal is to improve the precision of stroke identification by using stacking
ensemble machine learning methods.
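As a rough illustration of the preprocessing idea described above, the following Python sketch fills missing values with a cubic spline and then applies PCA. It is not the authors' implementation. The helper `spline_impute`, the toy data, the choice of a cubic spline fitted over row position, and the use of SciPy and scikit-learn are all assumptions made for illustration.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def spline_impute(column):
    """Fill NaNs in a 1-D array using a cubic spline fitted to the observed entries.

    Assumption: the spline is fitted over row position; the source does not
    say which ordering variable the interpolation uses.
    """
    values = np.asarray(column, dtype=float).copy()
    positions = np.arange(len(values))
    observed = ~np.isnan(values)
    spline = CubicSpline(positions[observed], values[observed])
    values[~observed] = spline(positions[~observed])
    return values


# Toy feature matrix with a few missing entries (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[5, 0] = np.nan
X[20, 2] = np.nan

# Impute each column independently, then standardise before PCA.
X_imputed = np.column_stack([spline_impute(X[:, j]) for j in range(X.shape[1])])
X_scaled = StandardScaler().fit_transform(X_imputed)

# Keep two principal components; explained_variance_ratio_ and components_
# indicate how much variance each component carries and which original
# features load most heavily on it.
pca = PCA(n_components=2).fit(X_scaled)
X_reduced = pca.transform(X_scaled)
print(pca.explained_variance_ratio_)
```

Fitting the spline over row position is only meaningful when the records have a natural order. For unordered records, a different ordering variable would be needed.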
Method used: The novelty of the approach lies in preprocessing electronic health
record (EHR) data with spline-interpolation imputation and extracting features with
Principal Component Analysis. The article also proposes a novel stacking ensemble
technique that uses bagging and boosting models as base learners. Bagging and
boosting improve prediction by reducing variance and bias, respectively. The stacking
ensemble then classifies brain strokes. Base and meta-learners are built with Random
Forest, XGBoost, a Decision Tree classifier, K-Nearest Neighbours (KNN), and Logistic
Regression. The best meta-learner for the stacking ensemble is selected by evaluating
the performance of each candidate model.
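The stacking idea can be sketched with scikit-learn's `StackingClassifier`. This is a minimal sketch under stated assumptions, not the authors' pipeline: the data are synthetic, the hyperparameters are arbitrary, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost as the boosting base learner so that the example needs no extra dependency.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (a stand-in for the EHR features).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners: one bagging-style model and one boosting-style model.
base_learners = [
    ("bagging_rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    # Assumption: stand-in for XGBoost.
    ("boosting_gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# Meta-learner: Logistic Regression trained on out-of-fold base predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

y_pred = stack.predict(X_test)
y_score = stack.predict_proba(X_test)[:, 1]

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_score),
}
print(metrics)
```

To compare meta-learners, replace `final_estimator` with a Decision Tree or KNN classifier and recompute the same metrics on the held-out split.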
Brief Description of Results: The results highlight the best meta-learner for the
stacking ensemble and the impact of the imputation technique on the electronic health
record representation. PCA identified the most important factors for detecting
stroke. Logistic Regression as the meta-learner gave the best predictive performance,
achieving a precision of 0.96 and a recall of 0.98, with 97% accuracy and an AUC of
0.99.
Findings: With the Decision Tree Classifier and K-Nearest Neighbour as meta-learners,
the ensemble achieved 95% and 96% accuracy, respectively, and an AUC of 0.95 in both
cases on this dataset, showing that the method works. Precision values of 0.93 and
0.95 for the Decision Tree and KNN meta-learners, respectively, support their
robustness. These findings suggest that spline interpolation combined with
stacking-based ensemble machine learning can improve the identification of brain
stroke from incomplete electronic health records.