Availability of Evidence for Predictive Machine Learning Algorithms in Primary Care
Importance: The aging and multimorbid population and health personnel shortages pose a substantial burden on primary health care. While predictive machine learning (ML) algorithms have the potential to address these challenges, concerns include transparency and insufficient reporting of model validation and of the effectiveness of implementation in the clinical workflow.
Objective: To systematically identify predictive ML algorithms implemented in primary care from peer-reviewed literature and US Food and Drug Administration (FDA) and Conformité Européenne (CE) registration databases, and to ascertain the public availability of evidence, including peer-reviewed literature, gray literature, and technical reports, across the artificial intelligence (AI) life cycle.
Evidence Review: PubMed, Embase, Web of Science, Cochrane Library, Emcare, Academic Search Premier, IEEE Xplore, ACM Digital Library, MathSciNet, AAAI.org (Association for the Advancement of Artificial Intelligence), arXiv, Epistemonikos, PsycINFO, and Google Scholar were searched for studies published between January 2000 and July 2023, with search terms related to AI, primary care, and implementation. The search extended to CE-marked or FDA-approved predictive ML algorithms obtained from relevant registration databases. Three reviewers gathered subsequent evidence using strategies such as product searches, exploration of references, manufacturer website visits, and direct inquiries to authors and product owners. The extent to which the evidence for each predictive ML algorithm aligned with the requirements of the Dutch AI predictive algorithm (AIPA) guideline was assessed per AI life cycle phase, producing evidence availability scores.
Findings: The systematic search identified 43 predictive ML algorithms, of which 25 were commercially available and CE-marked or FDA-approved. The predictive ML algorithms spanned multiple clinical domains, but most (27 [63%]) focused on cardiovascular diseases and diabetes, and most (35 [81%]) were published within the past 5 years. The availability of evidence varied across the phases of the predictive ML algorithm life cycle, with evidence reported least for phase 1 (preparation) and phase 5 (impact assessment) (19% and 30%, respectively). Twelve (28%) predictive ML algorithms achieved approximately half of their maximum individual evidence availability score. Overall, predictive ML algorithms from the peer-reviewed literature showed higher evidence availability than those from FDA or CE registration databases (45% vs 29%).
Conclusions and Relevance: The findings indicate an urgent need to improve the availability of evidence on the quality criteria of predictive ML algorithms. Adopting the Dutch AIPA guideline could facilitate transparent and consistent reporting of these quality criteria, which could foster trust among end users and facilitate large-scale implementation.
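The evidence availability score described above reduces, per life cycle phase, to the share of applicable guideline items for which public evidence was found. Below is a minimal Python sketch of that aggregation, assuming a hypothetical algorithm-by-item evidence matrix and an invented mapping of items to the AIPA life cycle phases; the guideline's real item lists are not given in the abstract.

```python
import numpy as np

# Hypothetical evidence matrix: rows = predictive ML algorithms,
# columns = AIPA guideline items, entries = 1 if public evidence was found.
rng = np.random.default_rng(0)
evidence = rng.integers(0, 2, size=(43, 20))  # 43 algorithms, 20 illustrative items

# Hypothetical mapping of guideline items to the AI life cycle phases.
phase_items = {
    "1 preparation": range(0, 3),
    "2 development": range(3, 8),
    "3 validation": range(8, 12),
    "4 deployment": range(12, 16),
    "5 impact assessment": range(16, 18),
    "6 maintenance": range(18, 20),
}

for phase, cols in phase_items.items():
    availability = evidence[:, list(cols)].mean() * 100  # % of items with evidence
    print(f"phase {phase}: {availability:.0f}% evidence available")

# Per-algorithm score as a fraction of its maximum achievable score.
per_algorithm = evidence.mean(axis=1)
print("algorithms reaching >= half of max score:", int((per_algorithm >= 0.5).sum()))
```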
- Front Matter
6
- 10.1016/j.spinee.2021.06.012
- Jun 17, 2021
- The Spine Journal
Artificial intelligence and spine: rise of the machines
- Supplementary Content
20
- 10.1016/j.amsu.2022.104956
- Nov 23, 2022
- Annals of Medicine and Surgery
Background: Medical researchers and clinicians have shown much interest in developing machine learning (ML) algorithms to detect and predict surgical site infections (SSIs). However, little is known about the overall performance of ML algorithms in predicting SSIs and about how to improve algorithm robustness. We conducted a systematic review and meta-analysis to summarize the performance of ML algorithms in SSI case detection and prediction and to describe the impact of using unstructured, textual data in the development of ML algorithms.
Methods: MEDLINE, EMBASE, CINAHL, CENTRAL, and Web of Science were searched from inception to March 25, 2021. Study characteristics and algorithm development information were extracted. Performance statistics (e.g., sensitivity, area under the receiver operating characteristic curve [AUC]) were pooled using a random-effects model. Stratified analysis was applied across study characteristic levels. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Diagnostic Test Accuracy Studies (PRISMA-DTA) was followed.
Results: Of 945 articles identified, 108 algorithms from 32 articles were included in this review. The overall pooled estimate of the SSI incidence rate was 3.67% (95% CI: 3.58–3.76). Algorithms using a mix of structured and textual data (pooled sensitivity 0.83, 95% CI: 0.78–0.87; specificity 0.92, 95% CI: 0.86–0.95; AUC 0.92, 95% CI: 0.89–0.94) outperformed algorithms based solely on structured data (sensitivity 0.56, 95% CI: 0.43–0.69; specificity 0.95, 95% CI: 0.91–0.97; AUC 0.90, 95% CI: 0.87–0.92).
Conclusions: ML algorithms developed with both structured and textual data provided optimal performance. External validation of ML algorithms is needed to translate current knowledge into clinical practice.
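The pooled estimates above come from a random-effects model. Here is a minimal sketch of DerSimonian-Laird pooling of sensitivities on the logit scale, with invented per-study values standing in for the review's extracted data:

```python
import numpy as np

# Hypothetical per-study sensitivities and sample sizes (not the review's data).
sens = np.array([0.80, 0.85, 0.78, 0.90, 0.83])
n = np.array([120, 200, 90, 150, 180])

# Work on the logit scale; variance of a logit proportion ~ 1 / (n * p * (1 - p)).
y = np.log(sens / (1 - sens))
v = 1.0 / (n * sens * (1 - sens))

# DerSimonian-Laird between-study variance (tau^2).
w = 1.0 / v
q = np.sum(w * (y - np.sum(w * y) / np.sum(w)) ** 2)
tau2 = max(0.0, (q - (len(y) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects weights, pooled estimate, and 95% CI, back-transformed.
w_re = 1.0 / (v + tau2)
mu = np.sum(w_re * y) / np.sum(w_re)
se = np.sqrt(1.0 / np.sum(w_re))
lo, hi = mu - 1.96 * se, mu + 1.96 * se
inv_logit = lambda x: 1 / (1 + np.exp(-x))
print(f"pooled sensitivity {inv_logit(mu):.2f} "
      f"(95% CI {inv_logit(lo):.2f}-{inv_logit(hi):.2f})")
```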
- Research Article
23
- 10.3390/s23073622
- Mar 30, 2023
- Sensors (Basel, Switzerland)
Machine learning (ML) has transformed neuroimaging research by enabling accurate predictions and feature extraction from large datasets. In this study, we investigate the application of six ML algorithms (Lasso, relevance vector regression, support vector regression, extreme gradient boosting, category boost, and multilayer perceptron) to predict brain age for middle-aged and older adults, which is a crucial area of research in neuroimaging. Despite the plethora of proposed ML models, there is no clear consensus on how to achieve better performance in brain age prediction for this population. Our study stands out by evaluating the impact of both ML algorithms and image modalities on brain age prediction performance using a large cohort of cognitively normal adults aged 44.6 to 82.3 years old (N = 27,842) with six image modalities. We found that the predictive performance of brain age is more reliant on the image modalities used than the ML algorithms employed. Specifically, our study highlights the superior performance of T1-weighted MRI and diffusion-weighted imaging and demonstrates that multi-modality-based brain age prediction significantly enhances performance compared to unimodality. Moreover, we identified Lasso as the most accurate ML algorithm for predicting brain age, achieving the lowest mean absolute error in both single-modality and multi-modality predictions. Additionally, Lasso also ranked highest in a comprehensive evaluation of the relationship between BrainAGE and the five frequently mentioned BrainAGE-related factors. Notably, our study also shows that ensemble learning outperforms Lasso when computational efficiency is not a concern. Overall, our study provides valuable insights into the development of accurate and reliable brain age prediction models for middle-aged and older adults, with significant implications for clinical practice and neuroimaging research. Our findings highlight the importance of image modality selection and emphasize Lasso as a promising ML algorithm for brain age prediction.
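As a rough illustration of the winning approach, here is a minimal Lasso brain age sketch with mean absolute error as the metric; the synthetic features, sample size, and age mapping are stand-ins for the study's multimodal imaging-derived phenotypes:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for imaging-derived features; the study's biobank-scale
# multimodal data are not reproduced here.
X, age = make_regression(n_samples=2000, n_features=300, n_informative=50,
                         noise=10.0, random_state=0)
age = 45 + (age - age.min()) / (age.max() - age.min()) * 37  # map to ~45-82 years

X_tr, X_te, y_tr, y_te = train_test_split(X, age, test_size=0.2, random_state=0)

# LassoCV chooses the regularization strength by internal cross-validation.
model = LassoCV(cv=5, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

print(f"MAE: {mean_absolute_error(y_te, pred):.2f} years")
# The brain age gap (BrainAGE) is simply predicted minus chronological age.
brain_age_gap = pred - y_te
```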
- Research Article
- 10.4108/eetpht.10.5514
- Mar 22, 2024
- EAI Endorsed Transactions on Pervasive Health and Technology
INTRODUCTION: This study compares and contrasts various machine learning algorithms for predicting diabetes. The aim of the current research is to analyse the effectiveness of various machine learning algorithms for diabetes prediction.
 OBJECTIVES: To compare the efficacy of various machine learning algorithms for diabetes prediction.
 METHODS: A diabetes dataset was subjected to several well-known machine learning algorithms. Unbalanced data were handled by pre-processing the dataset. The models were subsequently trained and assessed using different performance metrics, namely F1-score, accuracy, sensitivity, and specificity.
 RESULTS: The experimental results show that the Decision Tree and ensemble models outperform all other comparative models in terms of accuracy and the other evaluation metrics.
 CONCLUSION: This study can help healthcare practitioners and researchers choose the best machine learning model for diabetes prediction based on their specific needs and available data.
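As context for the METHODS above, here is a minimal sketch of one common way to handle class imbalance (class weighting) and to compute the four reported metrics, on a synthetic stand-in for a diabetes dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

# Synthetic imbalanced stand-in for a diabetes dataset (~25% positives).
X, y = make_classification(n_samples=1500, n_features=8, weights=[0.75, 0.25],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Class weighting is one simple way to handle the imbalance the abstract
# mentions; resampling (e.g., SMOTE) is a common alternative.
models = [
    ("decision tree", DecisionTreeClassifier(class_weight="balanced", random_state=0)),
    ("random forest", RandomForestClassifier(class_weight="balanced", random_state=0)),
]
for name, clf in models:
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    sensitivity = tp / (tp + fn)   # a.k.a. recall
    specificity = tn / (tn + fp)
    print(f"{name}: acc={accuracy_score(y_te, pred):.2f} "
          f"F1={f1_score(y_te, pred):.2f} "
          f"sens={sensitivity:.2f} spec={specificity:.2f}")
```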
- Research Article
- 10.54691/h4fw9582
- Dec 24, 2025
- Frontiers in Science and Engineering
To investigate and compare the performance of different machine learning algorithms in forecasting, this paper presents a case study on predicting industrial land use. We conduct a benchmarking study employing four representative algorithms: Linear Regression, Random Forest, Naïve Bayes, and Artificial Neural Networks (ANN). Historical data for Shandong Province from 2001 to 2022 served as the dataset, with GDP, FAI (Fixed Asset Investment), and IOVAS (Industrial Output Value Above Scale) as independent variables and ILV (Industrial Land Volume) as the dependent variable. The four machine learning algorithms were trained and tested on this dataset, and the study reports the accuracy of each algorithm in industrial land use prediction.
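A minimal sketch of such a benchmark on invented yearly records (the Shandong data are not reproduced in the abstract); note that Naïve Bayes is a classifier, so applying it to the continuous ILV target would first require discretization:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_absolute_percentage_error

# Hypothetical yearly records (2001-2022): [GDP, FAI, IOVAS] -> ILV.
rng = np.random.default_rng(1)
X = rng.uniform(1e3, 1e5, size=(22, 3))
y = 0.2 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 500, 22)

train, test = slice(0, 17), slice(17, 22)  # hold out the last five years

models = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(random_state=1),
    # Scaling matters for neural networks; Naive Bayes is omitted because it
    # is a classifier and would need a discretized ILV target.
    "ANN (MLP)": make_pipeline(StandardScaler(),
                               MLPRegressor(max_iter=5000, random_state=1)),
}
for name, model in models.items():
    model.fit(X[train], y[train])
    err = mean_absolute_percentage_error(y[test], model.predict(X[test]))
    print(f"{name}: MAPE={err:.2%}")
```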
- Abstract
1
- 10.1177/2473011421s00122
- Jan 1, 2022
- Foot & Ankle Orthopaedics
Category: Ankle Arthritis; Ankle
Introduction/Purpose: Ankle arthrodesis and total ankle replacement are the most commonly performed procedures for surgical management of ankle arthritis. Arthrodesis provides effective pain relief, but the rate of complications after arthrodesis is higher because it is more commonly performed in patients with comorbidities that preclude ankle replacement. Accurately risk-stratifying patients who undergo ankle arthrodesis would be of great utility, given the significant cost and morbidity associated with developing major perioperative complications. There is a paucity of accurate prediction models that can be used to preoperatively risk-stratify patients for ankle arthrodesis. We aim to develop a machine learning (ML) algorithm for prediction of major perioperative complications after ankle arthrodesis and to compare its performance against traditional predictive models based on logistic regression.
Methods: This is a retrospective cohort study of adult patients who underwent ankle arthrodesis at any non-federal California hospital between 2015 and 2017. The primary outcome was readmission within 30 days or a major perioperative complication: venous thromboembolism within 30 days, myocardial infarction within 7 days, pneumonia within 7 days, systemic infection within 7 days, surgical site bleeding within 90 days, or wound complications within 90 days. We built ML and logistic regression models spanning different classes of modeling approaches: XGBoost, AdaBoost, Gradient Boosting, and Random Forest. Discrimination and calibration were assessed using the area under the receiver operating characteristic curve (AUROC) and the Brier score, respectively. We used a partial dependence function to measure the importance of an individual feature by assessing the average change in predicted risk when its value is altered, and ranked the contribution of the included variables to the prediction of adverse outcomes.
Results: A total of 1,084 patients met the inclusion criteria. There were 131 major complications or readmissions (12.1%). The optimized XGBoost algorithm demonstrated higher discrimination (AUROC: 0.707 ± 0.052) than logistic regression (0.691 ± 0.055). The receiver operating characteristic curves for the XGBoost and logistic regression models are visualized in Figure 1. XGBoost also outperformed the three other ML models and was well calibrated (Brier score: 0.103 ± 0.001). The variables most important for the XGBoost model include diabetes, chronic kidney disease, implant complication, and major fracture. Five of the ten most important features for XGBoost were markedly less important in the traditional logistic regression model: male sex, prior hip fracture, cardiorespiratory failure, acute renal failure, and dialysis status.
Conclusion: We report an ML algorithm for prediction of major perioperative complications after ankle arthrodesis. The optimized XGBoost model is well calibrated and demonstrates superior risk prediction to logistic regression. This tool may help identify and address potentially modifiable risk factors, accurately risk-stratify patients, and decrease the likelihood of major complications. Notably, the predictors most important for XGBoost differ from those for logistic regression, suggesting that the superior discriminative capability of ML methods stems from their ability to capture complex non-linear relationships between variables that logistic regression cannot detect.
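A minimal sketch of the comparison the abstract describes, using scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost and a hand-rolled partial-dependence importance (the spread of the average predicted risk as one feature is swept across its range); the data, event rate, and feature count are synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for XGBoost
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, brier_score_loss

# Synthetic stand-in for the arthrodesis cohort (~12% event rate).
X, y = make_classification(n_samples=1084, n_features=15, weights=[0.88, 0.12],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

for name, model in [("boosting", gbm), ("logistic regression", lr)]:
    p = model.predict_proba(X_te)[:, 1]
    print(f"{name}: AUROC={roc_auc_score(y_te, p):.3f} "
          f"Brier={brier_score_loss(y_te, p):.3f}")

def pd_importance(model, X, j, grid=20):
    """Partial-dependence-style importance: spread of the average predicted
    risk as feature j is swept over its observed range."""
    vals = np.linspace(X[:, j].min(), X[:, j].max(), grid)
    avg = []
    for v in vals:
        Xv = X.copy()
        Xv[:, j] = v
        avg.append(model.predict_proba(Xv)[:, 1].mean())
    return max(avg) - min(avg)

ranking = sorted(range(X_te.shape[1]), key=lambda j: -pd_importance(gbm, X_te, j))
print("features ranked by partial-dependence importance:", ranking[:5])
```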
- Research Article
4
- 10.1016/j.eswa.2023.122982
- Dec 23, 2023
- Expert Systems with Applications
Systematic review and network meta-analysis of machine learning algorithms in sepsis prediction
- Research Article
1
- 10.36548/jaicn.2022.4.007
- Jan 18, 2023
- Journal of Artificial Intelligence and Capsule Networks
Parkinson's Disease (PD) is a progressive neurological disorder with no cure. Early diagnosis of PD plays a key role in delaying the progression of the disorder. Dysphonia is the most prominent early symptom, exhibited by approximately 90% of PD patients. Voice-feature-based early diagnosis, with the integration of Artificial Intelligence, plays a prominent role in providing accurate, non-invasive, and robust predictions for PD patients. This paper provides a comparative and experimental analysis of Machine Learning (ML) algorithms for the prediction of PD based on the voice-features dataset retrieved from the UCI repository, presenting the results of four sampling experiments conducted with different traditional ML algorithms. The results of this study make it evident that Naïve Bayes provides the highest accuracy, 89%, compared with the other ML algorithms. This study helps identify the best among the traditional ML algorithms for PD prediction based on a voice-features dataset.
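A minimal sketch of Naïve Bayes evaluation under repeated resampling, mirroring the paper's sampling experiments; the data here are synthetic stand-ins with the UCI Parkinson's dataset's rough shape (195 recordings, 22 voice features):

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the UCI Parkinson's voice-feature dataset
# (195 recordings, 22 acoustic features, roughly 75% PD-positive).
X, y = make_classification(n_samples=195, n_features=22, weights=[0.25, 0.75],
                           random_state=0)

clf = make_pipeline(StandardScaler(), GaussianNB())

# Repeat the evaluation under different resampling seeds, echoing the paper's
# four sampling experiments.
for seed in range(4):
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"seed {seed}: accuracy {scores.mean():.2f} +/- {scores.std():.2f}")
```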
- Research Article
9
- 10.3390/ma16196606
- Oct 9, 2023
- Materials
Fatigue life prediction of Inconel 718 fabricated by laser powder bed fusion was investigated using a miniature specimen test method and machine learning algorithms. A small-dataset machine learning framework integrating thirteen algorithms was constructed to predict pore-influenced fatigue life. Random seed selection was employed to evaluate the performance of the algorithms, and a ranking of the machine learning algorithms for predicting pore-influenced fatigue life on small datasets was obtained by verifying each prediction model twenty or thirty times. The results showed that among the thirteen popular machine learning algorithms investigated, the adaptive boosting (AdaBoost) algorithm from the boosting category exhibited the best fitting accuracy for fatigue life prediction of the additively manufactured Inconel 718 on the small dataset, followed by the decision tree algorithm in the nonlinear category. The investigation also found that the DT, RF, GBDT, and XGBoost algorithms could effectively predict the fatigue life of the additively manufactured Inconel 718 within the range of 1 × 10⁵ cycles on a small dataset, compared with the others. These results not only demonstrate the capability of small-dataset machine learning techniques to predict fatigue life but may also guide the selection of algorithms that minimize performance evaluation costs.
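A minimal sketch of the random-seed evaluation protocol with AdaBoost regression; the pore features and fatigue lives are invented, and the 30 seeds mirror the paper's twenty-to-thirty verification runs:

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Hypothetical small fatigue dataset: pore features (size, depth, stress
# amplitude) -> log10 fatigue life; not the paper's measurements.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(40, 3))
y = 5.0 - 1.5 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(0, 0.1, 40)  # log10(cycles)

# Evaluate across many random seeds, as the paper does, since single splits of
# a small dataset give unstable performance estimates.
scores = []
for seed in range(30):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=seed)
    model = AdaBoostRegressor(random_state=seed).fit(X_tr, y_tr)
    scores.append(r2_score(y_te, model.predict(X_te)))

print(f"R^2 over 30 seeds: {np.mean(scores):.2f} +/- {np.std(scores):.2f}")
```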
- Research Article
28
- 10.1186/s12885-023-10808-3
- Apr 13, 2023
- BMC Cancer
Background: Cervical cancer is a common malignant tumor of the female reproductive system and a leading cause of mortality in women worldwide. The analysis of time to event, which is crucial for any clinical research, can be done well with survival prediction methods. This study aims to systematically investigate the use of machine learning to predict survival in patients with cervical cancer.
Methods: An electronic search of the PubMed, Scopus, and Web of Science databases was performed on October 1, 2022. All articles extracted from the databases were collected in an Excel file and duplicates were removed. The articles were screened twice based on title and abstract and checked against the inclusion and exclusion criteria. The main inclusion criterion was the use of machine learning algorithms for predicting cervical cancer survival. The information extracted from the articles included authors, publication year, dataset details, survival type, evaluation criteria, machine learning models, and the algorithm execution method.
Results: A total of 13 articles were included in this study, most published from 2018 onwards. The most common machine learning models were random forest (6 articles, 46%), logistic regression (4 articles, 30%), support vector machines (3 articles, 23%), ensemble and hybrid learning (3 articles, 23%), and deep learning (3 articles, 23%). Sample sizes varied between 85 and 14,946 patients, and the models were internally validated except in two articles. The area under the curve (AUC) ranged from 0.40 to 0.99 for overall survival, from 0.56 to 0.88 for disease-free survival, and from 0.67 to 0.81 for progression-free survival. Finally, 15 variables with an effective role in predicting cervical cancer survival were identified.
Conclusion: Combining heterogeneous multidimensional data with machine learning techniques can play a very influential role in predicting cervical cancer survival. Despite the benefits of machine learning, interpretability, explainability, and imbalanced datasets remain major challenges. Establishing machine learning algorithms for survival prediction as a standard requires further studies.
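For orientation, here is a minimal sketch of the simplest variant of the reviewed setup: internal validation of a classifier on a binary survival label, reporting AUC. Real survival prediction must handle censoring (e.g., Cox models or random survival forests), which this sketch deliberately ignores; the data are synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Synthetic stand-in for clinicopathological features with a binary
# five-year-survival label.
X, y = make_classification(n_samples=500, n_features=15, weights=[0.35, 0.65],
                           random_state=0)

# Internal validation via stratified k-fold cross-validation, scored by AUC.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                      cv=cv, scoring="roc_auc")
print(f"internally validated AUC: {auc.mean():.2f} +/- {auc.std():.2f}")
```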
- Research Article
53
- 10.1111/jop.13135
- Dec 15, 2020
- Journal of oral pathology & medicine : official publication of the International Association of Oral Pathologists and the American Academy of Oral Pathology
Machine learning analyses of cancer outcomes remain sparse for oral cancer compared with other cancers such as breast or lung. The purpose of the present study was to compare the performance of machine learning algorithms in predicting overall, recurrence-free five-year survival in oral cancer patients based on clinical and histopathological data. Data were gathered retrospectively from 416 patients with oral squamous cell carcinoma (OSCC). The data set was divided into training and test sets (75:25 split). Training performance of five machine learning algorithms (logistic regression, k-nearest neighbours, Naïve Bayes, decision tree, and random forest classifiers) was assessed by k-fold cross-validation. Variables used in the machine learning models were age, sex, pain symptoms, grade of lesion, lymphovascular invasion, extracapsular extension, perineural invasion, bone invasion, and type of treatment. Variable importance was assessed, and model performance on the test data was evaluated using receiver operating characteristic curves, accuracy, sensitivity, specificity, and F1 score. The best performing model was the decision tree classifier, followed by the logistic regression model (accuracy 76% and 60%, respectively). The Naïve Bayes model did not display any predictive value, with 0% specificity. Machine learning presents a promising and accessible toolset for improving prediction of oral cancer outcomes. Our findings add to a growing body of evidence that decision tree models are useful in predicting OSCC outcomes. We would advise that future similar studies explore a variety of machine learning models, including logistic regression, to help evaluate model performance.
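A minimal sketch of the study's pipeline shape: the nine listed variables (encoded numerically, with invented values here), a 75:25 split, k-fold cross-validation of a decision tree, and variable importance inspection:

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, cross_val_score

# Hypothetical encoded clinicopathological features like those the study lists.
rng = np.random.default_rng(0)
n = 416
df = pd.DataFrame({
    "age": rng.integers(30, 90, n),
    "sex": rng.integers(0, 2, n),
    "pain": rng.integers(0, 2, n),
    "grade": rng.integers(1, 4, n),
    "lymphovascular_invasion": rng.integers(0, 2, n),
    "extracapsular_extension": rng.integers(0, 2, n),
    "perineural_invasion": rng.integers(0, 2, n),
    "bone_invasion": rng.integers(0, 2, n),
    "treatment_type": rng.integers(0, 3, n),
})
y = rng.integers(0, 2, n)  # placeholder five-year recurrence-free survival label

# 75:25 split; k-fold cross-validation on the training portion only.
X_tr, X_te, y_tr, y_te = train_test_split(df, y, test_size=0.25, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
cv_acc = cross_val_score(tree, X_tr, y_tr, cv=5)
print(f"cross-validated accuracy: {cv_acc.mean():.2f}")

# Fit on training data, then inspect variable importance and test accuracy.
tree.fit(X_tr, y_tr)
importance = sorted(zip(df.columns, tree.feature_importances_),
                    key=lambda t: -t[1])
print("top features:", importance[:3])
print(f"test accuracy: {tree.score(X_te, y_te):.2f}")
```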
- Research Article
33
- 10.3390/su14052546
- Feb 22, 2022
- Sustainability
The use of machine learning (ML) algorithms for power demand and supply prediction is becoming increasingly popular in smart grid systems. Because many simple ML algorithms/models exist in the literature, the question arises whether any of them holds a significant advantage, particularly for power demand/supply prediction use cases. Toward answering this question, we examined six well-known ML algorithms for power prediction in smart grid systems: the artificial neural network, Gaussian regression (GR), k-nearest neighbor, linear regression, random forest, and support vector machine (SVM). First, fairness was ensured by undertaking a thorough hyperparameter tuning exercise for the models under consideration. Second, power demand and supply statistics from the Eskom database were selected for day-ahead forecasting. These datasets were based on system hourly demand as well as renewable generation sources. When their hyperparameters were properly tuned, the results obtained within the boundaries of the datasets utilized showed little to no significant difference in the quantitative and qualitative performance of the different ML algorithms. Compared with photovoltaic (PV) power generation, these algorithms performed poorly in predicting wind power output, which could be related to the unpredictability of wind-generated power within the time range of the datasets employed. Furthermore, while the SVM algorithm achieved the quickest empirical processing time, statistical tests revealed no significant difference in the timing performance of the various algorithms, except for the GR algorithm. As a result, our preliminary findings suggest that using a variety of existing ML algorithms for power demand/supply prediction may not always yield statistically significant comparative prediction results, particularly for sources with regular patterns, such as solar PV or daily consumption rates, provided that the hyperparameters of such algorithms are properly fine-tuned.
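A minimal sketch of the fair-comparison step the abstract emphasizes: tuning each model's hyperparameters with the same time-aware cross-validation before comparing day-ahead forecasts. The lagged-feature construction, grids, and synthetic hourly series are illustrative assumptions, not the Eskom data:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Hypothetical hourly demand series with lagged features for day-ahead forecasting.
rng = np.random.default_rng(0)
hours = np.arange(24 * 60)  # 60 days of hourly data
demand = 100 + 20 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 3, hours.size)
X = np.column_stack([demand[:-24], hours[24:] % 24])  # 24 h lag + hour of day
y = demand[24:]

# Tune each model with the same time-aware CV so the comparison is fair.
cv = TimeSeriesSplit(n_splits=5)
candidates = {
    "SVR": (Pipeline([("scale", StandardScaler()), ("svr", SVR())]),
            {"svr__C": [1, 10, 100], "svr__gamma": ["scale", 0.1]}),
    "random forest": (RandomForestRegressor(random_state=0),
                      {"n_estimators": [100, 300], "max_depth": [None, 10]}),
}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=cv,
                          scoring="neg_mean_absolute_error").fit(X, y)
    print(f"{name}: best params {search.best_params_}, "
          f"MAE {-search.best_score_:.2f}")
```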
- Research Article
10
- 10.3389/fnut.2022.740898
- Feb 17, 2022
- Frontiers in Nutrition
Machine learning (ML) algorithms may help better understand the complex interactions among factors that influence dietary choices and behaviors. The aim of this study was to explore whether ML algorithms are more accurate than traditional statistical models in predicting vegetable and fruit (VF) consumption. A large array of features (2,452 features from 525 variables) encompassing individual and environmental information related to dietary habits and food choices in a sample of 1,147 French-speaking adult men and women was used for the purpose of this study. Adequate VF consumption, defined as 5 servings/d or more, was measured by averaging data from three web-based 24 h recalls and used as the outcome to predict. Nine classification ML algorithms were compared to two traditional statistical predictive models, logistic regression and penalized regression (Lasso). The performance of the predictive ML algorithms was tested after the implementation of adjustments, including normalizing the data, as well as in a series of sensitivity analyses such as using VF consumption obtained from a web-based food frequency questionnaire (wFFQ) and applying a feature selection algorithm in an attempt to reduce overfitting. Logistic regression and Lasso predicted adequate VF consumption with an accuracy of 0.64 (95% confidence interval [CI]: 0.58–0.70) and 0.64 (95% CI: 0.60–0.68), respectively. Among the ML algorithms tested, the most accurate algorithms to predict adequate VF consumption were the support vector machine (SVM) with either a radial basis kernel or a sigmoid kernel, both with an accuracy of 0.65 (95% CI: 0.59–0.71). The least accurate ML algorithm was the SVM with a linear kernel, with an accuracy of 0.55 (95% CI: 0.49–0.61). Using dietary intake data from the wFFQ and applying a feature selection algorithm had little to no impact on the performance of the algorithms. In summary, ML algorithms and traditional statistical models predicted adequate VF consumption with similar accuracies among adults. These results suggest that additional research is needed to explore further the true potential of ML in predicting dietary behaviours that are determined by complex interactions among several individual, social and environmental factors.
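A minimal sketch of the model comparison with bootstrap confidence intervals on test-set accuracy; the synthetic high-dimensional features stand in for the study's 2,452 questionnaire-derived features, and an L1-penalized logistic regression stands in for Lasso:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the high-dimensional dietary/environmental features.
X, y = make_classification(n_samples=1147, n_features=500, n_informative=40,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Lasso (L1 logistic)": LogisticRegression(penalty="l1", solver="liblinear"),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "SVM (sigmoid)": make_pipeline(StandardScaler(), SVC(kernel="sigmoid")),
    "SVM (linear)": make_pipeline(StandardScaler(), SVC(kernel="linear")),
}
rng = np.random.default_rng(0)
for name, model in models.items():
    correct = (model.fit(X_tr, y_tr).predict(X_te) == y_te).astype(float)
    # Bootstrap the test-set accuracy to get a rough 95% CI.
    boot = [correct[rng.integers(0, len(correct), len(correct))].mean()
            for _ in range(1000)]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"{name}: accuracy {correct.mean():.2f} (95% CI {lo:.2f}-{hi:.2f})")
```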
- Research Article
3
- 10.30574/wjarr.2024.23.3.2928
- Sep 30, 2024
- World Journal of Advanced Research and Reviews
This study aims to identify the most accurate machine learning algorithm for predicting heart attacks using demographic data, physiological measurements, and electrocardiogram (ECG) results. We utilized a dataset of 4,000 patient records, combining data from DMCH and Kaggle. Our methodology involved comprehensive data preprocessing, including ECG noise removal and feature selection using the Boruta algorithm. We implemented and compared six machine learning algorithms: Decision Tree, Random Forest, Logistic Regression, Support Vector Machine, XGBoost, and K-Nearest Neighbors. The results demonstrate that our proposed method can accurately predict heart attacks with high sensitivity and specificity. Among the tested algorithms, Random Forest achieved the highest accuracy of 87%, with well-balanced precision (0.86), recall (0.85), and F1-score (0.87). K-Nearest Neighbors and XGBoost also showed strong performance, with accuracies of 81% and 80%, respectively. This study contributes to the field by utilizing a large, diverse dataset and providing a comprehensive comparison of multiple algorithms. Our findings suggest the potential for integrating machine learning, particularly Random Forest models, into clinical practice for early heart attack risk assessment, representing a significant step towards improving cardiovascular care through advanced data analysis techniques.
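The Boruta algorithm judges each real feature against "shadow" copies, i.e., shuffled versions of the features that carry no signal. A minimal sketch of that core idea with a random forest on synthetic data; the vote threshold and trial count are illustrative, and the boruta Python package (BorutaPy) implements the full procedure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 4,000-record cardiac dataset.
X, y = make_classification(n_samples=4000, n_features=20, n_informative=6,
                           random_state=0)

# Boruta-style selection: a real feature is kept only if its importance
# beats the best shadow feature across repeated trials.
rng = np.random.default_rng(0)
keep_votes = np.zeros(X.shape[1])
for trial in range(20):
    shadows = rng.permuted(X, axis=0)          # shuffle each column independently
    rf = RandomForestClassifier(n_estimators=200, random_state=trial)
    rf.fit(np.hstack([X, shadows]), y)
    real_imp = rf.feature_importances_[: X.shape[1]]
    shadow_max = rf.feature_importances_[X.shape[1]:].max()
    keep_votes += real_imp > shadow_max

selected = np.where(keep_votes > 10)[0]        # kept in a majority of trials
print("selected feature indices:", selected)

# Final classifier trained on the selected features only.
rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(X[:, selected], y)
```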
- Research Article
56
- 10.1016/j.spinee.2020.04.001
- Apr 12, 2020
- The Spine Journal
Development of machine learning and natural language processing algorithms for preoperative prediction and automated identification of intraoperative vascular injury in anterior lumbar spine surgery