Machine Learning Models for Blood Glucose Level Prediction in Patients With Diabetes Mellitus: Systematic Review and Network Meta-Analysis.
Machine learning (ML) models give patients with diabetes mellitus (DM) more options for managing blood glucose (BG) levels appropriately. However, because numerous types of ML algorithms exist, choosing an appropriate model is vitally important. In a systematic review and network meta-analysis, this study aimed to comprehensively assess the performance of ML models in predicting BG levels. In addition, we assessed ML models used to detect and predict adverse BG (hypoglycemia) events by calculating pooled estimates of sensitivity and specificity. PubMed, Embase, Web of Science, and IEEE Xplore were systematically searched for studies on predicting BG levels and predicting or detecting adverse BG events using ML models, from inception to November 2022. Studies that assessed the performance of different ML models in predicting or detecting BG levels or adverse BG events in patients with DM were included. Studies with no derivation or performance metrics of ML models were excluded. The Quality Assessment of Diagnostic Accuracy Studies tool was applied to assess the quality of included studies. Primary outcomes were the relative ranking of ML models for predicting BG levels over different prediction horizons (PHs) and pooled estimates of the sensitivity and specificity of ML models in detecting or predicting adverse BG events. In total, 46 eligible studies were included in the meta-analysis. For ML models predicting BG levels, the mean absolute root mean square error (RMSE) at PHs of 15, 30, 45, and 60 minutes was 18.88 (SD 19.71), 21.40 (SD 12.56), 21.27 (SD 5.17), and 30.01 (SD 7.23) mg/dL, respectively. The neural network model (NNM) showed the highest relative performance across the different PHs.
Furthermore, the pooled estimates of the positive and negative likelihood ratios of the ML models were 8.3 (95% CI 5.7-12.0) and 0.31 (95% CI 0.22-0.44), respectively, for predicting hypoglycemia, and 2.4 (95% CI 1.6-3.7) and 0.37 (95% CI 0.29-0.46), respectively, for detecting hypoglycemia. Statistically significant high heterogeneity was detected in all subgroups, with different sources of heterogeneity. For predicting precise BG levels, the RMSE increases as the PH lengthens, and the NNM shows the highest relative performance among all the ML models. Current ML models are sufficiently able to predict adverse BG events, but their ability to detect adverse BG events needs to be enhanced. PROSPERO CRD42022375250; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=375250.
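The pooled likelihood ratios reported above are derived directly from sensitivity and specificity. A minimal sketch of that relationship (the operating point below is illustrative, not one of the review's pooled estimates):

```python
def likelihood_ratios(sensitivity, specificity):
    """LR+ = sens / (1 - spec); LR- = (1 - sens) / spec."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg

# Illustrative operating point: 80% sensitivity, 90% specificity.
lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)
print(round(lr_pos, 1), round(lr_neg, 2))  # → 8.0 0.22
```

An LR+ near 8 means a positive output raises the odds of hypoglycemia roughly eightfold, which is why the review judges prediction (pooled LR+ 8.3) to be stronger than detection (pooled LR+ 2.4).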
- Research Article
36
- 10.1097/corr.0000000000001360
- Jul 30, 2020
- Clinical Orthopaedics & Related Research
Machine learning (ML) is a subdomain of artificial intelligence that enables computers to abstract patterns from data without explicit programming. A myriad of impactful ML applications already exists in orthopaedics ranging from predicting infections after surgery to diagnostic imaging. However, no systematic reviews that we know of have compared, in particular, the performance of ML models with that of clinicians in musculoskeletal imaging to provide an up-to-date summary regarding the extent of applying ML to imaging diagnoses. By doing so, this review delves into where current ML developments stand in aiding orthopaedists in assessing musculoskeletal images. This systematic review aimed (1) to compare performance of ML models versus clinicians in detecting, differentiating, or classifying orthopaedic abnormalities on imaging by (A) accuracy, sensitivity, and specificity, (B) input features (for example, plain radiographs, MRI scans, ultrasound), (C) clinician specialties, and (2) to compare the performance of clinician-aided versus unaided ML models. A systematic review was performed in PubMed, Embase, and the Cochrane Library for studies published up to October 1, 2019, using synonyms for machine learning and all potential orthopaedic specialties. We included all studies that compared ML models head-to-head against clinicians in the binary detection of abnormalities in musculoskeletal images. After screening 6531 studies, we ultimately included 12 studies. We conducted quality assessment using the Methodological Index for Non-randomized Studies (MINORS) checklist. All 12 studies were of comparable quality, and they all clearly included six of the eight critical appraisal items (study aim, input feature, ground truth, ML versus human comparison, performance metric, and ML model description). 
This justified summarizing the findings in a quantitative form by calculating the median absolute improvement of the ML models compared with clinicians for the following performance metrics: accuracy, sensitivity, and specificity. ML models provided, in aggregate, only very slight improvements in diagnostic accuracy and sensitivity compared with clinicians working alone and were on par in specificity (3% (interquartile range [IQR] -2.0% to 7.5%), 0.06 (IQR -0.03 to 0.14), and 0.00 (IQR -0.048 to 0.048), respectively). Inputs used by the ML models were plain radiographs (n = 8), MRI scans (n = 3), and ultrasound examinations (n = 1). Overall, ML models outperformed clinicians more often when interpreting plain radiographs than when interpreting MRIs (17 of 34 and 3 of 16 performance comparisons, respectively). Orthopaedists and radiologists performed similarly to ML models, while ML models mostly outperformed other clinicians (outperformance in 7 of 19, 7 of 23, and 6 of 10 performance comparisons, respectively). Two studies evaluated the performance of clinicians aided and unaided by ML models; both demonstrated considerable improvements in ML-aided clinician performance, reporting a 47% decrease in misinterpretation rate (95% confidence interval [CI] 37 to 54; p < 0.001) and a mean increase in specificity of 0.048 (95% CI 0.029 to 0.068; p < 0.001) in detecting abnormalities on musculoskeletal images. At present, ML models have comparable performance to clinicians in assessing musculoskeletal images. ML models may enhance the performance of clinicians as a technical supplement rather than as a replacement for clinical intelligence. Future ML-related studies should emphasize how ML models can complement clinicians, instead of determining the overall superiority of one versus the other.
This can be accomplished by improving transparent reporting, diminishing bias, determining the feasibility of implementation in the clinical setting, and appropriately tempering conclusions. Level III, diagnostic study.
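The review's summary statistic, median absolute improvement with its IQR across head-to-head comparisons, can be sketched as follows (the paired accuracy values are invented for illustration):

```python
import statistics

def median_improvement(model_scores, clinician_scores):
    """Median of per-comparison differences (model minus clinician), with the IQR."""
    diffs = sorted(m - c for m, c in zip(model_scores, clinician_scores))
    q = statistics.quantiles(diffs, n=4)  # [Q1, Q2, Q3], default exclusive method
    return statistics.median(diffs), (q[0], q[2])

# Hypothetical paired accuracies (%) from five model-vs-clinician comparisons:
model = [88, 91, 85, 95, 90]
clinician = [90, 90, 82, 90, 83]
print(median_improvement(model, clinician))
```

Reporting the median with its IQR, rather than a mean, keeps a single outlying comparison from dominating the aggregate.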
- Research Article
13
- 10.1007/s00261-021-03051-6
- Mar 22, 2021
- Abdominal Radiology
To develop and externally validate a multiphase computed tomography (CT)-based machine learning (ML) model for staging liver fibrosis (LF) using whole liver slices. The development dataset comprised 232 patients with pathological analysis for LF, and the test dataset comprised 100 patients from an independent outside institution. Feature extraction was performed on the precontrast phase (PCP), arterial phase (AP), portal vein phase (PVP), and three-phase CT images. CatBoost was used to build the ML models from the features with good reproducibility. The diagnostic performance of the ML models based on each single-phase and the three-phase CT images was compared with radiologists' interpretations, the aminotransferase-to-platelet ratio index, and the fibrosis index based on four factors (FIB-4) using receiver operating characteristic curves with area under the curve (AUC) values. Although the ML model based on three-phase CT images (AUC = 0.65-0.80) achieved higher AUC values than those based on PCP (AUC = 0.56-0.69) and PVP (AUC = 0.51-0.74) in predicting various stages of LF, no significant difference was found. The best CT-based ML model (AUC = 0.65-0.80) outperformed FIB-4 in differentiating advanced LF and cirrhosis, and radiologists' interpretation (AUC = 0.50-0.76) in diagnosing significant and advanced LF. The PCP-, PVP-, and three-phase CT-based ML models are all acceptable for assessing LF, and the performance of the PCP-based ML model is comparable to that of the enhanced CT image-based ML models.
- Supplementary Content
23
- 10.2196/35293
- May 31, 2022
- JMIR Medical Informatics
Background: Severity of illness scores (Acute Physiology and Chronic Health Evaluation, Simplified Acute Physiology Score, and Sequential Organ Failure Assessment) are current risk stratification and mortality prediction tools used in intensive care units (ICUs) worldwide. Developers of artificial intelligence or machine learning (ML) models predictive of ICU mortality use the severity of illness scores as a reference point when reporting the performance of these computational constructs. Objective: This study aimed to perform a literature review and meta-analysis of articles that compared binary classification ML models with the severity of illness scores that predict ICU mortality and determine which models have superior performance. This review intends to provide actionable guidance to clinicians on the performance and validity of ML models in supporting clinical decision-making compared with the severity of illness score models. Methods: Between December 15 and 18, 2020, we conducted a systematic search of the PubMed, Scopus, Embase, and IEEE databases and reviewed studies published between 2000 and 2020 that compared the performance of binary ML models predictive of ICU mortality with the performance of severity of illness score models on the same data sets. We assessed the studies' characteristics, synthesized the results, meta-analyzed the discriminative performance of the ML and severity of illness score models, and performed tests of heterogeneity within and among studies. Results: We screened 461 abstracts, of which we assessed the full text of 66 (14.3%) articles. We included in the review 20 (4.3%) studies that developed 47 ML models based on 7 types of algorithms and compared them with 3 types of severity of illness score models.
Of the 20 studies, 4 (20%) were found to have a low risk of bias and applicability in model development, 7 (35%) performed external validation, 9 (45%) reported on calibration, 12 (60%) reported on classification measures, and 4 (20%) addressed explainability. The discriminative performance of the ML-based models, reported as AUROC, ranged between 0.728 and 0.99, versus between 0.58 and 0.86 for the severity of illness score-based models. We noted substantial heterogeneity among the reported models and considerable variation among the AUROC estimates for both ML and severity of illness score model types. Conclusions: ML-based models can accurately predict ICU mortality as an alternative to traditional scoring models. Although the range of performance of the ML models is superior to that of the severity of illness score models, the results cannot be generalized due to the high degree of heterogeneity. When presented with the option of choosing between severity of illness score or ML models for decision support, clinicians should select models that have been externally validated, tested in the practice environment, and updated to the patient population and practice environment. Trial Registration: PROSPERO CRD42021203871; https://tinyurl.com/28v2nch8
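AUROC, the discrimination metric these comparisons hinge on, equals the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor. A minimal rank-based sketch with made-up scores:

```python
def auroc(pos_scores, neg_scores):
    """Mann-Whitney formulation of AUROC: the fraction of (positive, negative)
    pairs ranked correctly, counting ties as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical predicted mortality risks for 3 deaths and 3 survivors:
print(auroc([0.9, 0.8, 0.7], [0.4, 0.6, 0.85]))
```

Because AUROC is purely rank-based, it says nothing about calibration, which is why the review tracks calibration reporting separately.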
- Research Article
2
- 10.1371/journal.pone.0307531
- Jul 24, 2024
- PloS one
This systematic review aimed to evaluate the performance of machine learning (ML) models in predicting post-treatment survival and disease progression outcomes, including recurrence and metastasis, in head and neck cancer (HNC) using clinicopathological structured data. A systematic search was conducted across the Medline, Scopus, Embase, Web of Science, and Google Scholar databases. The methodological characteristics and performance metrics of studies that developed and validated ML models were assessed. The risk of bias was evaluated using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). Out of 5,560 unique records, 34 articles were included. For survival outcomes, the ML models outperformed the Cox proportional hazards model in time-to-event analyses for HNC, with a concordance index of 0.70-0.79 vs. 0.66-0.76, and for all subsites, including the oral cavity (0.73-0.89 vs. 0.69-0.77) and larynx (0.71-0.85 vs. 0.57-0.74). In binary classification analyses, the area under the receiver operating characteristic curve (AUROC) of the ML models ranged from 0.75 to 0.97, with F1-scores of 0.65-0.89 for HNC; AUROC of 0.61-0.91 and F1-scores of 0.58-0.86 for the oral cavity; and AUROC of 0.76-0.97 and F1-scores of 0.63-0.92 for the larynx. Disease-specific survival outcomes showed higher performance than overall survival outcomes, but the performance of ML models did not differ between three- and five-year follow-up durations. For disease progression outcomes, no time-to-event metrics were reported for ML models. For binary classification of the oral cavity, the only evaluated subsite, the AUROC ranged from 0.67 to 0.97, with F1-scores between 0.53 and 0.89. ML models have demonstrated considerable potential in predicting post-treatment survival and disease progression, consistently outperforming traditional linear models and their derived nomograms.
Future research should incorporate more comprehensive treatment features, emphasize disease progression outcomes, and establish model generalizability through external validations and the use of multicenter datasets.
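The concordance index used in the time-to-event comparisons generalizes AUROC to censored survival data. A minimal pairwise sketch with toy data (real implementations also handle ties in event time):

```python
def concordance_index(times, events, risks):
    """Harrell's C: over comparable pairs (the earlier time is an observed event),
    the fraction where the earlier-failing patient has the higher risk score."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical cohort: follow-up times (months), event flags, model risk scores.
print(concordance_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.7, 0.6, 0.2]))  # → 1.0
```

A C-index of 0.5 is chance-level ranking; the 0.70-0.79 range reported for the ML models indicates moderately good discrimination of survival times.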
- Research Article
8
- 10.12989/gae.2021.25.1.001
- Jan 1, 2021
- Geomechanics and Engineering
Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The large number of inputs, or conditioning factors, for these models, however, can reduce computational efficiency and increase the difficulty of collecting data. Feature selection addresses this problem by selecting the most important features among all factors to reduce the size of the input variables. However, two important questions need to be answered: (1) how do feature selection methods affect the performance of machine learning models? and (2) which feature selection method is the most suitable for a given machine learning model? This paper addresses these two questions by comparing the predictive performance of 13 feature selection-based machine learning (FS-ML) models and 5 ordinary machine learning models on LSA. First, five commonly used machine learning models (i.e., logistic regression, support vector machine, artificial neural network, Gaussian process, and random forest) and six typical feature selection methods from the literature are adopted to constitute the proposed models. Then, fifteen conditioning factors are chosen as input variables and 1,017 recorded landslides are used as data. Next, the feature selection methods are used to obtain the importance of the conditioning factors and create feature subsets, from which 13 FS-ML models are constructed. For each machine learning model, the best optimized FS-ML model is selected according to the area under the curve (AUC) value. Finally, five optimal FS-ML models are obtained and applied to the LSA of the studied area. The predictive abilities of the FS-ML models on LSA are verified and compared through receiver operating characteristic curves and statistical indicators such as sensitivity, specificity, and accuracy. The results showed that different feature selection methods have different effects on the performance of LSA machine learning models.
FS-ML models generally outperform the ordinary machine learning models. The best FS-ML model is the recursive feature elimination (RFE) optimized RF, and RFE is an optimal method for feature selection.
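The idea behind recursive feature elimination, repeatedly dropping the feature whose removal costs the least, can be sketched as greedy backward elimination. The scoring function below is a toy stand-in for retraining a model and reading off its AUC, and the conditioning factors and their contributions are invented:

```python
def backward_elimination(features, score_subset, keep):
    """Drop one feature per round: the one whose removal leaves the best score."""
    selected = list(features)
    while len(selected) > keep:
        drop = max(selected,
                   key=lambda f: score_subset([s for s in selected if s != f]))
        selected.remove(drop)
    return selected

# Toy scorer: each factor contributes a fixed, known amount (a real RFE run
# would retrain the model on each candidate subset instead).
contribution = {"slope": 0.30, "rainfall": 0.25, "lithology": 0.20,
                "aspect": 0.05, "curvature": 0.03}
score = lambda subset: sum(contribution[f] for f in subset)
print(backward_elimination(list(contribution), score, keep=3))  # → ['slope', 'rainfall', 'lithology']
```

With an additive scorer the result matches ranking by importance, but with a real model the greedy rounds can diverge from single-pass ranking because feature interactions change as the subset shrinks.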
- Research Article
17
- 10.2139/ssrn.3774075
- Jan 27, 2021
- SSRN Electronic Journal
In this paper, we study the performance of several machine learning (ML) models for credit default prediction, using a unique and anonymized database from a major Spanish bank. We compare the statistical performance of a simple and traditionally used model, logistic regression (Logit), with more advanced ones: Lasso-penalized logistic regression, Classification And Regression Trees (CART), Random Forest, XGBoost, and deep neural networks. Following the process deployed for the supervisory validation of Internal Ratings-Based (IRB) systems, we examine the benefits of using ML in terms of predictive power, both in classification and in calibration. Running a simulation exercise for different sample sizes and numbers of features, we are able to isolate the information advantage associated with access to large amounts of data and to measure the ML model advantage. Although ML models outperform Logit both in classification and in calibration, more complex ML algorithms do not necessarily predict better. We then translate this statistical performance into economic impact by estimating the savings in regulatory capital when using ML models instead of a simpler model like Lasso to compute risk-weighted assets. Our benchmark results show that implementing XGBoost could yield savings of 12.4% to 17% in regulatory capital requirements under the IRB approach. This leads us to conclude that the potential economic benefits for institutions would be significant, justifying further research to better understand all the risks embedded in ML models.
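The paper's distinction between classification and calibration can be made concrete: discrimination asks whether defaulters are ranked above non-defaulters, while calibration asks whether predicted probabilities match realized default rates. The Brier score is one standard calibration metric (the abstract does not name the exact metric used, and the probabilities below are invented):

```python
def brier_score(probs, outcomes):
    """Mean squared gap between predicted default probability and the 0/1 outcome;
    lower means better calibration."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Hypothetical predicted default probabilities and realized defaults:
print(round(brier_score([0.1, 0.9, 0.2, 0.8], [0, 1, 0, 1]), 3))  # → 0.025
```

A model can rank borrowers well yet be poorly calibrated (e.g. all probabilities shifted upward), which matters for capital calculations that consume the probabilities directly.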
- Research Article
3
- 10.14309/ctg.0000000000000705
- Jun 1, 2024
- Clinical and translational gastroenterology
Despite research efforts, predicting Clostridioides difficile incidence and its outcomes remains challenging. The aim of this systematic review was to evaluate the performance of machine learning (ML) models in predicting C. difficile infection (CDI) incidence and complications using clinical data from electronic health records. We conducted a comprehensive search of databases (OVID, Embase, MEDLINE ALL, Web of Science, and Scopus) from inception up to September 2023. Studies employing ML techniques for predicting CDI or its complications were included. The primary outcome was the type and performance of ML models assessed using the area under the receiver operating characteristic curve. Twelve retrospective studies that evaluated CDI incidence and/or outcomes were included. The most commonly used ML models were random forest and gradient boosting. The area under the receiver operating characteristic curve ranged from 0.60 to 0.81 for predicting CDI incidence, 0.59 to 0.80 for recurrence, and 0.64 to 0.88 for predicting complications. Advanced ML models demonstrated similar performance to traditional logistic regression. However, there was notable heterogeneity in defining CDI and the different outcomes, including incidence, recurrence, and complications, and a lack of external validation in most studies. ML models show promise in predicting CDI incidence and outcomes. However, the observed heterogeneity in CDI definitions and the lack of real-world validation highlight challenges in clinical implementation. Future research should focus on external validation and the use of standardized definitions across studies.
- Research Article
3
- 10.1016/j.ins.2023.01.072
- Jan 13, 2023
- Information Sciences
Data-driven evolutionary multi-task optimization for problems with complex solution spaces
- Research Article
- 10.1093/jamiaopen/ooae157
- Dec 26, 2024
- JAMIA open
Dimensionality reduction techniques aim to enhance the performance of machine learning (ML) models by reducing noise and mitigating overfitting. We sought to compare the effect of different dimensionality reduction methods for comorbidity features extracted from electronic health records (EHRs) on the performance of ML models for predicting the development of various sub-phenotypes in children with neurofibromatosis type 1 (NF1). EHR-derived data from pediatric subjects with a confirmed clinical diagnosis of NF1 were used to create 10 unique comorbidity code-derived feature sets by incorporating dimensionality reduction techniques using raw International Classification of Diseases codes, Clinical Classifications Software Refined, and Phecode mapping schemes. We compared the performance of logistic regression, XGBoost, and random forest models utilizing each feature set. XGBoost-based predictive models were most successful at predicting NF1 sub-phenotypes. Overall, features based on domain knowledge-informed mapping schemes performed better than unsupervised feature reduction methods. High-level features exhibited the worst performance across models and outcomes, suggesting excessive information loss from over-aggregation of features. Model performance is significantly impacted by dimensionality reduction techniques and varies by the specific ML algorithm and the outcome being predicted. Automated methods using existing knowledge and ontology databases can effectively aggregate features extracted from EHRs. Dimensionality reduction through feature aggregation can enhance the performance of ML models, particularly in high-dimensional datasets with small sample sizes, commonly found in EHR-based health applications. However, if not carefully optimized, it can lead to information loss and data oversimplification, potentially adversely affecting model performance.
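The study's best feature sets came from knowledge-informed aggregation of raw diagnosis codes. A much simpler stand-in for such a mapping scheme, truncating ICD-10 codes to their 3-character category (the study itself used CCSR and Phecode mappings, not truncation), illustrates the mechanics:

```python
def aggregate_codes(raw_codes, level=3):
    """Collapse raw ICD-10 codes to higher-level categories by truncation,
    yielding one binary feature per category instead of one per code."""
    return sorted({code.replace(".", "")[:level] for code in raw_codes})

# Two type 2 diabetes codes collapse into the single category "E11":
print(aggregate_codes(["E11.9", "E11.65", "I10", "G40.909"]))  # → ['E11', 'G40', 'I10']
```

This shows both the benefit and the risk the abstract describes: four sparse columns become three denser ones, but the distinction between the two E11 sub-codes is lost, and aggregating further (e.g. to chapter level) loses more.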
- Research Article
6
- 10.3390/en14217049
- Oct 28, 2021
- Energies
Building an effective machine learning (ML) model for a data set is a difficult task involving various steps. One of the most important is comparing a substantial number of generated ML models to find the optimal one for deployment. Comparing such models is challenging when they have a dynamic number of features, and comparison involves more than finding differences in ML model performance: users are also interested in the relations between features and model performance, such as feature importance, for ML explanations. This paper proposes RadialNet Chart, a novel visualisation approach for comparing ML models trained with different numbers of features of a given data set while revealing implicit dependency relations. In RadialNet Chart, ML models and features are represented by lines and arcs, respectively. These lines are generated efficiently using a recursive function. The dependence of ML models on a dynamic number of features is encoded into the structure of the visualisation, where ML models and their dependent features are directly revealed through related line connections. ML model performance information is encoded with colour and line width. Taken together with the structure of the visualisation, feature importance can be directly discerned in RadialNet Chart for ML explanations. Compared with other commonly used visualisation approaches, RadialNet Chart can simplify the ML model comparison process: it is more efficient at helping users focus their attention on visual elements of interest, and it makes it easier to compare ML performance to find the optimal model and to discern important features visually and directly, rather than through complex algorithmic calculations.
- Research Article
8
- 10.3389/frai.2024.1365777
- Apr 5, 2024
- Frontiers in Artificial Intelligence
Machine learning (ML) techniques have gained increasing attention in the field of healthcare, including predicting outcomes in patients with lung cancer. ML has the potential to enhance prognostication in lung cancer patients and improve clinical decision-making. In this systematic review and meta-analysis, we aimed to evaluate the performance of ML models compared to logistic regression (LR) models in predicting overall survival in patients with lung cancer. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) statement. A comprehensive search was conducted in the Medline, Embase, and Cochrane databases using a predefined search query. Two independent reviewers screened abstracts, and conflicts were resolved by a third reviewer. Inclusion and exclusion criteria were applied to select eligible studies. Risk of bias assessment was performed using predefined criteria. Data extraction was conducted using the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies (CHARMS) checklist. A meta-analysis was performed to compare the discriminative ability of the ML and LR models. The literature search yielded 3,635 studies, and 12 studies with a total of 211,068 patients were included in the analysis. Six studies reported confidence intervals and were included in the meta-analysis. The performance of ML models varied across studies, with C-statistics ranging from 0.60 to 0.85. The pooled analysis showed that ML models had higher discriminative ability than LR models, with a weighted average C-statistic of 0.78 for ML models compared to 0.70 for LR models. Machine learning models show promise in predicting overall survival in patients with lung cancer, with superior discriminative ability compared to logistic regression models. However, further validation and standardization of ML models are needed before their widespread implementation in clinical practice.
Future research should focus on addressing the limitations of the current literature, such as potential bias and heterogeneity among studies, to improve the accuracy and generalizability of ML models for predicting outcomes in patients with lung cancer. Further research and development of ML models in this field may lead to improved patient outcomes and personalized treatment strategies.
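A pooled "weighted average C-statistic" like the one above can be computed by inverse-variance weighting, recovering each study's standard error from its 95% CI. The abstract does not detail the pooling method actually used, and the two studies below are illustrative:

```python
def pooled_estimate(estimates, ci_los, ci_his):
    """Inverse-variance weighted mean: weight_i = 1/SE_i^2, with SE
    back-calculated from a 95% CI (width ≈ 2 * 1.96 * SE)."""
    weights = [(2 * 1.96 / (hi - lo)) ** 2 for lo, hi in zip(ci_los, ci_his)]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

# Two hypothetical studies: the tighter CI (first study) dominates the pool.
print(round(pooled_estimate([0.80, 0.70], [0.75, 0.60], [0.85, 0.80]), 2))  # → 0.78
```

Weighting by precision means large, tightly estimated studies pull the pooled value toward themselves, which is why only the six studies reporting confidence intervals could enter the meta-analysis.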
- Research Article
1
- 10.3390/app15105776
- May 21, 2025
- Applied Sciences
Background: Hemodialysis (HD) patients have significantly higher mortality rates compared to the general population, primarily due to complex comorbidities. This systematic review and meta-analysis aimed to evaluate and compare the performance of various machine learning (ML) models in predicting mortality among HD patients. Methods: The analysis followed PRISMA guidelines, including studies that assessed the predictive capabilities of ML models for mortality in HD patients. Review Manager software version 5.4.1 was used for the meta-analysis, and the performance of ML models was compared, including logistic regression, XGBoost, and Random Forest models. Results: The meta-analysis indicated that the logistic regression model predicted a true positive mortality rate of 8.23%, close to the actual rate of 10.53%. In contrast, the XGBoost and Random Forest models predicted rates of 9.93% and 8.94%, respectively, compared to the actual mortality rate of 13.73%. The highest area under the curve (AUC) was reported for the Random Forest model at a 3-year follow-up (AUC = 0.89). No significant difference was found between the performance of the logistic regression and Random Forest models (p = 0.82). Conclusions: ML models, particularly Random Forest and logistic regression, demonstrated effective predictive capabilities for mortality in HD patients. These models can help identify high-risk patients early, facilitating personalized treatment strategies and potentially improving long-term outcomes. However, the observed heterogeneity among studies indicates a need for further research to refine model performance and standardize predictive features.
- Research Article
- 10.1093/bjd/ljae090.411
- Jun 28, 2024
- British Journal of Dermatology
The 7-point checklist (7PCL) is recommended by NICE to select patients with pigmented lesions and possible melanoma for urgent referral. This research investigates the potential of using machine learning (ML) models that utilize patient metadata from a teledermatology pathway, including the 7PCL, for suspicious skin lesion detection, and compares the performance gain for each meta-feature added to the 7PCL. We analysed clinical metadata from 53 601 skin lesions in 25 105 patients who attended private skin cancer diagnosis clinics in the UK between 2015 and 2022. For each lesion, we included the following meta-features: 7PCL (change of lesion size, shape, colour; lesion > 7 mm; inflamed; oozing; itching), weighted 7PCL, Williams score characteristics (patient age, patient sex, natural hair colour, arm mole count, sunburn history, prior nonmelanoma skin cancer), the overall Williams score, prior melanoma, lesion location, lesion age, and whether this was a predominantly nonpigmented pink lesion. All lesions were categorized as suspicious (10%) or nonsuspicious (90%) by skin cancer specialists during telemedicine triage (cancer detection rate = 5%). All meta-features were used as ML model inputs to classify each lesion as suspicious or nonsuspicious. This study evaluates five ML models (naive Bayes, logistic regression, support vector machine, random forest, and multilayer perceptron) for detecting suspicious skin lesions based on patient metadata. First, we compared ML model performance between the original 7PCL and the weighted 7PCL in correctly identifying suspicious skin lesions. We then added each meta-feature to the 7PCL and tabulated the performance gain. We used balanced accuracy, sensitivity, specificity, and area under the curve as evaluation metrics. No significant differences were observed in model performance between the 7PCL and the weighted 7PCL (sensitivity 68.1%).
However, the performance of the ML model was improved significantly for each added meta-feature, with the best performance gain (sensitivity 85.2%) observed when 11 meta-features (lesion pink, Williams score, lesion age, sunburn, patient age, Williams group, patient sex, hair colour, site of the lesion on the body, freckling tendency, and mole count) were added to the 7PCL. This research has identified the optimal subset of meta-features that improve ML model performance for categorizing suspicious skin lesions during telemedicine triage, compared with 7PCL alone. Fusing these high-performing meta-features with image modalities is likely to further boost the ML model performance for skin cancer detection, and they could also be used to modify current skin cancer referral guidelines. The study is funded and supported by Innovate UK and the private teledermatology pathway provider.
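For context on the checklist itself, the weighted 7PCL conventionally doubles the three "major" features, with referral usually triggered at a total of 3 or more. A sketch of one common weighting (the feature names are simplified, and the exact weights used in this study are an assumption):

```python
MAJOR = {"change_in_size", "irregular_shape", "irregular_colour"}      # 2 points each
MINOR = {"over_7mm", "inflammation", "oozing", "change_in_sensation"}  # 1 point each

def weighted_7pcl(present):
    """Weighted 7-point checklist: major features score 2, minor features score 1."""
    return sum(2 for f in present if f in MAJOR) + sum(1 for f in present if f in MINOR)

lesion = {"change_in_size", "oozing"}
score = weighted_7pcl(lesion)
print(score, score >= 3)  # → 3 True
```

One major plus one minor feature already reaches the conventional referral threshold, which illustrates why re-weighting alone added little over the unweighted checklist in this cohort.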
- Research Article
1
- 10.1016/j.wneu.2024.11.038
- Feb 1, 2025
- World Neurosurgery
Prediction of symptomatic intracranial hemorrhage before mechanical thrombectomy using machine learning in patients with anterior circulation large vessel occlusion