Abstract
Background
Machine learning is used to process large data volumes with complex non-linear relationships between predictive variables and outcomes. Research into the usefulness of machine learning for small data volumes remains limited.
Aim
To compare conventional statistical methods and machine learning for predicting angiogram outcomes in a small cohort of South African cardiac patients.
Methods
This retrospective study included patients with cardiac risk factors at Inkosi Albert Luthuli Central Hospital, Durban, South Africa, from 2002 to 2008. Models using predictive risk factors were designed to forecast a binary angiogram outcome (normal or abnormal), applying conventional statistical models (binary logistic and log binomial regression) and a stacking ensemble of machine learning algorithms.
Results
The prevalence of abnormal angiograms was 99/173 (57%). Predictive data were used to model this outcome. The binary logistic regression model, which estimates odds ratios, was unsuitable. The log binomial model, which estimates relative risks, did not converge despite various stepwise modelling attempts. Machine learning models were therefore applied: logistic regression, k-nearest neighbour, decision tree, support vector machine, and naïve Bayes. The stacking ensemble model combined all of these algorithms and achieved an accuracy >70%, with excellent performance across classification thresholds and an area under the curve (AUC) >80%.
Discussion
The logistic regression model was unsuitable because, with an outcome prevalence >10%, the odds ratio would have been unreliable and would have overestimated the true effect. The log binomial model, which estimates relative risks, did not converge, possibly owing to the multiple predictive variables. Overall, the conventional statistical models were unsuccessful in this instance. The individual machine learning models were limited by the small dataset. However, combining them with the stacking ensemble method produced good results in this small, homogeneous database by exploiting the strengths of each contributing algorithm.
Conclusions
Researchers may apply machine learning when conventional statistical models are inconclusive in small, homogeneous databases with multiple variables that have a complex relationship to the outcome. Machine learning is a viable option even for relatively small cohorts, provided the number of predictive variables is also small.
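Why an odds ratio misleads when the outcome is common (prevalence >10%) can be seen with a small calculation. The sketch below uses purely hypothetical 2×2 counts, not data from the study: a "common" table that reuses the 99/173 abnormal total split by an invented risk factor, and a "rare" table with roughly 0.6% prevalence. It also applies the standard conversion RR = OR / ((1 − p0) + p0 × OR), where p0 is the risk in the unexposed group, which recovers the relative risk exactly for a crude (unadjusted) table.

```python
"""Odds ratio vs relative risk for a common and a rare binary outcome.

All counts are hypothetical illustrations, not data from the study.
"""


def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """a/b: outcome yes/no among exposed; c/d: outcome yes/no among unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Ratio of the odds of the outcome in exposed vs unexposed patients."""
    return (a / b) / (c / d)


def or_to_rr(odds_r: float, baseline_risk: float) -> float:
    """Convert an odds ratio to a relative risk given the unexposed risk p0."""
    return odds_r / ((1 - baseline_risk) + baseline_risk * odds_r)


# Common outcome: 99 of 173 abnormal (~57%), split by a hypothetical risk factor.
common = (60, 20, 39, 54)
# Rare outcome: ~0.6% prevalence with a similar relative risk.
rare = (6, 794, 4, 926)

for label, (a, b, c, d) in [("common", common), ("rare", rare)]:
    rr = risk_ratio(a, b, c, d)
    odds_r = odds_ratio(a, b, c, d)
    rr_from_or = or_to_rr(odds_r, c / (c + d))
    print(f"{label}: RR={rr:.2f}  OR={odds_r:.2f}  RR from OR={rr_from_or:.2f}")
# common: RR=1.79  OR=4.15  RR from OR=1.79
# rare: RR=1.74  OR=1.75  RR from OR=1.74
```

With a common outcome the odds ratio (4.15) is more than double the relative risk (1.79), whereas with a rare outcome the two almost coincide. This is the reason a relative-risk model such as log binomial regression is usually preferred when prevalence is high.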
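A stacking ensemble of the five algorithms named above can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under stated assumptions, not the authors' pipeline. The data are synthetic (`make_classification`, 173 samples, about 57% positive class) and stand in for the cohort. The hyperparameters (`n_neighbors=5`, `max_depth=3`), the feature scaling, the 70/30 split, and the logistic-regression meta-learner are all assumptions.

```python
"""Sketch of a stacking ensemble on a small synthetic binary dataset.

Assumption: make_classification data stand in for the study cohort
(n=173, ~57% positive class); they are NOT the study data.
"""
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data: 173 patients, 8 predictors, ~57% "abnormal" (class 1).
X, y = make_classification(
    n_samples=173, n_features=8, n_informative=4, weights=[0.43], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Base learners named in the abstract; distance- and margin-based models are scaled.
base_learners = [
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("nb", GaussianNB()),
]

# The meta-learner is trained on out-of-fold predictions of the base learners (cv=5),
# so it learns how much weight to give each algorithm on unseen data.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

proba = stack.predict_proba(X_test)[:, 1]  # estimated probability of class 1
accuracy = accuracy_score(y_test, stack.predict(X_test))
auc = roc_auc_score(y_test, proba)  # summarizes performance across all thresholds
print(f"accuracy={accuracy:.2f}  AUC={auc:.2f}")
```

Fitting the meta-learner on cross-validated predictions is the key design choice. It keeps the ensemble from simply rewarding whichever base model overfits the small training set most.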