Abstract

Machine learning approaches are increasingly used in health research. Applications range from identifying disease onset and classifying disease severity to predicting epileptic seizures. Although machine learning can be a powerful tool, there is potential for misuse: model performance can be inflated through overfitting, and an overfit model will not generalize to the wider population. The risk of misuse increases when the number of variables that can be extracted from continuous data is almost unlimited, as is the case for neural, movement, and acoustic (e.g., speech and music) data. Given that health research often involves small sample sizes and that outcome variables can be noisier in clinical populations, several important points should be considered before using machine learning. We suggest best practices in machine learning, including data formatting, reducing data dimensionality, model selection and evaluation, and other steps within the machine learning process. We further discuss some common pitfalls in applying machine learning to small sample sizes and high-dimensional data (e.g., speech biomarkers, neural and imaging data). We advocate for parsimonious approaches: selecting the simplest machine learning method that adequately describes the data, preventing redundancy and overfitting through variable elimination, and ensuring that particular variables or approaches do not inflate machine learning outcomes. We further consider approaches that can identify the best predictors (or combinations thereof), as well as “black box” machine learning methods (e.g., deep learning). Finally, we discuss the limitations of current machine learning methods and propose future directions to broaden the applicability of machine learning tools and to ensure that outcomes are robust against random factors.
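
One way performance can be inflated with small samples and many variables is by selecting variables on the full dataset before cross-validation. The sketch below is illustrative only and is not taken from the paper: it builds pure-noise data (40 samples, 2000 features, both numbers assumed for illustration), then compares a "leaky" evaluation, where features are chosen using all samples, with an evaluation where features are chosen inside each training fold. All function names (`make_noise_data`, `top_k_features`, `cross_validate`, and so on) are hypothetical, and the nearest-centroid classifier is simply a convenient stand-in for any simple model.

```python
import random


def make_noise_data(n_samples=40, n_features=2000, seed=0):
    """Pure-noise features with balanced labels: no feature carries real signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def class_means(X, y, rows, features, label):
    """Mean of each listed feature over the rows that carry the given label."""
    members = [i for i in rows if y[i] == label]
    return [sum(X[i][j] for i in members) / len(members) for j in features]


def top_k_features(X, y, rows, k):
    """Rank features by absolute class-mean difference, using only `rows`."""
    all_features = range(len(X[0]))
    m0 = class_means(X, y, rows, all_features, 0)
    m1 = class_means(X, y, rows, all_features, 1)
    return sorted(all_features, key=lambda j: abs(m1[j] - m0[j]), reverse=True)[:k]


def nearest_centroid_accuracy(X, y, train, test, features):
    """Fit class centroids on `train`, then score predictions on `test`."""
    c0 = class_means(X, y, train, features, 0)
    c1 = class_means(X, y, train, features, 1)
    correct = 0
    for i in test:
        x = [X[i][j] for j in features]
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        pred = 1 if d1 < d0 else 0
        correct += pred == y[i]
    return correct / len(test)


def cross_validate(X, y, k_features=10, n_folds=5, leaky=False):
    """Mean k-fold accuracy; `leaky=True` selects features on ALL samples first."""
    n = len(X)
    all_rows = list(range(n))
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    leaked = top_k_features(X, y, all_rows, k_features) if leaky else None
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in all_rows if i not in held_out]
        # Honest variant: the held-out fold never influences feature selection.
        features = leaked if leaky else top_k_features(X, y, train, k_features)
        scores.append(nearest_centroid_accuracy(X, y, train, test, features))
    return sum(scores) / len(scores)


X, y = make_noise_data()
leaky_acc = cross_validate(X, y, leaky=True)
honest_acc = cross_validate(X, y, leaky=False)
print(f"selection on all data (leaky): {leaky_acc:.2f}")
print(f"selection inside each fold:    {honest_acc:.2f}")
```

Because the data contain no real signal, any accuracy well above chance in the leaky variant reflects information from the held-out samples leaking into variable selection, whereas the within-fold variant should stay near chance level.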


