Interpretable Machine Learning Methods Research Articles

Supervised machine learning (ML) offers an exciting suite of algorithms that could benefit research in sport science. In principle, supervised ML approaches were designed for pure prediction, as opposed to explanation, leading to a rise in powerful, but opaque, algorithms. Recently, two subdomains of ML have grown in popularity: explainable ML, which allows us to "peek into the black box," and interpretable ML, which encourages using algorithms that are inherently interpretable. The increased transparency of these powerful ML algorithms may provide considerable support for the hypothetico-deductive framework, in which hypotheses are generated from prior beliefs and theory, and are assessed against data collected specifically to test those hypotheses. However, this paper shows why ML algorithms are fundamentally different from statistical methods, even when using explainable or interpretable approaches. Translating potential insights from supervised ML algorithms, while in many cases seemingly straightforward, can pose unanticipated challenges. While supervised ML cannot be used to replace statistical methods, we propose ways in which the sport sciences community can take advantage of supervised ML in the hypothetico-deductive framework. In this manuscript we argue that supervised machine learning can and should augment our exploratory investigations in sport science, but that leveraging potential insights from supervised ML algorithms should be undertaken with caution. We justify our position through a careful examination of supervised machine learning, and provide a useful analogy to help elucidate our findings. Three case studies are provided to demonstrate how supervised machine learning can be integrated into exploratory analysis. Supervised machine learning should be integrated into the scientific workflow with requisite caution. The approaches described in this paper provide ways to safely leverage the strengths of machine learning, such as the flexibility ML algorithms can provide for fitting complex patterns, while avoiding the pitfalls that may arise when trying to integrate findings from ML algorithms into domain knowledge: at best, wasted effort and money, and at worst, misguided clinical recommendations.

KEY POINTS: Some supervised machine learning algorithms and statistical models are used to solve the same problem, y = f(x) + ε, but differ fundamentally in motivation and approach. The hypothetico-deductive framework, in which hypotheses are generated from prior beliefs and theory and are assessed against data collected specifically to test those hypotheses, is one of the core frameworks comprising the scientific method. In the hypothetico-deductive framework, supervised machine learning can be used in an exploratory capacity. However, it cannot replace the use of statistical methods, even as explainable and interpretable machine learning methods become increasingly popular. Improper use of supervised machine learning in the hypothetico-deductive framework is tantamount to p-value hacking in statistical methods.
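To make the first key point concrete, the sketch below fits the same synthetic data, generated as y = f(x) + ε, in two ways. It is only an illustration and not taken from the paper: the data, the function names fit_line and knn_predict, the choice of a straight line for f, and the use of k-nearest neighbours as the ML stand-in are all assumptions. The least-squares fit returns parameters (an intercept and a slope) that can be examined against a hypothesis, whereas the nearest-neighbour predictor returns only a prediction.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def knn_predict(xs, ys, x0, k=5):
    """Predict at x0 as the mean response of the k nearest training points."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

# Synthetic data (an assumption for illustration): f(x) = 2 + 0.5x, Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 0.3) for x in xs]

# Statistical model: the estimated slope is itself the object of interest.
intercept, slope = fit_line(xs, ys)
pred_stat = intercept + slope * 5.0

# ML-style predictor: produces a prediction but no parameter describing f.
pred_ml = knn_predict(xs, ys, 5.0)

print(f"slope={slope:.3f}  line prediction={pred_stat:.3f}  kNN prediction={pred_ml:.3f}")
```

Both approaches give similar predictions near x = 5 on this data, but only the first yields a quantity that maps directly onto a hypothesis about how y changes with x.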

This paper proposes the use of interpretable ensemble learning models for predicting the mechanical properties (bulk, shear, and Young's moduli) of ABX3 perovskite compounds, where A, B, and X refer to the three elements that make up the cubic three-dimensional framework of the perovskite compounds. These models consist of three ensemble learning techniques, namely CatBoost, Random Forest, and XGBoost. To expand the feature space, robust first-principles density functional theory calculations were used to generate some of the input features, namely elastic constants, density, volume per atom, and ground state energy per atom. The ranking of the input features that influence the machine learning (ML) model decisions was then determined. For this, we performed correlation analysis on the multi-dimensional input feature space, suppressed features with high collinearity, and selected features with limited correlation. We trained the three ensemble learning techniques on the resulting input feature vectors to predict the mechanical properties. Furthermore, we employed the Shapley Additive Explanations (SHAP) algorithm for analysing the intrinsic decision-making rationality of the ensemble learning models. We measured performance using error metrics and the coefficient of determination, R². The results show that XGBoost outperforms the other approaches when predicting the shear modulus or Young's modulus of the perovskite compounds, yielding the lowest error metrics and the highest R² value (0.97) in the testing phase. However, both CatBoost and Random Forest outperformed XGBoost when predicting the bulk modulus in the testing phase. The weaker performance of XGBoost in predicting the bulk modulus can be ascribed to overfitting, which occurs when the ML model gives accurate predictions for training data but not for test data. Furthermore, the SHAP algorithm provides insight into the order of feature importance (from highest to lowest). Additionally, we conducted a post-analysis using a holistic ranking to analyse the relative SHAP feature importance across the examined ensemble learning techniques. Our findings indicate that the elastic constants are the most important input features influencing the predictive decisions of the ensemble learning models.
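The collinearity-suppression step described in this abstract could look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' procedure: the greedy keep-first strategy, the 0.9 threshold, the function names pearson and drop_collinear, the feature name mass_per_cell, and all numeric values are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def drop_collinear(features, threshold=0.9):
    """Greedily keep each feature whose |r| with every kept feature is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Made-up values for illustration only; mass_per_cell is a hypothetical feature
# constructed to be nearly proportional to density.
features = {
    "density": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mass_per_cell": [2.1, 3.9, 6.2, 8.0, 9.9],
    "volume_per_atom": [3.0, 1.0, 4.0, 1.5, 5.0],
}

selected = drop_collinear(features)
print(selected)  # ['density', 'volume_per_atom']
```

With a greedy filter like this, the result depends on the order in which features are visited, so in practice one might order them by domain relevance before filtering.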

Articles published on Interpretable Machine Learning Methods

On Leveraging Machine Learning in Sport Science in the Hypothetico-deductive Framework.

Predicting temporomandibular disorders in adults using interpretable machine learning methods: a model development and validation study

Aerodynamic robustness optimization of aeroengine fan performance based on an interpretable dynamic machine learning method

Using interpretable machine learning methods to identify the relative importance of lifestyle factors for overweight and obesity in adults: pooled evidence from CHNS and NHANES

Interpretable machine learning study of a collector based on combined twisted-tape and wavy-tape inserts

Interpretable Clinical Decision-Making Application for Etiological Diagnosis of Ventricular Tachycardia Based on Machine Learning.

Estimation of compressive strength of concrete with manufactured sand and natural sand using interpretable artificial intelligence

Fast predesign methodology of centrifugal compressor for PEMFCs combining a physics-based loss model and an interpretable machine learning method

Projecting Large Fires in the Western US With an Interpretable and Accurate Hybrid Machine Learning Method

Impacts of process parameters on diesel reforming via interpretable machine learning

Deep humoral profiling coupled to interpretable machine learning unveils diagnostic markers and pathophysiology of schistosomiasis.

Interpretable machine learning methods to predict the mechanical properties of ABX3 perovskites

Assessing the destabilization risk of ecosystems dominated by carbon sequestration based on interpretable machine learning method

Prediction of Post-Treatment Visual Acuity in Age-Related Macular Degeneration Patients With an Interpretable Machine Learning Method.

An interpretable machine learning method for risk stratification of patients with acute coronary syndrome

Exploring the response and prediction of phytoplankton to environmental factors in eutrophic marine areas using interpretable machine learning methods

Identification of key risk factors for venous thromboembolism in urological inpatients based on the Caprini scale and interpretable machine learning methods

Perception of customer satisfaction and complaints based on BERTopic and interpretable machine learning: evidence from hotels in Xi’an

Applying interpretable machine learning in computational biology-pitfalls, recommendations and opportunities for new developments.

Interpretable machine learning guided by physical mechanisms reveals drivers of runoff under dynamic land use changes
