In this study, we propose a general workflow to improve ML-based geothermobarometer modelling. The workflow focuses on three key areas. First, we developed a robust pre-processing pipeline that addresses data imbalance, feature engineering, and data augmentation. Second, we assessed modelling errors using a Monte Carlo approach to quantify the impact of analytical uncertainties on the final pressure and temperature estimates. Third, we implemented a robust strategy to validate and test the ML models, avoiding over- and under-fitting while correcting biases associated with specific ML models (i.e., tree-based ensembles). To facilitate the use of our workflow, we have developed a web app (https://bit.ly/ml-pt-web) and a Python module (https://bit.ly/ml-pt-py). We tested the robustness of this strategy on two calibrations: clinopyroxene (cpx) and clinopyroxene-liquid (cpx-liq). Our results show a significant reduction in errors compared to the baseline model, as well as good generalization on an independent external dataset. The root mean squared errors (RMSE) are 57 °C and 2.5 kbar for the cpx calibration, and 36 °C and 2.1 kbar for the cpx-liq calibration. Finally, our models outperform existing ML and classical cpx and cpx-liq thermobarometers on the external dataset.
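The Monte Carlo step mentioned above can be illustrated with a minimal sketch. It is not the implementation in the Python module linked above. It assumes independent Gaussian analytical errors on each input, and `monte_carlo_uncertainty`, `toy_model`, and all numerical values are hypothetical stand-ins chosen only for illustration.

```python
import numpy as np


def monte_carlo_uncertainty(predict, x, sigma, n_draws=1000, seed=0):
    """Propagate analytical uncertainty through a fitted model.

    predict : callable mapping an (n, n_features) array to (n,) predictions
    x       : measured composition, shape (n_features,)
    sigma   : assumed 1-sigma analytical uncertainty per feature
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Draw perturbed copies of the analysis (assumption: independent Gaussian errors).
    draws = rng.normal(loc=x, scale=sigma, size=(n_draws, x.size))
    preds = predict(draws)
    # Mean and sample standard deviation of the resulting estimates.
    return float(np.mean(preds)), float(np.std(preds, ddof=1))


def toy_model(X):
    """Hypothetical stand-in for a trained thermometer (a plain linear map)."""
    return 1000.0 + X @ np.array([2.0, -1.0, 0.5])


# Hypothetical measured values and analytical uncertainties (illustrative only).
x_measured = np.array([50.0, 15.0, 20.0])
sigma_analytical = np.array([0.5, 0.2, 0.3])

t_mean, t_std = monte_carlo_uncertainty(
    toy_model, x_measured, sigma_analytical, n_draws=5000
)
print(f"estimate: {t_mean:.1f} +/- {t_std:.2f}")
```

In practice `predict` would be the prediction function of the trained model. A linear toy model is used here only because the propagated spread can then be checked by hand against the analytic value.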