Abstract

With the advent of powerful computer simulation techniques, it is time to move from the widely used knowledge-guided empirical methods to approaches driven by data science, mainly machine learning algorithms. We investigated the predictive performance of three machine learning algorithms for six different glass properties. To this end, we used an extensive dataset of about 150,000 oxide glasses, which was segmented into smaller datasets for each property investigated. Using the decision tree induction, k-nearest neighbors, and random forest algorithms, selected from a previous study of six algorithms, we induced predictive models for glass transition temperature, liquidus temperature, elastic modulus, thermal expansion coefficient, refractive index, and Abbe number. Moreover, each model was induced with both default and tuned hyperparameter values. We demonstrate that, apart from the elastic modulus (which had the smallest training dataset), the induced predictive models for the other five properties yield an uncertainty comparable to the usual spread of the data. However, for glasses with extremely low or high values of these properties, the prediction uncertainty is significantly higher. Finally, as expected, glasses containing chemical elements that are poorly represented in the training set yielded higher prediction errors. The method developed here calls attention to both the successes and the possible pitfalls of machine learning algorithms. Analysis of the SHAP values indicated the key chemical elements that increase or decrease the value of each modeled property, and estimated the maximum possible increase or decrease. The insights gained from this analysis can help empirical compositional tuning and computer-aided inverse design of glass formulations.
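The default-versus-tuned workflow described above can be sketched as follows. This is a minimal illustration, not the authors' code: the dataset is a synthetic stand-in for a composition-to-property table (e.g. oxide fractions mapped to a glass transition temperature), and the hyperparameter grid is a hypothetical choice, since the abstract does not specify the actual search strategy.

```python
# Sketch (assumed, not the paper's implementation): fit a random forest
# regressor with default hyperparameters, then with cross-validated tuned
# ones, on synthetic composition -> property data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.random((400, 5))  # hypothetical oxide fractions (5 components)
# Hypothetical property, e.g. a glass transition temperature in kelvin
y = 500 + 300 * X[:, 0] - 150 * X[:, 1] + rng.normal(0, 10, 400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Model with default hyperparameter values
default_model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

# Model with tuned hyperparameter values; the grid below is illustrative
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [50, 200], "max_features": [0.5, 1.0]},
    cv=3,
)
tuned_model = grid.fit(X_tr, y_tr).best_estimator_

print("default R^2:", default_model.score(X_te, y_te))
print("tuned R^2:  ", tuned_model.score(X_te, y_te))
```

In the paper's setting, feature-attribution methods such as SHAP would then be applied to the fitted model to rank which components drive each property up or down.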
