Abstract
Sequential learning (SL) strategies, i.e., iteratively updating a machine learning model to guide experiments, have been proposed to significantly accelerate materials discovery and research. Applications on computational datasets and a handful of optimization experiments have demonstrated the promise of SL, motivating a quantitative evaluation of its ability to accelerate materials discovery, specifically in the case of physical experiments. The benchmarking effort in the present work quantifies the performance of SL algorithms with respect to a breadth of research goals: discovery of any “good” material, discovery of all “good” materials, and discovery of a model that accurately predicts the performance of new materials. To benchmark the effectiveness of different machine learning models against these goals, we use datasets in which the performance of all materials in the search space is known from high-throughput synthesis and electrochemistry experiments. Each dataset contains all pseudo-quaternary metal oxide combinations from a set of six elements (the chemical space), and the performance metric is the electrocatalytic activity (overpotential) for the oxygen evolution reaction (OER). A diverse set of SL schemes is tested on four chemical spaces, each containing 2121 catalysts. The present work suggests that research can be accelerated by up to a factor of 20 compared to random acquisition in specific scenarios. The results also show that certain choices of SL models are ill-suited for a given research goal, resulting in substantial deceleration compared to random acquisition methods. The results provide quantitative guidance on how to tune an SL strategy for a given research goal and demonstrate the need for a new generation of materials-aware SL algorithms to further accelerate materials discovery.
Highlights
Accelerating materials discovery is of utmost importance for the realization of several emergent technologies and for combating climate change through the adoption of zero- or negative-emission technologies, such as hydrogen-powered vehicles and other means of clean chemical energy generation, storage, and utilization.
The compendium of simulated learning results indicates that (i) while exploration by uncertainty-based sample selection can accelerate the establishment of predictive models in niche situations where a substantial fraction of the search space is measured, random experiment selection is typically a suitable strategy; (ii) enhancement factors (EF) and acceleration factors (AF) up to approximately 20× are possible for identifying any or all top catalysts, demonstrating a ceiling for the extent by which sequential learning can improve catalyst discovery (see the sketch following these highlights); (iii) EF and AF values well below 0.05 are observed, indicating that the floor for deleterious effects of sequential learning is relatively deep compared to the ceiling.
Poor choices of machine learning (ML) model and/or acquisition function for a given experiment budget or research objective can lead to substantially worse performance than random sample selection, a critical lesson that illustrates the importance of comprehensive workflow design in the context of specific research objectives.[2]
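This excerpt does not reproduce the formal definitions of EF and AF, so the following is a minimal sketch of one plausible reading of the acceleration factor, assuming it compares the number of experiments required by random acquisition and by SL to reach the same research goal; the symbols n_rand and n_SL are illustrative and not taken from the paper:

\[ \mathrm{AF} = \frac{n_{\mathrm{rand}}}{n_{\mathrm{SL}}} \]

Under this reading, an AF of approximately 20 means SL reaches the goal with roughly one-twentieth of the experiments random acquisition would require, while an AF below 0.05 means SL requires more than twenty times as many experiments as random acquisition.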
Summary
Sequential learning, in which a model guides experiment selection at each iteration based on previously acquired data, is a promising approach to accelerate materials research. The SL framework is designed to enable facile variation of both the machine learning model and the acquisition function, and it is implemented under the assumption of a discretized search space that represents all possible experiments, which we refer to as the sample set of size N. SL cycle i results in the measurement of the figure of merit (FOM) for a newly selected point in the search space, thereby increasing the size of the training set to i + 1 samples. This SL technique can be implemented with any machine learning model that provides a predicted FOM value and an uncertainty for that prediction at each input coordinate. Variation across repeated SL runs is visualized by plotting the median value as well as shaded regions representing the 6th to 94th percentile, i.e., removing the top 2 and bottom 2 values from each set of 50 values of x_ALM,i.
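The summary above describes the SL cycle only in prose, so the following is a minimal Python sketch of that loop under stated assumptions: a discretized sample set of size N whose FOM values are all already measured (as in the benchmark datasets), a surrogate model that returns a prediction and an uncertainty for each candidate (a Gaussian process regressor is used purely as a stand-in), and a simple acquisition function. Names such as run_sl_campaign and acquire are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def acquire(pred, std, strategy="exploit"):
    """Pick the index of the next experiment among unmeasured candidates.

    'exploit' targets the best predicted FOM (lowest overpotential);
    'explore' targets the most uncertain prediction.
    """
    return int(np.argmin(pred)) if strategy == "exploit" else int(np.argmax(std))

def run_sl_campaign(X, fom, n_init=5, budget=50, strategy="exploit", seed=0):
    """Simulated SL campaign on a fully measured search space of size N.

    X   : (N, d) array of composition coordinates
    fom : (N,) array of measured figures of merit (e.g., OER overpotential)
    """
    rng = np.random.default_rng(seed)
    N = len(fom)
    measured = list(rng.choice(N, size=n_init, replace=False))  # initial random samples

    for _ in range(budget):
        unmeasured = np.setdiff1d(np.arange(N), measured)
        model = GaussianProcessRegressor(normalize_y=True)       # stand-in surrogate model
        model.fit(X[measured], fom[measured])                     # retrain on the i samples so far
        pred, std = model.predict(X[unmeasured], return_std=True)
        nxt = unmeasured[acquire(pred, std, strategy)]            # acquisition selects sample i + 1
        measured.append(int(nxt))                                 # "measure" it by looking up its FOM

    return measured  # acquisition order; compare against random selection to score EF/AF

# Usage sketch on synthetic data standing in for a 2121-catalyst chemical space:
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((2121, 4))                 # pseudo-quaternary composition coordinates
    fom = rng.normal(0.4, 0.05, size=2121)    # synthetic overpotentials (V), for illustration only
    order = run_sl_campaign(X, fom, strategy="exploit")
    print("best overpotential found:", fom[order].min())
```

Repeating such a campaign many times (e.g., the 50 runs mentioned above) with different random initializations is what produces the median and percentile bands described in the summary.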