Experimental High‐Throughput Electrochemistry
Experimental high‐throughput electrochemistry (HTE) addresses fundamental limitations of classical electrochemical methods, which are often characterized by high manual effort, low experimental throughput, and limited reproducibility. By employing parallelized and automated experimental systems in combination with advanced data analysis techniques such as Bayesian optimization and machine learning, the development and optimization of electrochemical processes and materials can be significantly accelerated. Emphasis is placed on combinatorial approaches, automated laboratory platforms, and self‐driving systems. This review presents key technologies, application areas, and methodological advances in experimental HTE, including microelectrode arrays and robotics‐based platforms. The aim is to provide a comprehensive overview of the field, bridge existing knowledge gaps, contextualize current developments, and outline future innovation pathways for experimental electrochemical research. Although high‐throughput approaches have increasingly been applied across diverse areas such as battery research, electrocatalysis, and organic electrosynthesis, a coherent methodological overview of the underlying technologies, platform concepts, and levels of automation has been lacking. By consolidating previously scattered developments and systematically comparing different experimental strategies, this article provides a detailed picture of the current state of experimental HTE and identifies key directions for future research, particularly toward autonomous laboratory systems.
- Research Article
205
- 10.1016/j.joule.2022.03.003
- Apr 1, 2022
- Joule
Machine learning with knowledge constraints for process optimization of open-air perovskite solar cell manufacturing
- Research Article
- 10.25126/jtiik.20251219001
- Feb 27, 2025
- Jurnal Teknologi Informasi dan Ilmu Komputer
Pregnancy carries a range of risks, such as preeclampsia, gestational diabetes, and gestational hypertension. With advances in technology and data science, machine learning has been widely applied in early diagnosis systems that predict pregnancy risk levels. A key challenge in applying machine learning, however, is finding a parameter configuration that lets the model deliver good prediction accuracy. This research proposes a Bayesian Optimization (BO) method to tune the hyper-parameters of Decision Tree (DT) and Extreme Gradient Boosting (XGB) models. The two optimized models were trained and tested on a maternal risk dataset obtained from clinical measurements of pregnant women. The evaluation shows that the number of iterations influences BO performance. Applying BO to the DT model (BODT) yields a slight performance improvement over previous research. The highest performance is achieved by the combination of BO and XGB (BOXGB), which reaches 87% accuracy, with average precision, recall, and F1-score of 88%, 87%, and 88%, respectively. Overall, BO provides hyper-parameter settings that improve machine learning performance, particularly in predicting maternal risk levels from clinical measurement data.
- Research Article
- 10.25126/jtiik.2025129001
- Feb 27, 2025
- Jurnal Teknologi Informasi dan Ilmu Komputer
Pregnancy carries a range of risks, such as preeclampsia, gestational diabetes, and gestational hypertension. With advances in technology and data science, machine learning has been widely applied in early diagnosis systems that predict pregnancy risk levels. A key challenge in applying machine learning, however, is finding a parameter configuration that lets the model deliver good prediction accuracy. This research proposes a Bayesian Optimization (BO) method to tune the hyper-parameters of Decision Tree (DT) and Extreme Gradient Boosting (XGB) models. The two optimized models were trained and tested on a maternal risk dataset obtained from clinical measurements of pregnant women. The evaluation shows that the number of iterations influences BO performance. Applying BO to the DT model (BODT) yields a slight performance improvement over previous research. The highest performance is achieved by the combination of BO and XGB (BOXGB), which reaches 87% accuracy, with average precision, recall, and F1-score of 88%, 87%, and 88%, respectively. Overall, BO provides hyper-parameter settings that improve machine learning performance, particularly in predicting maternal risk levels from clinical measurement data.
- Conference Article
5
- 10.4043/30716-ms
- May 4, 2020
Inflow Control Devices (ICDs) help reduce the adverse consequences of uneven inflow in a lateral completion system. The most common consequences are early water breakthrough and gas coning in water-driven and saturated reservoirs. These issues lead to the dominance of undesired fluid production and, consequently, reduced well productivity. Typically, uneven inflow is caused by several drivers, including heterogeneous permeability, an uneven water saturation profile, and/or complex well completion in a lateral section of a given well. ICDs are placed in permanent positions along the lateral section of a well to control zonal production and improve well productivity. The goal of using ICDs is to delay water or gas production and equalize the inflow from the reservoir to the wellbore. However, uncertainty in reservoir characteristics and operational constraints adds complexity to the ICD design and complicates optimization strategies. An optimum ICD design entails identifying the number and size of compartments, packer locations, ICD type, the number of ICDs in each compartment, and the ICD settings, such as orifice diameter or flow restriction rating. Extensive reservoir modeling work can be performed to accurately quantify the impact of each ICD design on well production. The intent of this paper is to demonstrate that Bayesian optimization and machine learning techniques can help identify an optimized ICD design in a minimum number of reservoir simulation evaluations. These techniques are implemented in the reservoir simulation workflow to enhance the speed of the analysis and the resulting value proposition for the operating customer.
Using Gaussian Process Regression as a surrogate, Bayesian optimization makes use of a small number of initial reservoir simulation runs to quantify the uncertainty of the surrogate model in the parameter space. It then uses an appropriate acquisition function (as determined by the desired exploration-exploitation tradeoff characteristics) to design the next sample (simulation run) to be evaluated. Unlike ensemble-based optimization algorithms, Bayesian optimization approaches the optimum sequentially (one evaluation at a time). The proposed workflow automates ICD design optimization and reduced workflow time by 50% in our case studies. The 50% figure refers to the time needed to perform the ICD optimization workflow. For instance, the manual iterative ICD design for case study 1 described in this paper took four weeks, which the proposed workflow shortened to two weeks.
This paper presents two case studies in which the Bayesian optimization technique was used to identify the best ICD completion design. The parameter space in both case studies involves several variables, including the number and location of compartments, the number of ICDs per compartment, and the ICD settings (for example, orifice diameter). The goal in the first case study was to find an ICD design that maximizes the net present value over the well lifetime (set to 5 years) while reducing and delaying water production. This first case study considered an 800 ft lateral in a horizontal well with drastic variation of permeability along its length. The second case study analyzed a 4000 ft horizontal section of a well with variations of permeability. In this second case, the objective was to extend the life of the well by minimizing the gas-oil ratio and maximizing the oil recovery. The simulation runs stopped after 3 years of production, and the best case was chosen based on the aforementioned criteria. In both case studies, the optimization algorithm converged to an optimum ICD design within 20 reservoir simulation runs. This alone represents an improvement over the current manual trial-and-error process in which an expert relies on intuition.
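The sequential loop described in this abstract (fit a Gaussian process surrogate to a few initial runs, pick the next run with an acquisition function, evaluate, repeat) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration and not the authors' workflow: `toy_simulation` is a hypothetical stand-in for an expensive reservoir simulation, the design space is a single variable on [0, 1], the kernel is a squared-exponential with a fixed length scale, and the acquisition function is expected improvement.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel for scalar inputs (length scale is an assumption)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular factor L with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution for L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper(L, y):
    """Back substitution for L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, ys, jitter=1e-6):
    """Factor the kernel matrix once and precompute alpha = K^-1 y."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    return L, solve_upper(L, solve_lower(L, ys))

def gp_predict(xs, L, alpha, x_new):
    """Zero-mean GP posterior mean and standard deviation at x_new."""
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)  # rbf(x, x) == 1
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over the incumbent `best` (maximization)."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf

def toy_simulation(x):
    """Hypothetical stand-in for one expensive simulation run; maximum at x = 0.7."""
    return -(x - 0.7) ** 2

def bayes_opt(objective, n_iter=12):
    xs = [0.1, 0.5, 0.9]                  # small fixed initial design
    ys = [objective(x) for x in xs]
    grid = [i / 200 for i in range(201)]  # candidate designs on [0, 1]
    for _ in range(n_iter):               # one new evaluation per iteration
        L, alpha = gp_fit(xs, ys)
        best = max(ys)
        x_next = max(grid, key=lambda x: expected_improvement(
            *gp_predict(xs, L, alpha, x), best))
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]

x_best, y_best = bayes_opt(toy_simulation)
```

With 3 initial runs and 12 iterations the sketch spends 15 evaluations in total, mirroring the small evaluation budgets the abstract reports. A real application would replace the grid with a proper optimizer over a multi-dimensional, partly discrete design space and would fit the kernel hyperparameters instead of fixing them.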
- Research Article
11
- 10.1109/tcc.2024.3361070
- Jan 1, 2024
- IEEE Transactions on Cloud Computing
Bayesian Optimization (BO) is an efficient method for finding optimal cloud configurations for several types of applications. On the other hand, Machine Learning (ML) can provide helpful knowledge about the application at hand thanks to its predicting capabilities. This work proposes a general approach based on BO, which integrates elements from ML techniques in multiple ways, to find an optimal configuration of recurring jobs running in public and private cloud environments, possibly subject to black-box constraints, e.g., application execution time or accuracy. We test our approach by considering several use cases, including edge computing, scientific computing, and Big Data applications. Results show that our solution outperforms other state-of-the-art black-box techniques, including classical autotuning and BO- and ML-based algorithms, reducing the number of unfeasible executions and corresponding costs up to 2–4 times.
- Research Article
2
- 10.1080/27660400.2023.2300252
- Jan 16, 2024
- Science and Technology of Advanced Materials: Methods
Bayesian optimization, coupled with Gaussian process regression and acquisition functions, has proven to be a powerful tool in the field of experimental design. Nevertheless, it demands a profound proficiency in software programming, machine learning, and statistical concepts. This steep learning curve presents a substantial obstacle when implementing Bayesian optimization for experimental design. In order to overcome this challenge, a user-friendly graphical interface for Gaussian process regression and acquisition functions is proposed. This accessible tool can be readily accessed via web browsers, courtesy of the established CADS platform (available at https://cads.eng.hokudai.ac.jp/). Thus, the interface makes it possible to perform Bayesian optimization without any programming or extensive prior knowledge of Bayesian optimization and machine learning.
- Research Article
52
- 10.3390/s23156843
- Aug 1, 2023
- Sensors
Algorithms for machine learning have found extensive use in numerous fields and applications. One important aspect of effectively utilizing these algorithms is tuning the hyperparameters to match the specific task at hand. The selection and configuration of hyperparameters directly impact the performance of machine learning models. Achieving optimal hyperparameter settings often requires a deep understanding of the underlying models and the appropriate optimization techniques. While there are many automatic optimization techniques available, each with its own advantages and disadvantages, this article focuses on hyperparameter optimization for well-known machine learning models. It surveys cutting-edge optimization methods such as metaheuristic algorithms, deep learning-based optimization, Bayesian optimization, and quantum optimization, focusing mainly on metaheuristic and Bayesian optimization techniques, and provides guidance on applying them to different machine learning algorithms. The article also presents real-world applications of hyperparameter optimization by conducting tests on spatial data collections for landslide susceptibility mapping. Based on the experiment's results, both Bayesian optimization and metaheuristic algorithms showed promising performance compared to baseline algorithms. For instance, the metaheuristic algorithm boosted the random forest model's overall accuracy by 5% and 3% relative to the baseline optimization methods grid search (GS) and random search (RS), respectively, and by 4% and 2% relative to the baseline optimization methods GA and PSO. Additionally, for models like KNN and SVM, Bayesian methods with Gaussian processes had good results. When compared to the baseline algorithms RS and GS, the accuracy of the KNN model was enhanced by BO-TPE by 1% and 11%, respectively, and by BO-GP by 2% and 12%, respectively. For SVM, BO-TPE outperformed GS and RS by 6%, while BO-GP improved results by 5%. 
The paper thoroughly discusses the reasons behind the efficiency of these algorithms. By successfully identifying appropriate hyperparameter configurations, this research paper aims to assist researchers, spatial data analysts, and industrial users in developing machine learning models more effectively. The findings and insights provided in this paper can contribute to enhancing the performance and applicability of machine learning algorithms in various domains.
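The GS and RS baselines named in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only: `validation_score` is a hypothetical stand-in for a cross-validated accuracy, and the two hyperparameters (`depth`, `lr`) and their ranges are assumptions, not values from the paper.

```python
import itertools
import random

def validation_score(depth, lr):
    """Hypothetical validation accuracy; peaks at depth=6, lr=0.1."""
    return 0.9 - 0.01 * (depth - 6) ** 2 - 2.0 * (lr - 0.1) ** 2

def grid_search(score, depths, lrs):
    """Evaluate every combination on a fixed grid and keep the best one."""
    return max(itertools.product(depths, lrs), key=lambda p: score(*p))

def random_search(score, n_trials, seed=0):
    """Sample configurations at random from the ranges and keep the best one."""
    rng = random.Random(seed)
    candidates = [(rng.randint(2, 10), rng.uniform(0.01, 0.3))
                  for _ in range(n_trials)]
    return max(candidates, key=lambda p: score(*p))

best_grid = grid_search(validation_score, [2, 4, 6, 8], [0.01, 0.1, 0.3])
best_random = random_search(validation_score, n_trials=12)
```

Both baselines choose their evaluation points without looking at earlier results, which is the main contrast with Bayesian optimization, where each new configuration is chosen from a surrogate model fitted to the scores observed so far.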
- Research Article
13
- 10.1016/j.xcrp.2025.102548
- May 1, 2025
- Cell Reports Physical Science
The integration of automation and data-driven methodologies offers a promising approach to accelerating materials discovery in energy storage research. Thus far, in battery research, coin-cell assembly has advanced to become nearly fully automated but remains largely disconnected from data-driven methods. To bridge the disconnect, this work presents a self-driving laboratory framework to accelerate electrolyte discovery by integrating automated coin-cell assembly, galvanostatic cycling of LiFePO4||Li4Ti5O12 organic-aqueous full cells, and Bayesian optimization for selecting subsequent experiments based on prior results. The study explored an organic-aqueous hybrid electrolyte system comprising four co-solvents and two lithium-conducting salts. Using this framework, cells with an optimized electrolyte cycled with at least 94% Coulombic efficiency. Additionally, online electrochemical mass spectrometry revealed that the optimized organic co-solvents successfully mitigated the parasitic hydrogen evolution reaction. The results highlight the potential of combining Bayesian optimization with autonomous full-cell experimentation while contributing new electrolyte design insights for next-generation aqueous batteries.
• Stationary robotic platform, ODACell 2, combines battery testing and machine learning
• Optimized electrolytes are dimethyl sulfoxide and trimethyl phosphate based
• Data-driven insight suggests optimized electrolytes deviate from monosolvent systems
• Operando gas analysis of optimized electrolytes shows suppressed hydrogen evolution
A stationary robotic platform, ODACell 2, presents a self-driving lab framework combining Bayesian optimization with automated battery assembly, cycling, and liquid handling. It demonstrates the discovery of high-performance organic-aqueous hybrid electrolytes, achieving >94% Coulombic efficiency in full-cell cycling. Operando gas analysis shows mitigated hydrogen evolution in optimized electrolytes.
- Book Chapter
- 10.1108/978-1-64802-145-920251004
- Oct 15, 2020
Because it can use probability to express all forms of uncertainty, Bayesian machine learning (ML) has been widely shown to compensate for uncertainty and balance it with regularization in modeling. Simply put, this AI approach allows us to analyze data without explicitly specifying the interactions among parameters in a model. In the context of international economics, meanwhile, the real exchange rate has been used as a proxy measure of the relative cost of living and well-being between two countries. A rise in one country's real exchange rate often suggests an escalation of the national cost of living relative to that of another country. Thus, measuring and forecasting real exchange rates has profound implications for both economists and business practitioners. In this study, we apply different forms of Bayesian machine learning to real exchange rate forecasting and comparatively evaluate the performance of these AI models against their traditional econometric counterpart (Bayesian vector autoregression). Empirical results indicate that the tested Bayesian ML models (Boltzmann machine, restricted Boltzmann machine, and deep belief network) generally perform better than the non-ML model. Although there is no clear absolute winner among the three forms of Bayesian ML models, the deep belief network seems to demonstrate an edge over the others given our limited empirical investigation. In addition, given the rapid state-of-the-art advancement of this AI approach and its practicality, the current chapter also provides an abbreviated overview of the tested Bayesian machine learning models and their technical dynamics.
- Conference Article
4
- 10.1109/itnec56291.2023.10082424
- Feb 24, 2023
The performance of a machine learning algorithm depends largely on the choice of a set of hyperparameters, which have a significant influence on the accuracy of the algorithm. As algorithm complexity increases, the number of candidate hyperparameters grows. How to quickly and accurately select the right hyperparameters for a given problem has become a popular area of research. This paper uses a Bayesian optimization approach to assist hyperparameter selection for machine learning models and validates it on the binary classification of true and false news. It analyses the principles of the Bayesian optimization approach and how it can be applied to machine learning model parameter selection. The machine learning models used in this paper are K-Nearest Neighbour (KNN), Random Forest, and Gradient Boosted Decision Trees (GBDT). These three are commonly used machine learning models for binary classification problems, with different numbers and classes of hyperparameters. The experimental results show that adjusting the original hyperparameters of the machine learning models using Bayesian optimization can substantially improve classification accuracy. The research in this paper can also provide ideas for other similar work on hyperparameter selection.
- Research Article
- 10.1149/ma2025-02512490mtgabs
- Nov 24, 2025
- Electrochemical Society Meeting Abstracts
Chemical manufacturing accounts for 5% of US primary energy use and greenhouse gas emissions, primarily from fossil-fuel-derived heat driving conventional processes [1]. Organic electrosynthesis offers a sustainable alternative by using renewable electricity directly, enabling efficient production under milder conditions with improved selectivity and reduced waste [2,3]. Despite these advantages, the practical implementation of organic electrosynthesis at scale has been limited by several fundamental challenges. Limited mechanistic understanding and experimental insights into molecular processes at the electrode interface have made it difficult to address key challenges: controlling the concentration of reactive species at the electrode interface, managing mass transport limitations, and understanding the complex role of substrate and spectator ions in the electrical double layer. Furthermore, the presence of multiple competing reaction pathways often leads to unwanted by-products, particularly when dealing with organic mixtures [4-6]. The adoption of electrochemical methods in industry has been historically constrained to processes where these challenges have been successfully addressed, as exemplified by the electrohydrodimerization of acrylonitrile (AN) to adiponitrile (ADN), the most successful industrial organic electrosynthesis process, with annual production reaching 300,000 tons [7,8]. While this process achieved practical viability through careful electrolyte design and reaction engineering, the fundamental molecular mechanisms enabling its success remain poorly understood, highlighting both the potential of electrosynthesis for sustainable chemical manufacturing and the critical need for mechanistic insights to guide the development of new processes.
This work advances organic electrosynthesis through complementary approaches. First, we use in situ ATR-FTIR spectroscopy to show that tetraalkylammonium ions populate the electrical double layer, creating a microenvironment that favors interactions with organic molecules and enhances acrylonitrile concentration while expelling water molecules [9]. Additionally, kinetic isotope effect studies reveal that propionitrile (PN) formation is rate-limited by proton transfer, while ADN formation likely is not. Electron paramagnetic resonance spectroscopy confirms the presence of free radicals during AN electroreduction, suggesting that coupling of PN radicals occurs primarily in the electrolyte. Finally, we demonstrate how electrochemical parameters govern product distributions in mixed-substrate electrosynthesis. Using high-throughput screening coupled with machine learning approaches, we systematically investigated the interplay between substrate composition, current density, and mass transport phenomena in the electrodimerization of acrylonitrile and crotononitrile mixtures. We reveal distinct reaction-limited and mass transport-limited regimes that dictate product selectivity, with preferential formation of adiponitrile occurring when radical generation from acrylonitrile outpaces that from crotononitrile under reaction-limited conditions. These findings establish a framework for understanding and controlling molecular processes at electrode interfaces in complex organic systems. The experimental techniques and reaction engineering strategies developed here open new possibilities for selective electrochemical transformations.
1. U.S. Department of Energy. Manufacturing Energy and Carbon Footprints Report. (2018).
2. Botte, G. G. Electrochemical manufacturing in the chemical industry. The Electrochemical Society Interface 23, 49 (2014).
3. Frontana-Uribe, B. A., Little, R. D., Ibanez, J. G., Palma, A. & Vasquez-Medrano, R. Organic electrosynthesis: a promising green methodology in organic chemistry. Green Chemistry 12, 2099-2119, doi:10.1039/c0gc00382d (2010).
4. Utley, J. Trends in organic electrosynthesis. Chemical Society Reviews 26, 157-167 (1997).
5. Moeller, K. D. Using Physical Organic Chemistry To Shape the Course of Electrochemical Reactions. Chem Rev 118, 4817-4833, doi:10.1021/acs.chemrev.7b00656 (2018).
6. McKenzie, E. C. R. et al. Versatile Tools for Understanding Electrosynthetic Mechanisms. Chem Rev 122, 3292-3335, doi:10.1021/acs.chemrev.1c00471 (2022).
7. Danly, D. Development and commercialization of the Monsanto electrochemical adiponitrile process. Journal of The Electrochemical Society 131, 435C (1984).
8. Seidler, J., Strugatchi, J., Gärtner, T. & Waldvogel, S. R. Does electrifying organic synthesis pay off? The energy efficiency of electro-organic conversions. MRS Energy & Sustainability 7, E42, doi:10.1557/mre.2020.42 (2021).
9. Mathison, R. et al. Molecular Processes That Control Organic Electrosynthesis in Near-Electrode Microenvironments. J Am Chem Soc 147, 4296-4307, doi:10.1021/jacs.4c14420 (2025).
- Research Article
951
- 10.1038/s41586-021-03213-y
- Feb 3, 2021
- Nature
Reaction optimization is fundamental to synthetic chemistry, from optimizing the yield of industrial processes to selecting conditions for the preparation of medicinal candidates [1]. Likewise, parameter optimization is omnipresent in artificial intelligence, from tuning virtual personal assistants to training social media and product recommendation systems [2]. Owing to the high cost associated with carrying out experiments, scientists in both areas set numerous (hyper)parameter values by evaluating only a small subset of the possible configurations. Bayesian optimization, an iterative response surface-based global optimization algorithm, has demonstrated exceptional performance in the tuning of machine learning models [3]. Bayesian optimization has also been recently applied in chemistry [4-9]; however, its application and assessment for reaction optimization in synthetic chemistry has not been investigated. Here we report the development of a framework for Bayesian reaction optimization and an open-source software tool that allows chemists to easily integrate state-of-the-art optimization algorithms into their everyday laboratory practices. We collect a large benchmark dataset for a palladium-catalysed direct arylation reaction, perform a systematic study of Bayesian optimization compared to human decision-making in reaction optimization, and apply Bayesian optimization to two real-world optimization efforts (Mitsunobu and deoxyfluorination reactions). Benchmarking is accomplished via an online game that links the decisions made by expert chemists and engineers to real experiments run in the laboratory. Our findings demonstrate that Bayesian optimization outperforms human decision-making in both average optimization efficiency (number of experiments) and consistency (variance of outcome against initially available data). 
Overall, our studies suggest that adopting Bayesian optimization methods into everyday laboratory practices could facilitate more efficient synthesis of functional chemicals by enabling better-informed, data-driven decisions about which experiments to run.
- Research Article
3
- 10.1371/journal.pone.0324205
- Jun 11, 2025
- PloS one
Developing vaccines with better stability is an area of improvement to meet the global health needs of preventing infectious diseases. With the advancement of data science and artificial intelligence, innovative approaches have emerged. This manuscript highlights the applications of machine learning through two cases in which Bayesian optimization was used to develop viral vaccine formulations. The two case studies monitored the critical quality attributes of virus A in liquid form by infectious titer loss and virus B in freeze-dried form by glass transition temperature. Stepwise analysis and model optimization demonstrated progressive improvements in model quality and prediction accuracy. The cross-validation matrices of the models' predictions showed high R² and low root mean square errors, indicating their reliability. The prediction accuracy of the models was further validated using test datasets. Model analysis using tools such as prediction error plots, SHapley Additive exPlanations, and permutation importance can provide additional insights into the relations between model and experimental design, the influence of features of interest, and non-linear responses. Overall, Bayesian optimization is a useful complementary tool in formulation development that can help scientists make effective data-driven decisions.
- Research Article
24
- 10.1038/s41598-022-23431-2
- Nov 17, 2022
- Scientific Reports
Data-driven design shows the promise of accelerating materials discovery but is challenging due to the prohibitive cost of searching the vast design space of chemistry, structure, and synthesis methods. Bayesian optimization (BO) employs uncertainty-aware machine learning models to select promising designs to evaluate, hence reducing the cost. However, BO with mixed numerical and categorical variables, which is of particular interest in materials design, has not been well studied. In this work, we survey frequentist and Bayesian approaches to uncertainty quantification of machine learning with mixed variables. We then conduct a systematic comparative study of their performances in BO using a popular representative model from each group, the random forest-based Lolo model (frequentist) and the latent variable Gaussian process model (Bayesian). We examine the efficacy of the two models in the optimization of mathematical functions, as well as properties of structural and functional materials, where we observe performance differences as related to problem dimensionality and complexity. By investigating the machine learning models’ predictive and uncertainty estimation capabilities, we provide interpretations of the observed performance differences. Our results provide practical guidance on choosing between frequentist and Bayesian uncertainty-aware machine learning models for mixed-variable BO in materials design.
- Supplementary Content
1
- 10.48550/arxiv.2207.04994
- Jul 11, 2022
- arXiv (Cornell University)
Data-driven design shows the promise of accelerating materials discovery but is challenging due to the prohibitive cost of searching the vast design space of chemistry, structure, and synthesis methods. Bayesian Optimization (BO) employs uncertainty-aware machine learning models to select promising designs to evaluate, hence reducing the cost. However, BO with mixed numerical and categorical variables, which is of particular interest in materials design, has not been well studied. In this work, we survey frequentist and Bayesian approaches to uncertainty quantification of machine learning with mixed variables. We then conduct a systematic comparative study of their performances in BO using a popular representative model from each group, the random forest-based Lolo model (frequentist) and the latent variable Gaussian process model (Bayesian). We examine the efficacy of the two models in the optimization of mathematical functions, as well as properties of structural and functional materials, where we observe performance differences as related to problem dimensionality and complexity. By investigating the machine learning models' predictive and uncertainty estimation capabilities, we provide interpretations of the observed performance differences. Our results provide practical guidance on choosing between frequentist and Bayesian uncertainty-aware machine learning models for mixed-variable BO in materials design.