Descriptors calculated from molecular structure information can be used as explanatory variables in Bayesian optimization (BO). Although structural and descriptor information for general compounds can be obtained from various databases, information on highly confidential compounds, such as pharmaceutical intermediates and active pharmaceutical ingredients, cannot be retrieved from them. In particular, obtaining descriptor information by determining the stable structure and electronic state of a compound via quantum chemical calculations requires considerable computational time. Descriptor information can also be obtained using density functional theory (DFT), which has a relatively light computational load; however, the best combination of basis set and functional cannot be identified before experiments, so only conventional combinations can be selected. Few studies have discussed the effect of this choice on the search performance of BO, and good search performance is highly dependent on the application. Therefore, we developed a method to improve the search performance of BO by using descriptors computed from several combinations of basis sets and functionals. The dataset obtained by averaging multiple descriptor sets exhibited better BO search performance than a single descriptor dataset. In addition, the more descriptor sets used for averaging, the better the search performance. This method has a relatively small computational load and can be easily used by those who are unfamiliar with quantum chemical calculations.
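The averaging step described above can be illustrated with a minimal sketch. The abstract does not specify how the descriptor sets are combined, so everything here is an assumption made for illustration: each descriptor set is taken to be a matrix with one row per compound and one column per descriptor, each set is standardized column-wise before averaging, and the function names `standardize` and `average_descriptor_sets` are hypothetical. The synthetic data stands in for descriptors computed with different basis set and functional combinations; it is not data from the study.

```python
import numpy as np


def standardize(X):
    """Scale each descriptor column to zero mean and unit variance.

    Assumption: the abstract does not state any preprocessing; this step
    is added so that sets on different scales contribute equally.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # leave constant columns unscaled to avoid division by zero
    return (X - mean) / std


def average_descriptor_sets(descriptor_sets):
    """Element-wise average of several (n_compounds, n_descriptors) matrices.

    Hypothetical helper: each matrix is assumed to hold the same descriptors
    for the same compounds, computed with a different basis set / functional.
    """
    standardized = [standardize(X) for X in descriptor_sets]
    if len({X.shape for X in standardized}) != 1:
        raise ValueError("all descriptor sets must have the same shape")
    return np.stack(standardized).mean(axis=0)


# Synthetic stand-in: 5 compounds, 3 descriptors, 4 level-of-theory variants.
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 3))
descriptor_sets = [base + rng.normal(scale=0.1, size=base.shape) for _ in range(4)]

averaged = average_descriptor_sets(descriptor_sets)
```

The resulting `averaged` matrix would then serve as the explanatory variables passed to whichever BO implementation is in use, in place of a descriptor matrix from a single basis set and functional.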