Nonlinear optical (NLO) materials are of great importance in modern optics and industry because of their intrinsic capability for wavelength conversion. The bandgap is a key property of NLO crystals. In recent years, machine learning (ML) has become a powerful tool for predicting the bandgaps of compounds before synthesis. However, the scarcity of experimental data for NLO crystals poses a significant challenge for ML-driven exploration of new NLO materials. In this work, we propose a new multi-fidelity ML approach based on our previously developed multilevel descriptors (Z.-Y. Zhang, X. Liu, L. Shen, L. Chen and W.-H. Fang, J. Phys. Chem. C, 2021, 125, 25175-25188) and the gradient boosting regression tree algorithm. The calculated and experimental bandgaps of NLO crystals were collected as the low- and high-fidelity labels, respectively. The experimental values were predicted from the chemical compositions of the crystals alone, without prior knowledge of their crystal structures. The multi-fidelity ML model outperformed the single-fidelity predictor. Furthermore, we observed that less accurate predictions of the low-fidelity label can lead to more accurate predictions of the high-fidelity label, at least in the present case. With the best-performing multi-fidelity ML model in this work, the mean absolute error on the test set of experimental bandgaps was 0.293 eV, lower than that of the single-fidelity model (0.355 eV). Although far from perfect, this accuracy is sufficient for the model to serve as an effective first-step computational tool in the discovery of novel NLO materials.
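To make the low-/high-fidelity idea concrete, the following is a minimal sketch of one common multi-fidelity scheme, not necessarily the one used in this work: a gradient boosting model is first trained on the abundant low-fidelity (calculated) labels, and its prediction is then appended as an extra input feature for a second model trained on the scarce high-fidelity (experimental) labels. Everything here is an assumption made for illustration: the data are synthetic, the eight random features merely stand in for composition-based descriptors, `true_gap` is a hypothetical target function, and scikit-learn's `GradientBoostingRegressor` is used with default settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): 8 composition-derived features per crystal.
n_low, n_high, n_feat = 600, 120, 8
X_low = rng.uniform(0.0, 1.0, size=(n_low, n_feat))
X_high = rng.uniform(0.0, 1.0, size=(n_high, n_feat))


def true_gap(X):
    """Hypothetical 'experimental' bandgap in eV, used only to generate toy data."""
    return 2.0 + 3.0 * X[:, 0] + 1.5 * np.sin(3.0 * X[:, 1]) + X[:, 2] * X[:, 3]


# Low fidelity: systematically underestimated 'calculated' gaps plus noise.
y_low = 0.7 * true_gap(X_low) + rng.normal(0.0, 0.1, n_low)
# High fidelity: scarce 'experimental' gaps with small measurement noise.
y_high = true_gap(X_high) + rng.normal(0.0, 0.05, n_high)

# Hold out part of the high-fidelity data as the test set.
X_tr, X_te = X_high[:90], X_high[90:]
y_tr, y_te = y_high[:90], y_high[90:]


def run_single_fidelity():
    """Train on experimental labels only and return the test MAE."""
    model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
    return mean_absolute_error(y_te, model.predict(X_te))


def augment(X, low_model):
    """Append the predicted low-fidelity label as one extra feature column."""
    return np.column_stack([X, low_model.predict(X)])


def run_multi_fidelity():
    """Train a low-fidelity model first, then feed its prediction to the high-fidelity model."""
    low_model = GradientBoostingRegressor(random_state=0).fit(X_low, y_low)
    high_model = GradientBoostingRegressor(random_state=0).fit(
        augment(X_tr, low_model), y_tr
    )
    y_pred = high_model.predict(augment(X_te, low_model))
    return mean_absolute_error(y_te, y_pred), low_model


mae_single = run_single_fidelity()
mae_multi, low_model = run_multi_fidelity()
print(f"single-fidelity test MAE: {mae_single:.3f} eV")
print(f"multi-fidelity test MAE:  {mae_multi:.3f} eV")
```

The printed values depend entirely on the toy data and carry no information about real NLO crystals. The sketch only shows where the low-fidelity prediction enters the high-fidelity model, which is also the point at which the accuracy of the low-fidelity predictor can influence the final error.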