Abstract

The fidelity of data is of paramount importance in the construction of reliable and accurate machine learning (ML) models. Low-fidelity data, although noisy, can usually be obtained for a large number of material samples. High-fidelity data, on the other hand, is time-consuming to obtain and is often available for only a limited number of target samples. While the former provides information that helps ML models generalize across a large materials space, the latter is needed to build more accurate surrogate models. Information fusion schemes that utilize data available at multiple levels of fidelity can outperform traditional single-fidelity ML methods, such as Gaussian process regression. In this work, a variant of the multi-fidelity information fusion scheme, namely multi-fidelity co-kriging, is used to build prediction models of polymer bandgaps. To benchmark this strategy, we utilize a bandgap dataset of 382 polymers, obtained at two levels of fidelity within density functional theory: using the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional (“low-fidelity”) and the Heyd-Scuseria-Ernzerhof (HSE06) functional (“high-fidelity”). The multi-fidelity model, trained on both PBE and HSE06 data, outperforms a single-fidelity Gaussian process regression model trained on HSE06 bandgaps alone in a number of scenarios, and it also generalizes better to a more diverse chemical space.
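
To illustrate the idea of fusing two fidelity levels, the following is a minimal sketch of a simplified, recursive two-step form of autoregressive co-kriging, in which the high-fidelity response is modeled as a scaled low-fidelity prediction plus a discrepancy term. It is not the implementation used in this work and may differ from the co-kriging variant the authors adopt. Everything in it is an assumption made for illustration: the one-dimensional synthetic functions `f_lo` and `f_hi` stand in for PBE and HSE06 bandgaps, the kernel hyperparameters are fixed rather than optimized, and the helper names (`rbf`, `solve`, `gp_mean`) are hypothetical.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel for scalar inputs (fixed length scale)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gp_mean(xs, ys, jitter=1e-6):
    """Posterior-mean function of a zero-mean GP with fixed hyperparameters."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    return lambda x: sum(w * rbf(x, xi) for w, xi in zip(alpha, xs))

# Synthetic stand-ins (assumptions): cheap "low-fidelity" and costly "high-fidelity".
def f_lo(x):
    return math.sin(x)

def f_hi(x):
    return 1.3 * math.sin(x) + 0.1 * x

x_lo = [float(i) for i in range(7)]   # many low-fidelity samples
x_hi = [1.0, 3.0, 5.0]                # few high-fidelity samples
y_lo = [f_lo(x) for x in x_lo]
y_hi = [f_hi(x) for x in x_hi]

# Step 1: GP on the low-fidelity data.
mu_lo = gp_mean(x_lo, y_lo)

# Step 2: least-squares scaling factor rho between the two fidelities.
lo_at_hi = [mu_lo(x) for x in x_hi]
rho = sum(l * h for l, h in zip(lo_at_hi, y_hi)) / sum(l * l for l in lo_at_hi)

# Step 3: GP on the discrepancy y_hi - rho * mu_lo at the high-fidelity points.
delta = gp_mean(x_hi, [h - rho * l for l, h in zip(lo_at_hi, y_hi)])

def predict_mf(x):
    """Multi-fidelity prediction: scaled low-fidelity mean plus discrepancy."""
    return rho * mu_lo(x) + delta(x)

# Single-fidelity baseline: GP trained on the high-fidelity samples only.
predict_sf = gp_mean(x_hi, y_hi)

x_test = 4.0
err_mf = abs(predict_mf(x_test) - f_hi(x_test))
err_sf = abs(predict_sf(x_test) - f_hi(x_test))
print(f"rho={rho:.3f}  multi-fidelity error={err_mf:.3f}  single-fidelity error={err_sf:.3f}")
```

On this toy problem the low-fidelity samples carry the overall shape of the target, so the fused prediction at an unseen point is closer to the high-fidelity value than the baseline trained on three high-fidelity points alone. A full co-kriging model would instead fit a joint covariance over both fidelities and learn the hyperparameters from the data.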
