Abstract
Screening combinatorial space for novel materials, such as perovskite-like ones for photovoltaics, has produced a large amount of simulated high-throughput data and corresponding analyses. This study presents a comprehensive comparison of structural fingerprint-based machine learning models on seven open-source databases of perovskite-like materials to predict band gaps and energies. It shows that none of the given methods, including graph neural networks, captures arbitrary databases equally well, and it underlines that commonly used metrics are highly database-dependent in typical workflows. In addition, the applicability of variance selection and autoencoders to significantly reduce fingerprint size indicates that models built with common fingerprints rely only on a submanifold of the available fingerprint space.
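As a rough illustration of the variance selection mentioned above, the following sketch drops fingerprint components whose variance across a dataset falls below a threshold. The function name `select_by_variance`, the threshold value, and the toy fingerprints are assumptions made for illustration; the source does not state how the selection was implemented.

```python
from statistics import pvariance


def select_by_variance(fingerprints, threshold=1e-8):
    """Keep only fingerprint components whose variance exceeds `threshold`.

    `fingerprints` is a list of equal-length numeric vectors, one per material.
    Returns the reduced vectors and the indices of the retained components.
    """
    # Transpose so that each tuple holds one component across all materials.
    columns = list(zip(*fingerprints))
    keep = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[j] for j in keep] for row in fingerprints]
    return reduced, keep


# Hypothetical toy fingerprints: the first two components never change,
# so they carry no information for a model trained on this dataset.
toy_fingerprints = [
    [1.0, 0.0, 0.2],
    [1.0, 0.0, 0.7],
    [1.0, 0.0, 0.4],
]
reduced, kept = select_by_variance(toy_fingerprints)
print(kept)
print(reduced)
```

If most components can be removed this way without hurting model accuracy, that would be consistent with the claim that models use only part of the fingerprint space.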
Highlights
Perovskite-like materials are of paramount interest in the creation of novel photovoltaic devices
The same holds true for the graph neural network (GNN), which consistently only reaches the performance of the “worst” fingerprinting method
While the best-performing prediction mean absolute errors (MAEs) are all of similar magnitude, it is notable that the baseline differs: in refs 46, 50, and 51, the error of educated guessing is ≈800 meV, while it is only ≈300 meV in ref 19
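The dependence of the baseline on the database can be sketched as follows, assuming that "educated guessing" means always predicting a constant such as the mean or median of the known values (the source does not define it here). The function names and the band gap values are made up for illustration and are not taken from any of the cited databases.

```python
from statistics import mean, median


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def constant_baseline_mae(y_train, y_test, use_median=False):
    """MAE of a model that always predicts one constant taken from y_train."""
    guess = median(y_train) if use_median else mean(y_train)
    return mean_absolute_error(y_test, [guess] * len(y_test))


# Hypothetical band gaps in eV: one narrow and one wide distribution.
narrow_gaps = [1.4, 1.5, 1.6, 1.7, 1.8]
wide_gaps = [0.5, 1.0, 2.0, 3.0, 4.0]

narrow_baseline = constant_baseline_mae(narrow_gaps, narrow_gaps)
wide_baseline = constant_baseline_mae(wide_gaps, wide_gaps)
print(narrow_baseline, wide_baseline)
```

A database with a wider spread of target values yields a larger baseline error, so the same model MAE can represent a very different relative improvement from one database to the next.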
Summary
Perovskite-like materials are of paramount interest in the creation of novel photovoltaic devices. While existing perovskite materials, such as CH3NH3PbI3, are unstable and/or contain toxic lead,[1,2] the available combinatorial space of possible candidate compounds is extensive.[3] This is especially interesting when considering mixtures and different structural phases, which might have widely varying properties.[4,5] Notably, for binary mixtures of selected ions, it is already well established that the relation between an experimentally measured property (e.g., band gap) and material concentrations can be fit with simple, analytic functions.[5,6] With the industry-led rise of machine learning (ML) methods, there has been growing interest in predicting such a relationship in the high-dimensional space of all possible compounds using ML techniques.[7,8] While these approaches have been used for years in engineering and science in general,[9] their widespread application in computational materials science is relatively new and accompanied by the (re-)development of a wide range of “fingerprinting functions”.[10−20] These are required to encode the typical atomic and structural information describing materials of interest into the numerical vector format expected by common ML techniques. Recent efforts focus on the prospects of creating “new” materials from generative models or directly feeding the structural graph to a neural-network approximator.[23,39−41]
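One commonly used analytic form for the band gap of a binary mixture is a linear interpolation between the two end members minus a quadratic "bowing" term. The sketch below assumes that form, which the text above does not specify, and fits the bowing parameter by least squares. All function names and numerical values are hypothetical, and the data are synthetic, generated from the same formula.

```python
def bowing_gap(x, e_a, e_b, b):
    """Band gap at mixing fraction x: linear interpolation minus a bowing term."""
    return (1.0 - x) * e_a + x * e_b - b * x * (1.0 - x)


def fit_bowing(xs, gaps, e_a, e_b):
    """Least-squares estimate of b with the end-member gaps e_a, e_b held fixed."""
    num = 0.0
    den = 0.0
    for x, gap in zip(xs, gaps):
        basis = x * (1.0 - x)
        # Deviation of the data from pure linear interpolation equals b * basis.
        deviation = (1.0 - x) * e_a + x * e_b - gap
        num += basis * deviation
        den += basis * basis
    return num / den


# Synthetic example (assumed values in eV, not measured data).
e_a, e_b, b_true = 1.6, 2.3, 0.3
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
gaps = [bowing_gap(x, e_a, e_b, b_true) for x in xs]

b_fit = fit_bowing(xs, gaps, e_a, e_b)
print(b_fit)
```

A single fitted parameter suffices for such a one-dimensional mixing series, whereas the space of all possible compounds has many more degrees of freedom, which is where the ML approaches discussed above come in.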