Abstract

Rapid and accurate bandgap prediction has great practical implications for a range of applications. Because quantum mechanical computations are enormously time-intensive, informatics-based statistical learning approaches are a promising alternative owing to their availability, power, and the relatively limited cost of high-performance computational equipment. Here we demonstrate a systematic ensemble learning model that integrates a novel feature-engineering approach and a robust learning framework for predicting the bandgaps of a family of typical thermoelectric materials: chalcogenides with diamond-like structure. By combining a feature-crossing technique with a feature selection method, we identify the optimal descriptor set by searching a feature space of 23,454 descriptors derived from the elemental features. Stable statistics-based feature selection methods are applied to identify the most crucial and relevant descriptors. The stacked ensemble learning model, which integrates the advantages of three level-0 models (LASSO, SVR, and AdaBoost) and one level-1 model (GBDT), achieves 90.48% prediction accuracy, improving both model accuracy and robustness. The results demonstrate the interpretability and generalizability of the stacked ensemble model, which can be applied to bandgap prediction in other material systems, thereby accelerating the design and optimization of new functional materials.
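
As a rough illustration of the feature-crossing step only, the sketch below expands a small set of elemental features into pairwise crossed descriptors. The abstract does not specify the crossing operators, so the sum, absolute difference, product, and ratio used here are assumptions, as are the function name `cross_features` and the feature names and values in the example; this is not the authors' actual procedure.

```python
from itertools import combinations


def cross_features(features):
    """Expand a dict of name -> value into crossed descriptors.

    Keeps every original feature and adds, for each unordered pair,
    the sum, absolute difference, product, and ratio (assumed operators).
    """
    crossed = dict(features)
    # Sort so that the generated descriptor names are deterministic.
    for (name_a, a), (name_b, b) in combinations(sorted(features.items()), 2):
        crossed[f"{name_a}+{name_b}"] = a + b
        crossed[f"|{name_a}-{name_b}|"] = abs(a - b)
        crossed[f"{name_a}*{name_b}"] = a * b
        if b != 0:  # skip undefined ratios
            crossed[f"{name_a}/{name_b}"] = a / b
    return crossed


# Hypothetical composition-averaged elemental features for one compound.
example = {
    "electronegativity": 2.1,
    "atomic_radius": 1.3,
    "valence_electrons": 4.0,
}
descriptors = cross_features(example)
print(len(descriptors), "descriptors generated")
```

Under these assumptions the descriptor count grows quadratically with the number of elemental features, which is why a crossing step of this kind has to be followed by feature selection before any model is trained.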
