Abstract
Machine-learning prediction of the band gap of semiconductor materials has progressed steadily in recent years, but its accuracy still needs improvement. This work applies the stacking approach, which fuses the outputs of multiple baseline models, to further enhance the performance of band gap regression. Ten baseline models are optimized to predict the band gap of materials. The outputs of the better-performing models are then used as the input features of the stacking model. This research employed a benchmark dataset containing 3896 inorganic compounds with 136 dimensions, and a newly established complex database (E-AFLOW) containing 21,534 compounds with 206 dimensions, to evaluate the effectiveness of the different models. The stacking model trained on the E-AFLOW database is then applied to determine the band gaps of new compounds. The results demonstrate that, with 5-fold cross-validation, the stacking model achieves the highest R² value: 0.920 on the benchmark dataset and 0.917 on the E-AFLOW dataset. For the E-AFLOW dataset, the stacking model improves RMSE, MAE, MAPE, and R² relative to the GBDT, XGB, RF, and LGB input baseline models by 3.06%–17.54%, 8.12%–33.25%, 7.69%–33.33%, and 0.66%–4.44%, respectively. In real applications, the stacking model trained on the E-AFLOW dataset predicts the band gaps of 78.57% of new materials within ±8.00% of observed measurements. The minimum deviation between the predicted and observed values is −0.02%, and the maximum is 14.27%. These results demonstrate the excellent performance of the stacking approach in band gap regression.
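The evaluation quantities quoted above (RMSE, MAE, MAPE, R², the percentage improvement over a baseline, and the share of predictions within a relative tolerance) can be illustrated with a minimal Python sketch. The abstract does not define the improvement percentage, so the sketch assumes it is the relative change with respect to the baseline value; the function names and the four band gap values are hypothetical and are not taken from the benchmark or E-AFLOW data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def improvement_pct(baseline, stacked, higher_is_better=False):
    """Assumed definition: relative change versus the baseline, in percent.

    Error metrics (RMSE, MAE, MAPE) improve when they decrease;
    R² improves when it increases.
    """
    change = stacked - baseline if higher_is_better else baseline - stacked
    return 100.0 * change / abs(baseline)


def share_within(y_true, y_pred, tol_pct):
    """Percentage of predictions whose relative deviation is within ±tol_pct."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(100.0 * (p - t) / t) <= tol_pct)
    return 100.0 * hits / len(y_true)


# Hypothetical band gaps in eV, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [1.2, 1.8, 3.3, 3.6]
stacked_pred = [1.1, 1.9, 3.1, 4.2]

for name, metric in (("RMSE", rmse), ("MAE", mae), ("MAPE", mape)):
    gain = improvement_pct(metric(observed, baseline_pred), metric(observed, stacked_pred))
    print(f"{name} improvement: {gain:.2f}%")

r2_gain = improvement_pct(
    r2(observed, baseline_pred), r2(observed, stacked_pred), higher_is_better=True
)
print(f"R2 improvement: {r2_gain:.2f}%")
print(f"Within ±8%: {share_within(observed, stacked_pred, 8.0):.2f}%")
```

Under this assumed definition, a positive value always means the stacking model is better than the baseline, whichever direction the underlying metric moves.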