Abstract

Machine-learning prediction of the band gap of semiconductor materials has progressed steadily in recent years, but its accuracy still leaves room for improvement. This work applies the stacking approach, which fuses the outputs of multiple baseline models, to further enhance the performance of band gap regression. Ten baseline models are optimized to predict the band gap of materials, and the outputs of the better-performing models are then used as the input features of the stacking model. Two datasets are used to evaluate the models: a benchmark dataset containing 3896 inorganic compounds with 136 dimensions, and a newly established, more complex database (E-AFLOW) containing 21,534 compounds with 206 dimensions. The stacking model trained on the E-AFLOW database is then applied to determine the band gaps of new compounds. Under 5-fold cross-validation, the stacking model achieves the highest R² value, 0.920 on the benchmark dataset and 0.917 on the E-AFLOW dataset. On the E-AFLOW dataset, relative to the GBDT, XGB, RF, and LGB baseline models used as its inputs, the stacking model improves RMSE, MAE, MAPE, and R² by 3.06%–17.54%, 8.12%–33.25%, 7.69%–33.33%, and 0.66%–4.44%, respectively. In real applications, the stacking model trained on the E-AFLOW dataset predicts the band gaps of 78.57% of new materials within ±8.00% of observed measurements. The minimum deviation between the predicted and observed values is −0.02%, and the maximum is 14.27%. These results demonstrate the strong performance of the stacking approach in band gap regression.
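The evaluation quantities named above (RMSE, MAE, MAPE, R², the improvement percentage over a baseline, and the share of predictions within a tolerance of the observed value) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code. The abstract does not define "improvement percentage" or the sign convention of the deviation, so the definitions in `improvement_pct` and `share_within` are assumptions, all function names are hypothetical, and the band gap values are invented purely to exercise the functions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def improvement_pct(baseline, stacked, higher_is_better=False):
    """Assumed definition: relative change of a metric versus the baseline, in percent.

    For error metrics (RMSE, MAE, MAPE) a decrease counts as improvement;
    for R^2 (higher_is_better=True) an increase counts as improvement.
    """
    change = stacked - baseline if higher_is_better else baseline - stacked
    return 100.0 * change / abs(baseline)


def share_within(y_true, y_pred, tol_pct):
    """Assumed definition: percent of predictions with |pred - obs| / |obs| <= tol_pct %."""
    hits = sum(1 for t, p in zip(y_true, y_pred)
               if abs(p - t) / abs(t) * 100.0 <= tol_pct)
    return 100.0 * hits / len(y_true)


# Invented band gaps in eV, for illustration only (not data from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [1.2, 1.8, 3.3, 3.6]   # stands in for a single baseline model
stacked_pred = [1.1, 1.9, 3.1, 3.8]    # stands in for the stacking model

for name, metric, higher in [("RMSE", rmse, False), ("MAE", mae, False),
                             ("MAPE", mape, False), ("R2", r2, True)]:
    base = metric(observed, baseline_pred)
    stack = metric(observed, stacked_pred)
    print(f"{name}: baseline={base:.4f} stacked={stack:.4f} "
          f"improvement={improvement_pct(base, stack, higher):.2f}%")

print(f"within ±8%: {share_within(observed, stacked_pred, 8.0):.2f}%")
```

Treating R² separately from the error metrics matters because the direction of "better" is reversed; a single formula applied to all four metrics would report a negative improvement for one group.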
