As an indicator of the optical characteristics of perovskite materials, the band gap is a crucial parameter that impacts the functionality of a wide range of optoelectronic devices. Obtaining the band gap of a material via a labor-intensive, time-consuming, and inefficient high-throughput calculation based on first principles is possible. However, it does not yield the most accurate results. Machine learning techniques emerge as a viable and effective substitute for conventional approaches in band gap prediction. This paper collected 201 pieces of data through the literature and open-source databases. By separating the features related to bits A, B, and X, a dataset of 1208 pieces of data containing 30 feature descriptors was established. The dataset underwent preprocessing, and the Pearson correlation coefficient method was employed to eliminate non-essential features as a subset of features. The band gap was predicted using the GBR algorithm, the random forest algorithm, the LightGBM algorithm, and the XGBoost algorithm, in that order, to construct a prediction model for organic-inorganic hybrid perovskites. The outcomes demonstrate that the XGBoost algorithm yielded an MAE value of 0.0901, an MSE value of 0.0173, and an R2 value of 0.991310. These values suggest that, compared to the other two models, the XGBoost model exhibits the lowest prediction error, suggesting that the input features may better fit the prediction model. Finally, analysis of the XGBoost-based prediction model's prediction results using the SHAP model interpretation method reveals that the occupancy rate of the A-position ion has the greatest impact on the prediction of the band gap and has an A-negative correlation with the prediction results of the band gap. The findings provide valuable insights into the relationship between the prediction of band gaps and significant characteristics of organic-inorganic hybrid perovskites.
Read full abstract