Density functional theory (DFT) calculations are widely used for material property prediction, but their computational cost can hinder the discovery of novel perovskites. This work explores machine learning (ML) as a faster alternative for predicting band gaps in complex perovskites, focusing on low-symmetry double and layered structures. We employ Support Vector Regression (SVR), Random Forest Regression (RFR), Gradient Boosting Regression (GBR), and Extreme Gradient Boosting (XGBoost) to predict both direct and indirect band gaps. Model performance is evaluated using the Mean Absolute Error (MAE), Mean Squared Error (MSE), and coefficient of determination (R²). Our results show that SVR is the most effective general model for predicting band gaps across both double and layered perovskites. For double perovskites specifically, XGBoost achieves higher accuracy than SVR when the derivative discontinuity is included as a feature. Feature importance analysis identifies the standard deviation of valence charges ("Valence (std)") as the most important feature for band gap prediction across all studied perovskites. These results demonstrate the potential of ML for efficient and accurate band gap prediction in complex perovskites, which can accelerate materials discovery.
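For readers unfamiliar with the three evaluation metrics named above, the following is a minimal sketch of how MAE, MSE, and R² are conventionally computed from reference and predicted band gaps. It uses only the Python standard library. The function names and the numerical values are illustrative assumptions and are not taken from this work's code or data.

```python
from math import fsum


def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute deviation between reference and prediction."""
    return fsum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """MSE: average squared deviation; penalizes large errors more than MAE."""
    return fsum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, the fraction of variance explained."""
    mean_true = fsum(y_true) / len(y_true)
    ss_res = fsum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = fsum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical band gaps in eV, chosen only to make the arithmetic easy
# to follow; these are NOT results from the study.
reference_gaps = [1.0, 2.0, 3.0, 4.0]
predicted_gaps = [1.5, 2.0, 2.5, 4.0]

mae = mean_absolute_error(reference_gaps, predicted_gaps)
mse = mean_squared_error(reference_gaps, predicted_gaps)
r2 = r_squared(reference_gaps, predicted_gaps)
print(f"MAE = {mae:.3f} eV, MSE = {mse:.3f} eV^2, R^2 = {r2:.3f}")
```

Lower MAE and MSE and an R² closer to 1 indicate better agreement with the reference band gaps. In practice these quantities would more likely be obtained from a library such as scikit-learn than written by hand.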