Improving organic solar cells requires accurate prediction of the efficiency of new donor-acceptor material pairs. Most current machine learning studies rely on regression models, which often struggle to yield clear rules for distinguishing high-performing from low-performing donor-acceptor pairs. This study proposes an approach that combines interpretable AI, specifically Shapley values, with four supervised machine learning classification models: support vector machines, decision trees, random forests, and gradient boosting. These models aim to identify high-efficiency donor-acceptor pairs from chemical structures alone and to extract the important features that establish general design principles for separating high-efficiency from low-efficiency pairs. For validation, an unsupervised machine learning algorithm based on loading vectors obtained from principal component analysis is used to identify the features most strongly associated with high-efficiency donor-acceptor pairs. The features identified by the supervised approach were found to be a subset of those identified by the unsupervised method. Noteworthy features include the van der Waals surface area, partial equalization of orbital electronegativity, Moreau-Broto autocorrelation, and molecular substructures. Using these features, a backward-working model can be developed to support exploration across a wide range of materials used in organic solar cells. This approach will help navigate the vast chemical compound space of donor and acceptor materials needed to create high-efficiency organic solar cells.
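To make the role of Shapley values concrete, the following is a minimal sketch of how they attribute a model's output to individual features. It is not the study's pipeline: the scoring function, its coefficients, and the three descriptor names are hypothetical stand-ins chosen for illustration, and the exact enumeration over feature subsets shown here is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(score, instance, baseline):
    """Exact Shapley values of `score` at `instance`, relative to `baseline`."""
    n = len(instance)

    def value(coalition):
        # Features in the coalition take the instance's value;
        # all other features stay at their baseline value.
        x = [instance[j] if j in coalition else baseline[j] for j in range(n)]
        return score(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy score standing in for a trained classifier's output.
# The coefficients are invented for illustration, not taken from the study.
def toy_score(x):
    return 0.5 * x[0] + 0.3 * x[1] + 0.2 * x[0] * x[2]


# Hypothetical descriptor names, loosely echoing the feature types in the text.
FEATURES = ["vdw_surface_area", "peoe_descriptor", "autocorrelation"]
INSTANCE = [1.0, 1.0, 1.0]  # donor-acceptor pair being explained (made up)
BASELINE = [0.0, 0.0, 0.0]  # reference point (made up)

PHI = shapley_values(toy_score, INSTANCE, BASELINE)
for name, contribution in zip(FEATURES, PHI):
    print(f"{name}: {contribution:.3f}")
```

For this toy score the contributions are approximately 0.6, 0.3, and 0.1, and they sum to the difference between the score at the instance and at the baseline. The interaction term between the first and third features is split equally between them, which is the property that makes Shapley values useful for ranking features in non-additive models such as tree ensembles.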