Abstract

Organic photovoltaic (OPV) solar cells are a rapidly developing technology with promising advantages over leading renewable energy sources. Screening for promising donor and acceptor molecules that improve the efficiency of such cells can be substantially accelerated through deep learning. Textual descriptors, specifically Simplified Molecular Input Line Entry System (SMILES) strings, are used as network inputs, while quantum-chemical calculations based on Density Functional Theory (DFT) provide chemically accurate targets for training and testing. We present a Long Short-Term Memory (LSTM) based network that uses a self-attention mechanism and a robust data augmentation routine to predict several OPV optoelectronic properties (e.g., the energies of the highest occupied molecular orbital and the lowest unoccupied molecular orbital). The LSTM cells, coupled with self-attention, learn the successive ordering and pairing of SMILES characters while attending to certain salient constituents of the molecule, which produces a robust understanding of the molecular graph. The Harvard Clean Energy Project (CEP) and National Renewable Energy Laboratory (NREL) OPV datasets are used for this study. The portion of the CEP dataset that we use contains approximately 1.2 million candidate donor molecules with their respective DFT-computed properties, whereas the NREL OPV dataset contains approximately 91,000 samples. Compared to contemporary graph-based models, our network reduces the mean absolute error (MAE) over all considered optoelectronic properties on the CEP and NREL OPV datasets by an average of 21.23% and 10.06%, respectively. Furthermore, we demonstrate that our model generalizes well to the ZINC-250k dataset, which is focused on pharmaceutical drug discovery, reducing the MAE across all properties by an average of 28.2% relative to the current state-of-the-art model.
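
The headline figures above are average relative reductions in MAE across properties. As a minimal sketch of how such a figure might be computed, assuming the reduction is taken per property and then averaged (the abstract does not spell out the exact procedure), the snippet below uses hypothetical function names and made-up placeholder MAE values that are not results from the paper.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def percent_reduction(baseline_mae, model_mae):
    """Relative MAE reduction of a model versus a baseline, in percent."""
    return 100.0 * (baseline_mae - model_mae) / baseline_mae


def average_percent_reduction(baseline_maes, model_maes):
    """Average the per-property percent reductions (assumed aggregation)."""
    return mean(
        percent_reduction(baseline_maes[name], model_maes[name])
        for name in baseline_maes
    )


# Placeholder per-property MAEs, for illustration only (NOT from the paper).
baseline_maes = {"HOMO": 0.10, "LUMO": 0.20}
model_maes = {"HOMO": 0.08, "LUMO": 0.15}

# Per-property reductions are about 20% and 25%, so the average is about 22.5%.
avg = average_percent_reduction(baseline_maes, model_maes)
print(f"average MAE reduction: {avg:.2f}%")
```

Averaging per-property percentages, rather than pooling raw errors, keeps properties with different units or scales from dominating the summary; whether the authors aggregate this way is an assumption here.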
