Abstract

Optical properties are central to molecular design for many applications, including solar cells and biomedical imaging. A variety of ab initio and statistical methods have been developed for their prediction, each with a trade-off between accuracy, generality, and cost. Existing theoretical methods such as time-dependent density functional theory (TD-DFT) are generalizable across chemical space because of their robust physics-based foundations but still exhibit random and systematic errors with respect to experiment despite their high computational cost. Statistical methods can achieve high accuracy at a lower cost, but data sparsity and unoptimized molecule and solvent representations often limit their ability to generalize. Here, we utilize directed message passing neural networks (D-MPNNs) to represent both dye molecules and solvents for predictions of molecular absorption peaks in solution. Additionally, we demonstrate a multi-fidelity approach based on an auxiliary model trained on over 28 000 TD-DFT calculations that further improves accuracy and generalizability, as shown through rigorous splitting strategies. Combining several openly available experimental datasets, we benchmark these methods against a state-of-the-art regression tree algorithm and compare the D-MPNN solvent representation to several alternatives. Finally, we explore the interpretability of the learned representations using dimensionality reduction and evaluate the use of ensemble variance as an estimator of the epistemic uncertainty in our predictions of molecular peak absorption in solution. The prediction methods proposed herein can be integrated with active learning, generative modeling, and experimental workflows to enable the more rapid design of molecules with targeted optical properties.
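As a rough illustration of the ensemble-variance idea mentioned in the abstract, the sketch below averages the predictions of several independently trained models and uses their spread as an uncertainty estimate. It is not the authors' implementation: the function name `ensemble_predict` and the wavelength values are hypothetical, and the models themselves are assumed to exist already and are represented only by their outputs.

```python
from statistics import mean, variance


def ensemble_predict(member_predictions):
    """Combine per-model predictions into a mean and a variance per molecule.

    member_predictions: list of lists; each inner list holds one model's
    predicted peak wavelengths (nm), in the same molecule order.
    """
    if len(member_predictions) < 2:
        raise ValueError("need at least two ensemble members to compute a variance")
    # Transpose so that each tuple holds every model's prediction for one molecule.
    per_molecule = list(zip(*member_predictions))
    means = [mean(p) for p in per_molecule]
    # Sample variance across ensemble members, used as the uncertainty estimate.
    variances = [variance(p) for p in per_molecule]
    return means, variances


# Hypothetical predictions (nm) from three models for two dyes.
preds = [
    [500.0, 620.0],
    [502.0, 600.0],
    [498.0, 640.0],
]
means, variances = ensemble_predict(preds)
print(means, variances)
```

In this toy input the three models agree closely on the first dye and disagree on the second, so the second receives the larger variance and would be flagged as the less certain prediction.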

Highlights

  • Dye molecules are used in many applications ranging from sensitizers for solar cells to biomedical imaging and diagnostics [1,2]

  • Many theoretical methods have been developed for predicting molecular optical properties, including empirical tables, semiempirical methods, time-dependent density functional theory (TD-DFT), and wavefunction-based methods [3,4]

  • These results indicate that the ChemFluor and Deep4Chem datasets are relatively similar to one another, as are the Dye-Sensitized Solar Cell Database (DSSCDB) and Dye Aggregation (DyeAgg)

Introduction

Dye molecules are used in many applications ranging from sensitizers for solar cells to biomedical imaging and diagnostics [1,2]. Although numerous theoretical and statistical methods exist to predict their optical properties, many of these methods are not sufficiently accurate or general, or they incur significant computational cost, all of which hinder their application to large and diverse sets of molecules. Many theoretical methods have been developed for predicting molecular optical properties, including empirical tables, semiempirical methods, time-dependent density functional theory (TD-DFT), and wavefunction-based methods [3,4]. TD-DFT has been the most widely used method for at least the past decade because of its favorable accuracy/cost trade-off and its capacity to be coupled with continuum solvent approximations [5], and it has been benchmarked and reviewed extensively [6,7]. In parallel to theoretical methods, researchers have developed surrogate statistical models that predict UV/Vis spectra from molecular structure at a lower computational cost than TD-DFT. Machine learning (ML) studies for predicting properties related to the electronically excited states of molecules have been reviewed recently [8,9].

