Abstract

We present machine learning models based on kernel-ridge regression for predicting X-ray photoelectron spectra of organic molecules originating from the ionization energies of 1s electrons in carbon (C), nitrogen (N), oxygen (O), and fluorine (F) atoms. We constructed the training dataset through high-throughput calculations of core-electron binding energies (CEBEs) for 12,880 small organic molecules in the bigQM7ω dataset, employing the ∆-SCF formalism coupled with meta-GGA-DFT and a variationally converged basis set. The models are cost-effective: they take as input the atomic coordinates of a molecule generated using universal force fields, yet estimate the target-level CEBEs corresponding to the DFT-level equilibrium geometry. We explore transfer learning by using atomic environment feature vectors, learned with a graph neural network framework, as inputs to kernel-ridge regression. Additionally, we enhance accuracy using the ∆-machine learning framework by leveraging inexpensive baseline spectra derived from Kohn–Sham eigenvalues. Upon application to 208 combinatorially substituted uracil molecules, which are larger than the molecules in the training set, our analyses reveal that while the models may not yield quantitatively accurate predictions of CEBEs on a molecule-by-molecule basis, they do exhibit a strong linear correlation, which proves valuable for virtual high-throughput screening. We present the dataset and models as a Python module, cebeconf, to facilitate further exploration.
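The ∆-machine learning idea mentioned above can be illustrated with a minimal sketch: kernel-ridge regression is trained on the difference between a target-level value and an inexpensive baseline, and the learned correction is added back to the baseline at prediction time. Everything in the block below is an assumption made for illustration and is not taken from the paper or from the cebeconf module: the Laplacian kernel, the hyperparameters `sigma` and `lam`, the function names, and the synthetic descriptors and energies.

```python
import numpy as np

def laplacian_kernel(A, B, sigma):
    """Laplacian kernel exp(-|a - b|_1 / sigma) between rows of A and B."""
    d = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)
    return np.exp(-d / sigma)

def krr_fit(X, y, sigma=10.0, lam=1e-8):
    """Solve (K + lam * I) alpha = y for the regression coefficients."""
    K = laplacian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_query, sigma=10.0):
    """Predict by summing kernel similarities weighted by alpha."""
    return laplacian_kernel(X_query, X_train, sigma) @ alpha

# Synthetic stand-ins (hypothetical, not real CEBE data):
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))               # atomic-environment descriptors
baseline = 290.0 + X[:, 0]                  # cheap baseline energy (eV)
target = baseline + 0.3 * np.sin(X[:, 1])   # target-level energy (eV)

X_tr, X_te = X[:150], X[150:]
delta_tr = (target - baseline)[:150]        # quantity the model learns

alpha = krr_fit(X_tr, delta_tr)

# Delta-ML prediction: baseline plus learned correction
pred = baseline[150:] + krr_predict(X_tr, alpha, X_te)
mae = np.mean(np.abs(pred - target[150:]))
print(f"Delta-ML test MAE on synthetic data: {mae:.3f} eV")
```

In this sketch the model only has to learn the (typically smoother and smaller) correction rather than the full energy, which is the motivation usually given for ∆-learning; the actual descriptors, kernel, and baseline used by the authors may differ.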
