Abstract

Scalar coupling constants (SCCs) play a key role in analyzing the three-dimensional structure of organic compounds; however, predicting SCCs with traditional quantum mechanical calculations is very time-consuming. To calculate SCCs efficiently and accurately, we propose a graph embedding local self-attention encoder (GELAE) model. First, we present a novel invariant representation of the coupling system in terms of bond lengths, bond angles, and dihedral angles. Next, we design a local self-attention module embedded with the adjacency matrix of the molecular graph to extract the features of coupling systems effectively. Finally, we predict the SCC using a modified classification loss function. To validate the proposed method, we conducted a series of comparison experiments using different structure representations, attention modules, and loss functions. The experimental results show that, compared with traditional chemical-bond structure representations, the proposed rotation- and translation-invariant representation improves SCC prediction accuracy; that graph-embedded local self-attention reduces the mean absolute error (MAE) on the validation set from 0.1603 Hz to 0.1067 Hz; and that replacing the scaled regression loss with the classification-based loss further reduces the MAE to 0.0963 Hz, which is close to the quantum chemistry standard on the CHAMPS dataset.
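The abstract describes a local self-attention module "embedded with the adjacency matrix" of the molecular graph but does not give its exact formulation. The NumPy sketch below assumes the simplest reading: the adjacency matrix (plus self-loops) acts as a hard attention mask, so each atom attends only to itself and its bonded neighbours. The function name `local_self_attention`, the single-head design, and the random features and weights are hypothetical and for illustration only.

```python
import numpy as np


def local_self_attention(x, adj, w_q, w_k, w_v):
    """Single-head self-attention restricted to bonded neighbours.

    x:   (n, d) per-atom feature matrix
    adj: (n, n) 0/1 adjacency matrix of the molecular graph
    ASSUMPTION: the adjacency matrix (plus self-loops) is used as a hard
    attention mask; GELAE's actual embedding scheme may differ.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])                  # scaled dot-product scores
    mask = (adj + np.eye(adj.shape[0])) > 0                  # neighbours + the atom itself
    scores = np.where(mask, scores, -np.inf)                 # non-neighbours get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


# Toy 3-atom chain 0-1-2 with random (hypothetical) features and weights.
rng = np.random.default_rng(0)
n, d = 3, 4
x = rng.normal(size=(n, d))
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = local_self_attention(x, adj, w_q, w_k, w_v)
print(np.round(attn, 3))  # atoms 0 and 2 are not bonded, so attn[0, 2] and attn[2, 0] are 0
```

Compared with global self-attention, the mask keeps each update local to the chemical bonds around an atom; stacking several such layers would let information propagate further along the graph.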
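One common way to recast SCC regression as classification is to discretize the SCC range into bins, train with cross-entropy, and decode a continuous value as the probability-weighted mean of the bin centres. The abstract does not say how its classification loss is "modified", so the sketch below is a generic stand-in rather than the GELAE loss: the two-hot soft labels, the bin range, and the helper names (`make_bins`, `two_hot`, `classification_loss`, `predict_scc`) are all assumptions.

```python
import numpy as np


def make_bins(lo, hi, n_bins):
    """Evenly spaced bin centres (Hz) covering the assumed SCC range."""
    return np.linspace(lo, hi, n_bins)


def two_hot(y, centres):
    """Spread each target over its two nearest bin centres so that the
    expected value of the label distribution equals y exactly (ASSUMPTION)."""
    y = np.clip(y, centres[0], centres[-1])
    idx = np.clip(np.searchsorted(centres, y) - 1, 0, len(centres) - 2)
    left, right = centres[idx], centres[idx + 1]
    w_right = (y - left) / (right - left)
    labels = np.zeros((len(y), len(centres)))
    rows = np.arange(len(y))
    labels[rows, idx] = 1.0 - w_right
    labels[rows, idx + 1] = w_right
    return labels


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def classification_loss(logits, y, centres):
    """Cross-entropy between predicted bin probabilities and two-hot targets."""
    log_p = np.log(softmax(logits) + 1e-12)
    return -(two_hot(y, centres) * log_p).sum(axis=-1).mean()


def predict_scc(logits, centres):
    """Point estimate = probability-weighted mean of the bin centres."""
    return softmax(logits) @ centres


centres = make_bins(-20.0, 20.0, 81)   # illustrative range and 0.5 Hz resolution
y = np.array([3.27, -11.8])            # made-up target SCCs in Hz
perfect_logits = np.log(two_hot(y, centres) + 1e-9)
print(predict_scc(perfect_logits, centres))  # close to [3.27, -11.8]
```

Because the prediction is an expectation over bin centres, it can resolve values finer than the bin width, which matters when the target errors are on the order of 0.1 Hz.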

Highlights

  • To verify the superiority of the input representation proposed in this work, we compared two inputs: Input_E1, built from chemical bond vectors, atoms, and charges; and Input_E2, built from bond lengths, bond angles, dihedral angles, atoms, and charges.
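The practical difference between Input_E1 and Input_E2 is invariance: bond vectors change when a molecule is rotated, whereas bond lengths, bond angles, and dihedral angles do not. The sketch below computes Input_E2-style geometric features along a coupling path of two to four atoms and checks that they survive a random proper rotation plus translation. The helper `coupling_features`, the path convention, and the toy coordinates (not a real molecular geometry) are hypothetical; the atom and charge features that both inputs share are omitted.

```python
import numpy as np


def bond_length(a, b):
    return np.linalg.norm(b - a)


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees."""
    b0, b1, b2 = p0 - p1, p2 - p1, p3 - p2
    b1n = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1n) * b1n   # component of b0 perpendicular to the central bond
    w = b2 - np.dot(b2, b1n) * b1n   # component of b2 perpendicular to the central bond
    return np.degrees(np.arctan2(np.dot(np.cross(b1n, v), w), np.dot(v, w)))


def coupling_features(coords, path):
    """Hypothetical helper: bond lengths, bond angles, and dihedrals along a path."""
    p = [coords[i] for i in path]
    feats = [bond_length(p[i], p[i + 1]) for i in range(len(p) - 1)]
    feats += [bond_angle(p[i], p[i + 1], p[i + 2]) for i in range(len(p) - 2)]
    feats += [dihedral(*p[i:i + 4]) for i in range(len(p) - 3)]
    return np.array(feats)


# Toy 4-atom coupling path (e.g. a three-bond H-C-C-H coupling); coordinates are made up.
coords = np.array([[-0.5, 1.0, 0.0], [0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, -0.5, 0.9]])
path = [0, 1, 2, 3]

# Random proper rotation (det = +1) plus a random translation.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1.0
moved = coords @ q.T + rng.normal(size=3)

f0, f1 = coupling_features(coords, path), coupling_features(moved, path)
print(np.allclose(f0, f1))                                    # True: Input_E2-style features are invariant
print(np.allclose(coords[1] - coords[0], moved[1] - moved[0]))  # False: raw bond vectors are not
```

The dihedral is returned as a signed angle, so it is unchanged by rotations and translations but flips sign under a mirror reflection, which keeps enantiomers distinguishable.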

Introduction

Determining the structure of unknown compounds plays a key role in the development of new materials and drugs. Stereoisomerism strongly influences drug properties: a chiral molecule and its mirror image [1] have exactly the same chemical bonds, yet their efficacies can differ sharply, with one being active and the other toxic. Because such teratogenic effects arise from the difference between the three-dimensional structures of a drug and its isomer, it is important to characterize drug molecule structures rigorously, which motivates the use of Nuclear Magnetic Resonance (NMR) spectroscopy to determine the structures of unknown molecules. Combined with mass spectrometry and infrared spectroscopy, NMR can determine the precise structure of organic molecules [3]. The two key parameters in NMR analysis are the chemical shift and the scalar coupling constant (SCC): the former mainly reflects the chemical environment of the nucleus, while the latter carries stereochemical information.
