Abstract
Neural networks and deep learning have been applied to drug-discovery problems with steadily increasing accuracy, yet there is still considerable room to improve molecular property prediction. Here, we propose a deep-learning architecture, the Bidirectional long short-term memory with Channel and Spatial Attention network (BCSA), whose training process is fully data-driven and end to end. It is based on data augmentation and SMILES tokenization and does not rely on auxiliary knowledge such as complex spatial structure. In addition, the model takes advantage of the strengths of the long short-term memory (LSTM) network in sequence processing. The embedded channel and spatial attention modules, in turn, identify the prime factors in the SMILES sequence for predicting properties. The model was further improved by Bayesian optimization. In this work, we demonstrate that the trained BCSA model is capable of predicting aqueous solubility. Furthermore, the proposed method is noticeably superior to, or competitive with, state-of-the-art graph models in predicting the oil–water partition coefficient, including the graph convolutional network (GCN), the message-passing neural network (MPNN), and AttentiveFP.
Highlights
The current mainstream algorithms for molecular characterization fall into two categories: graph models based on molecular graphs, and sequence models based on SMILES (Simplified Molecular-Input Line-Entry System) [19] sequence input
For accurate prediction of aqueous solubility, we proposed an end-to-end deep-learning framework, BCSA for short, which combines a BiLSTM neural network with the channel and spatial attention modules. By exploiting the advantages of molecular SMILES strings as training inputs, our BCSA model would be able to capture directly the complex spatial information of connected atoms, which has posed a great challenge in previous attempts at the prediction. The overfitting problem arising from small dataset size is circumvented by SMILES enumeration, which effectively enriches the sample size for training
The channel and spatial attention modules facilitate the identification of influential attributes between adjacent atoms in the SMILES, without incurring greater overhead in computation
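The SMILES tokenization step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the regular expression, the `<pad>` token, and the helper names `tokenize`, `build_vocab`, and `encode` are assumptions chosen for illustration, and the pattern covers only common SMILES symbols.

```python
import re

# Assumed token pattern (illustrative, not taken from the paper).
# Order matters: multi-character tokens must be tried before single characters.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"               # bracket atoms, e.g. [NH4+] or [C@@H]
    r"|Br|Cl"                   # two-letter halogens
    r"|%\d{2}"                  # two-digit ring closures, e.g. %12
    r"|[BCNOSPFI]"              # aliphatic organic-subset atoms
    r"|[bcnosp]"                # aromatic atoms
    r"|[0-9]"                   # single-digit ring closures
    r"|[()=#\-+\\/:~@?>*$.]"    # branches, bonds, and other symbols
)


def tokenize(smiles):
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject strings containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(token_lists):
    """Map each distinct token to an integer id; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncated or right-padded to max_len."""
    ids = [vocab[tok] for tok in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


if __name__ == "__main__":
    corpus = [tokenize(s) for s in ("CCO", "ClCCBr")]
    vocab = build_vocab(corpus)
    print(corpus[1])                      # tokens of ClCCBr
    print(encode(corpus[0], vocab, 5))    # padded ids of CCO
```

Integer sequences of this kind are what a sequence model such as a BiLSTM would consume through an embedding layer. SMILES enumeration, which writes the same molecule as several equivalent strings, needs a cheminformatics toolkit and is not shown here.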
Summary
Accurate prediction of molecular properties would offer reliable guidance in profiling lead compounds in the drug-discovery process. Graph-based learning methods have been widely developed in the field of drug development [17,20–25]. Schütt et al. [25] proposed a continuous-filter convolutional neural network for modeling quantum interactions in molecules, and AttentiveFP [21] introduced a new type of graph neural network (GNN) with a graph attention mechanism suited to molecular characterization. The latter achieves the best predictive performance on a drug-discovery-related dataset.