Learning to SMILES: BAN-based strategies to improve latent representation learning from molecules.

Cheng-Kun Wu, Dong-Sheng Cao, Zhi-Jiang Yang, Ting-Jun Hou, Xiao-Chen Zhang, Ai-Ping Lu

doi:10.1093/bib/bbab327

Abstract

Computational methods have become indispensable tools for accelerating drug discovery and reducing the heavy dependence on time-consuming, labor-intensive experiments. Traditional feature-engineering approaches rely heavily on expert knowledge to devise useful features, which can be costly and sometimes biased. Emerging deep learning (DL) methods offer a data-driven way to learn expressive representations automatically from complex raw data. Inspired by this, researchers have applied various deep neural network models to simplified molecular input line entry specification (SMILES) strings, which encode the composition and structure of molecules. However, current models usually suffer from a scarcity of labeled data. As a result, SMILES-based DL models generalize poorly, which prevents them from competing with state-of-the-art computational methods. In this study, we used the BiLSTM (bidirectional long short-term memory) attention network (BAN), in which we employed a novel multi-step attention mechanism to facilitate the extraction of key features from SMILES strings. In addition, SMILES enumeration was used as a data augmentation method in the training phase to substantially increase the amount of labeled data and raise the chance of mining more patterns from complex SMILES. We used SMILES enumeration again in the prediction phase to correct model prediction bias and give a more accurate prediction. Combined with the BAN model, our strategies greatly improve the quality of the latent features learned from SMILES strings. In 11 canonical absorption, distribution, metabolism, excretion and toxicity-related tasks, our method outperformed the state-of-the-art approaches.
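The two uses of SMILES enumeration described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `augment_training_set`, `predict_with_enumeration`, `toy_enumerator` and `toy_predictor` are hypothetical, and averaging the predictions over enumerated variants is an assumption about how the prediction-phase correction might work, since the abstract does not specify the aggregation rule. The enumerator is passed in as a function so the sketch runs without a chemistry toolkit.

```python
from statistics import mean
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical interfaces: an enumerator returns alternative SMILES strings
# for one molecule; a predictor maps one SMILES string to a score.
Enumerator = Callable[[str], Sequence[str]]
Predictor = Callable[[str], float]


def augment_training_set(
    data: Iterable[Tuple[str, float]], enumerate_smiles: Enumerator
) -> List[Tuple[str, float]]:
    """Training phase: every enumerated variant inherits its molecule's label."""
    augmented = []
    for smiles, label in data:
        for variant in enumerate_smiles(smiles):
            augmented.append((variant, label))
    return augmented


def predict_with_enumeration(
    smiles: str, enumerate_smiles: Enumerator, predict: Predictor
) -> float:
    """Prediction phase: average the model output over enumerated variants.

    Averaging is an assumed aggregation rule, chosen for illustration.
    """
    scores = [predict(variant) for variant in enumerate_smiles(smiles)]
    return mean(scores)


# Toy stand-ins. The three strings below are all valid SMILES for ethanol.
TOY_VARIANTS = {"CCO": ["CCO", "OCC", "C(C)O"]}


def toy_enumerator(smiles: str) -> Sequence[str]:
    return TOY_VARIANTS.get(smiles, [smiles])


def toy_predictor(smiles: str) -> float:
    # Deliberately sensitive to atom order, mimicking a string model's bias.
    return 1.0 if smiles.startswith("C") else 0.0
```

With these stand-ins, `augment_training_set([("CCO", 1.0)], toy_enumerator)` yields three labeled strings for one molecule, and `predict_with_enumeration("CCO", toy_enumerator, toy_predictor)` smooths out the predictor's dependence on which atom the string starts from. In practice the enumerator could be backed by a cheminformatics toolkit that writes randomized SMILES, for example RDKit's `Chem.MolToSmiles` with `doRandom=True`.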

Journal: Briefings in Bioinformatics
Publication Date: Aug 24, 2021
Citations: 43

Similar Papers

MERMAID: an open source automated hit-to-lead method based on deep reinforcement learning
Daiki Erikawa ... Masakazu Sekijima
Journal of Cheminformatics | VOL. 13
27 Nov 2021

Can large language models understand molecules?
Shaghayegh Sadeghi ... Alioune Ngom
BMC Bioinformatics | VOL. 25
26 Jun 2024

QSAR modeling of toxicities of ionic liquids toward Staphylococcus aureus using SMILES and graph invariants
Shahram Lotfi ... Shahin Ahmadi
Structural Chemistry | VOL. 31
09 Jul 2020

In silico toxicity prediction by support vector machine and SMILES representation-based string kernel
D.-S Cao ... Y.-Z Liang
SAR and QSAR in Environmental Research | VOL. 23
01 Jan 2012
