DEAttentionDTA: protein-ligand binding affinity prediction based on dynamic embedding and self-attention.

Xiying Chen, Houjin Zhang, Tianqiao Shen, Jinyong Yan, Min Yang, Jinsha Huang, Xiaoman Xie, Yunjun Yan, Li Xu

doi:10.1093/bioinformatics/btae319

Abstract

Predicting protein-ligand binding affinity is crucial in drug discovery and development. However, most existing models depend on 3D protein structures, which are often difficult to obtain. Combining amino acid sequences with ligand sequences and better highlighting active sites also remain significant challenges. We propose DEAttentionDTA, a neural network model based on dynamic word embeddings and a self-attention mechanism, for predicting protein-ligand binding affinity. DEAttentionDTA takes 1D sequence information as input: the global amino acid sequence features of the protein, the local features of the active pocket site, and the linear representation of the ligand molecule in SMILES format. These three linear sequences are fed into a dynamic word-embedding layer based on a 1D convolutional neural network for embedding encoding and are then correlated through a self-attention mechanism. A linear layer generates the predicted affinity value. Compared with various mainstream tools on the same dataset, DEAttentionDTA achieved significantly superior results. We then assessed the performance of the model on the p38 protein family. The source code is available at https://github.com/whatamazing1/DEAttentionDTA.
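The data flow described in the abstract (three 1D sequences, convolution-based embedding, self-attention, linear output) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the vocabularies, the embedding size `D`, the kernel width, the concatenation of the three branches before attention, the mean pooling, the toy input sequences, and all function names are assumptions made for illustration, and the weights are random and untrained, so the returned number carries no chemical meaning.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # embedding size (assumed for illustration)

# Character vocabularies (assumed); index 0 is reserved for unknown symbols.
AMINO = {ch: i + 1 for i, ch in enumerate("ACDEFGHIKLMNPQRSTVWY")}
SMILES = {ch: i + 1 for i, ch in enumerate("CNOSPFIBHclrnos()[]=#+-@/\\0123456789")}

def encode(seq, vocab):
    """Map each character of a sequence to an integer index."""
    return np.array([vocab.get(ch, 0) for ch in seq], dtype=np.int64)

def conv1d_same(x, kernel):
    """Zero-padded 1D convolution along the sequence axis, followed by ReLU.
    x: (length, d_in); kernel: (width, d_in, d_out) -> (length, d_out)."""
    width = kernel.shape[0]
    pad = width // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))
    windows = np.stack([padded[i:i + x.shape[0]] for i in range(width)], axis=1)
    return np.maximum(np.einsum("lwi,wio->lo", windows, kernel), 0.0)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def init_branch(vocab):
    """Random embedding table and width-3 convolution kernel for one input."""
    return {"vocab": vocab,
            "embed": rng.normal(0.0, 0.1, (len(vocab) + 1, D)),
            "kernel": rng.normal(0.0, 0.1, (3, D, D))}

params = {
    "protein": init_branch(AMINO),
    "pocket": init_branch(AMINO),
    "ligand": init_branch(SMILES),
    "attn": tuple(rng.normal(0.0, 0.1, (D, D)) for _ in range(3)),
    "w_out": rng.normal(0.0, 0.1, D),
    "b_out": 0.0,
}

def predict_affinity(protein, pocket, ligand, params):
    """Embed each sequence, concatenate the tokens, attend, pool, project."""
    feats = []
    for name, seq in (("protein", protein), ("pocket", pocket), ("ligand", ligand)):
        branch = params[name]
        embedded = branch["embed"][encode(seq, branch["vocab"])]
        feats.append(conv1d_same(embedded, branch["kernel"]))
    tokens = np.concatenate(feats, axis=0)  # joint token sequence (assumption)
    attended, weights = self_attention(tokens, *params["attn"])
    pooled = attended.mean(axis=0)          # mean pooling (assumption)
    return float(pooled @ params["w_out"] + params["b_out"]), weights

# Toy inputs chosen only to exercise the code path.
protein_seq = "MSQERPTFYRQELNKTIWEV"
pocket_seq = "TFYRQEL"
ligand_smiles = "CC(=O)Oc1ccccc1C(=O)O"
score, attn = predict_affinity(protein_seq, pocket_seq, ligand_smiles, params)
```

Because the three branches are concatenated before attention in this sketch, each row of `attn` shows how one token (a residue or a SMILES character) distributes its attention over every residue, pocket position, and ligand character, which is one simple way to let the sequences be correlated with each other.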

Full Text