Abstract

Protein interactions play an essential role in the study of living systems and life phenomena. A considerable body of literature addresses the analysis and prediction of protein interactions, including support vector machine, homology-based, and similarity-based methods, each with its own strengths and weaknesses. Most existing methods for predicting protein interactions require prior domain knowledge, which makes it difficult to extract protein features effectively. No single method predicts protein interactions satisfactorily, which points to the need for a comprehensive method that combines the advantages of several. On this basis, we propose a deep ensemble learning method called EnAmDNN (Ensemble Deep Neural Networks with Attention Mechanism) to predict protein interactions; by combining multiple models, it draws on the advantages of the individual methods. Specifically, it encodes protein sequences with the local descriptor, auto covariance, conjoint triad, and pseudo amino acid composition, and combines these encodings with the vector representation of each protein in the protein interaction network. It then uses multi-layer convolutional neural networks to extract protein features automatically and constructs an attention mechanism to analyze deep-seated relationships between proteins. We set up four different deep learning model structures. In the ensemble learning model, second-layer data sets are generated from the base learners with five-fold cross-validation, and the protein interaction network is then predicted by combining 16 models. Results on five independent PPI data sets show that EnAmDNN achieves better prediction performance than the methods it is compared against.
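The second-layer data generation mentioned in the abstract can be illustrated with a small stacking sketch. It is a minimal illustration under stated assumptions, not the EnAmDNN implementation: the names `k_fold_indices`, `out_of_fold_scores`, and `fit_threshold` are hypothetical, and the toy threshold learner stands in for the deep networks. The point it shows is that each sample's second-layer features come only from base models that never saw that sample during training.

```python
import random
from typing import Callable, List, Sequence

Features = Sequence[float]
Model = Callable[[Features], float]                      # returns a score in [0, 1]
Trainer = Callable[[List[Features], List[int]], Model]   # fits a base learner


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def out_of_fold_scores(trainers: Sequence[Trainer], X: List[Features],
                       y: List[int], k: int = 5, seed: int = 0) -> List[List[float]]:
    """Build second-layer features: one column per base learner.

    Row i holds the scores for sample i produced by models that were
    trained on the k-1 folds that do not contain sample i.
    """
    meta = [[0.0] * len(trainers) for _ in X]
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for j, fit in enumerate(trainers):
            model = fit(X_tr, y_tr)
            for i in held_out:
                meta[i][j] = model(X[i])
    return meta


def fit_threshold(feature_index: int) -> Trainer:
    """Toy base learner (an assumption for this sketch): thresholds one
    feature at the midpoint between the two class means."""
    def fit(X_tr: List[Features], y_tr: List[int]) -> Model:
        pos = [x[feature_index] for x, t in zip(X_tr, y_tr) if t == 1]
        neg = [x[feature_index] for x, t in zip(X_tr, y_tr) if t == 0]
        threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return lambda x: 1.0 if x[feature_index] > threshold else 0.0
    return fit
```

With 16 base models, as in the abstract, the resulting matrix would have 16 columns, and a second-layer learner would be trained on it to produce the final prediction.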

Highlights

  • Protein interactions and interaction networks take part in the vital activities of every living cell, including signal transduction, immune response, metabolism of energy substances, and cell cycle control (Keskin et al, 2016)

  • Several methods based on amino acid sequence computation have been explored for predicting protein-protein interactions (PPIs), such as a support vector machine (SVM) with traditional auto-correlation, k-nearest neighbors with the local descriptor (LD) (Yang et al, 2010), an SVM with conventional auto covariance (AC) (Guo et al, 2008) or LD (Zhou et al, 2011), and a deep neural network with the amphiphilic pseudo amino acid composition (PseAAC) descriptor (Du et al, 2017b)

  • Motivated by attention mechanisms and ensemble learning, we propose an algorithm called EnAmDNN. It first extracts the biophysical and chemical information of protein sequences through AC, conjoint triad (CT), LD, and PseAAC and associates it with the interaction description of each protein in the protein interaction network; it then extracts protein features automatically with a multi-layer convolutional neural network and adopts an attention mechanism to analyze deep-seated relationships between proteins and form the feature vectors

Summary

INTRODUCTION

Protein interactions and interaction networks take part in the vital activities of every living cell, including signal transduction, immune response, metabolism of energy substances, and cell cycle control (Keskin et al, 2016). Motivated by attention mechanisms and ensemble learning, we propose an algorithm called EnAmDNN. It first extracts the biophysical and chemical information of protein sequences through auto covariance (AC), conjoint triad (CT), the local descriptor (LD), and pseudo amino acid composition (PseAAC) and associates it with the interaction description of each protein in the protein interaction network. It then extracts protein features automatically with a multi-layer convolutional neural network and adopts an attention mechanism to analyze deep-seated relationships between proteins and form the feature vectors. Our contributions can be summarized as follows: (1) the new network structure automatically extracts highly abstract representations and detects the sequence specificity of proteins; (2) rather than directly concatenating the two proteins, an attention mechanism analyzes the internal links between the two proteins and the network description of each protein, which improves prediction accuracy; (3) ensemble learning draws on the advantages of different descriptors and different DNNs to achieve comprehensive learning.
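As an illustration of one of the sequence descriptors named above, the following sketch computes a conjoint triad (CT) encoding. It rests on assumptions the text does not state: the seven-class amino acid grouping and the normalization are the ones commonly used for CT in the literature, residues outside the 20 standard amino acids are skipped, and the function name `conjoint_triad` is hypothetical. It should be read as a sketch of the general technique, not as the authors' preprocessing code.

```python
from itertools import product
from typing import List

# Seven residue classes commonly used for the conjoint triad descriptor
# (grouped by dipole and side-chain volume). Assumed for this sketch.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: k for k, group in enumerate(CLASSES) for aa in group}

# Every ordered triple of classes gets one slot: 7 * 7 * 7 = 343 dimensions.
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(7), repeat=3))}


def conjoint_triad(sequence: str) -> List[float]:
    """Return a 343-dimensional conjoint triad vector for one protein.

    Each element counts how often a triple of consecutive residue classes
    occurs, normalized as (count - min) / max over the whole vector.
    """
    # Map residues to classes; unknown symbols (e.g. 'X') are dropped,
    # which is a simplification made for this sketch.
    classes = [AA_TO_CLASS[aa] for aa in sequence.upper() if aa in AA_TO_CLASS]
    counts = [0] * len(TRIAD_INDEX)
    for i in range(len(classes) - 2):
        counts[TRIAD_INDEX[tuple(classes[i:i + 3])]] += 1
    lo, hi = min(counts), max(counts)
    if hi == 0:  # sequence shorter than three usable residues
        return [0.0] * len(counts)
    return [(c - lo) / hi for c in counts]
```

Because the vector length is fixed regardless of sequence length, proteins of different sizes map to inputs of the same shape, which is what a downstream network requires.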

Methods
MATERIALS AND METHODS
Evaluation Criteria
RESULT
CONCLUSIONS
DATA AVAILABILITY STATEMENT