Abstract
Protein hydroxylation is a type of post-translational modification (PTM) that plays critical roles in human diseases. Protein sequences contain many proline and lysine residues whose hydroxylation status is uncharacterized. The question that needs to be answered is which of these residues can be hydroxylated and which cannot. The answer will not only help us understand the mechanism of hydroxylation but can also benefit the development of new drugs. In this paper, we propose a novel approach for predicting hydroxylation sites using a hybrid deep learning model that integrates a convolutional neural network (CNN) and a long short-term memory network (LSTM). We employed a pseudo amino acid composition (PseAAC) method to construct valid benchmark datasets based on a sliding window strategy, and we used the position-specific scoring matrix (PSSM) to represent samples as inputs to the deep learning model. In addition, we compared our method with popular predictors including CNN, iHyd-PseAAC, and iHyd-PseCp. The results of 5-fold cross-validation demonstrated that our method significantly outperforms the other methods in prediction accuracy.
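The dataset construction step can be illustrated with a minimal sketch. It assumes that every proline (P) and lysine (K) in a sequence is a candidate site and that each sample is a fixed-length fragment centred on that residue, padded with a placeholder `X` where the window runs past either terminus. The half-width, the padding symbol, and the names `extract_windows` and `stratified_folds` are hypothetical; the abstract does not state the paper's window size. The second function shows one common way to assign samples to the folds of a stratified 5-fold cross-validation and is not necessarily the paper's exact protocol.

```python
import random

PAD = "X"  # assumed padding symbol for windows that run past a terminus


def extract_windows(seq, half_width, targets="PK"):
    """Return (position, fragment) for every candidate residue in `seq`.

    Each fragment has length 2 * half_width + 1 and is centred on the
    candidate residue; positions beyond either end are filled with PAD.
    """
    padded = PAD * half_width + seq + PAD * half_width
    windows = []
    for i, residue in enumerate(seq):
        if residue in targets:
            # residue i of `seq` sits at index i + half_width of `padded`,
            # so this slice is centred on it
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            fold_of[i] = j % k  # deal each class round-robin into the folds
    return fold_of


if __name__ == "__main__":
    print(extract_windows("MKPGAP", 2))
    # prints [(1, 'XMKPG'), (2, 'MKPGA'), (5, 'GAPXX')]
```

Positive and negative labels for the fragments would come from experimentally annotated sites; each fold then serves once as the test set while the other four are used for training.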
Highlights
We used a flexible and efficient library for deep learning to implement our convolutional neural network (CNN)+long short-term memory network (LSTM) and CNN models, and our framework is illustrated in Figure
In order to test the performance of the predictor, compared with CNN+LSTM
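The hybrid CNN+LSTM architecture can be illustrated with a dependency-free forward pass. This sketch assumes a single convolutional layer with ReLU, a single-layer LSTM that reads the convolutional feature map position by position, and a logistic output unit. The layer sizes, the omission of pooling and dropout, and all function names (`pssm_window`, `conv1d_relu`, `lstm_last_hidden`, `predict`) are hypothetical. The weights are random rather than trained, and the random matrix only stands in for a real PSSM, which is normally produced by PSI-BLAST. A practical implementation would use a deep learning library and learn the weights by backpropagation.

```python
import math
import random


def pssm_window(pssm, center, half_width):
    """Rows center-half_width..center+half_width of an L x 20 PSSM, zero-padded."""
    n = len(pssm)
    return [list(pssm[j]) if 0 <= j < n else [0.0] * 20
            for j in range(center - half_width, center + half_width + 1)]


def conv1d_relu(x, kernels, biases):
    """Valid 1-D convolution along positions; kernels[f][t][c], output (L-K+1) x F."""
    k = len(kernels[0])
    out = []
    for start in range(len(x) - k + 1):
        row = []
        for w, b in zip(kernels, biases):
            s = b + sum(w[t][c] * x[start + t][c]
                        for t in range(k) for c in range(len(x[0])))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_last_hidden(seq, W, U, b, hidden):
    """Run a single-layer LSTM over `seq` and return the final hidden state.

    W[g] is hidden x input, U[g] is hidden x hidden, b[g] has length hidden,
    for the input (i), forget (f), output (o) and candidate (c) gates.
    """
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in seq:
        pre = {g: [b[g][j]
                   + sum(W[g][j][i] * x[i] for i in range(len(x)))
                   + sum(U[g][j][i] * h[i] for i in range(hidden))
                   for j in range(hidden)] for g in "ifoc"}
        i_g = [sigmoid(v) for v in pre["i"]]
        f_g = [sigmoid(v) for v in pre["f"]]
        o_g = [sigmoid(v) for v in pre["o"]]
        c_hat = [math.tanh(v) for v in pre["c"]]
        c = [f * cp + ig * ch for f, cp, ig, ch in zip(f_g, c, i_g, c_hat)]
        h = [o * math.tanh(cv) for o, cv in zip(o_g, c)]
    return h


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(n_filters=4, kernel=3, hidden=5, seed=0):
    """Random (untrained) weights for the hypothetical CNN+LSTM sketch."""
    rng = random.Random(seed)
    return {
        "kernels": [rand_matrix(kernel, 20, rng) for _ in range(n_filters)],
        "conv_b": [0.0] * n_filters,
        "W": {g: rand_matrix(hidden, n_filters, rng) for g in "ifoc"},
        "U": {g: rand_matrix(hidden, hidden, rng) for g in "ifoc"},
        "b": {g: [0.0] * hidden for g in "ifoc"},
        "hidden": hidden,
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
        "b_out": 0.0,
    }


def predict(window, params):
    """CNN -> LSTM -> logistic unit: score that the centre residue is hydroxylated."""
    feats = conv1d_relu(window, params["kernels"], params["conv_b"])
    h = lstm_last_hidden(feats, params["W"], params["U"], params["b"], params["hidden"])
    return sigmoid(sum(w * v for w, v in zip(params["w_out"], h)) + params["b_out"])


if __name__ == "__main__":
    pssm = rand_matrix(30, 20, random.Random(1))  # stand-in for a real 30-residue PSSM
    x = pssm_window(pssm, center=2, half_width=5)  # 11 x 20, first 3 rows zero-padded
    print(predict(x, init_params()))
```

Placing the convolution before the LSTM lets the convolutional filters respond to local patterns in the PSSM profile around the candidate residue, while the LSTM summarizes how those patterns are ordered along the window.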
Summary
As a type of post-translational modification, hydroxylation converts a CH group into a COH group in a protein [1]. Protein hydroxylation usually occurs at proline and lysine residues, and the hydroxylated residues are called hydroxyproline and hydroxylysine, respectively. Predicting hydroxyproline and hydroxylysine sites in proteins may provide useful information for both biomedical research and drug development. With the development of high-throughput sequencing techniques, more and more protein sequences have been determined and stored in databases, which presents both an unprecedented opportunity and a big challenge for computational methods that predict hydroxylation residues in proteins. There have been a few attempts at predicting hydroxylation residues with machine learning-based methods. In 2009, Yang et al [7] classified collagen hydroxyproline sites by developing two support vector machines