Abstract
Motivation Protein function prediction is one of the major tasks of bioinformatics that can help in wide range of biological problems such as understanding disease mechanisms or finding drug targets. Many methods are available for predicting protein functions from sequence based features, protein-protein interaction networks, protein structure or literature. However, other than sequence, most of the features are difficult to obtain or not available for many proteins thereby limiting their scope. Furthermore, the performance of sequence-based function prediction methods is often lower than methods that incorporate multiple features and predicting protein functions may require a lot of time. Results We developed a novel method for predicting protein functions from sequence alone which combines deep convolutional neural network (CNN) model with sequence similarity based predictions. Our CNN model scans the sequence for motifs which are predictive for protein functions and combines this with functions of similar proteins (if available). We evaluate the performance of DeepGOPlus using the CAFA3 evaluation measures and achieve an Fmax of 0.390, 0.557 and 0.614 for BPO, MFO and CCO evaluations, respectively. These results would have made DeepGOPlus one of the three best predictors in CCO and the second best performing method in the BPO and MFO evaluations. We also compare DeepGOPlus with state-of-the-art methods such as DeepText2GO and GOLabeler on another dataset. DeepGOPlus can annotate around 40 protein sequences per second on common hardware, thereby making fast and accurate function predictions available for a wide range of proteins. Availability and implementation http://deepgoplus.bio2vec.net/ . Supplementary information Supplementary data are available at Bioinformatics online.
Highlights
Prediction of protein functions is a major task in bioinformatics that is important in understanding the role of proteins in disease pathobiology, the functions of metagenomes, or finding drug targets
To learn sequence motifs that are predictive of protein functions, we use one-dimensional convolutional neural networks (CNNs) over protein amino acid sequence to learn sequence patterns or motifs
We achieve the best performance in all three subontologies with our DeepGOPlus model which combines the DiamondScore and DeepGOCNN
Summary
Prediction of protein functions is a major task in bioinformatics that is important in understanding the role of proteins in disease pathobiology, the functions of metagenomes, or finding drug targets. A wide range of methods have been developed for predicting protein functions computationally (Fa et al, 2018; Jiang et al, 2016; Kahanda and Ben-Hur, 2017; Kulmanov et al, 2018; Radivojac et al, 2013; You et al, 2018a,b; Zhou et al, 2019). Protein functions can be predicted from protein sequences (Fa et al, 2018; Jiang et al, 2016; Kulmanov et al, 2018; Radivojac et al, 2013; You et al, 2018a, b; Zhou et al, 2019), protein–protein interactions (PPI) (Kulmanov et al, 2018), protein structures (Yang et al, 2015), biomedical literature and other features (Kahanda and Ben-Hur, 2017; You et al, 2018a). As proteins rarely function on their own, protein–protein interactions can be a good predictor for complex biological processes to which proteins contribute.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.