Abstract

Motivation: Automated function prediction (AFP) of proteins is a large-scale multi-label classification problem. Most network-based methods for AFP have two limitations: (i) a separate model must be trained for each species and (ii) protein sequence information is ignored entirely. These limitations lead to weaker performance than sequence-based methods. The challenge, therefore, is to develop a powerful network-based method for AFP that overcomes them.

Results: We propose DeepGraphGO, an end-to-end, multispecies graph neural network-based method for AFP, which makes the most of both protein sequence and high-order protein network information. Our multispecies strategy allows a single model to be trained for all species, which yields a larger number of training samples than existing methods. Extensive experiments with a large-scale dataset show that DeepGraphGO significantly outperforms a number of competing state-of-the-art methods, including DeepGOPlus and three representative network-based methods: GeneMANIA, deepNF and clusDCA. We further confirm the effectiveness of our multispecies strategy and the advantage of DeepGraphGO on so-called difficult proteins. Finally, we integrate DeepGraphGO into the state-of-the-art ensemble method, NetGO, as a component and achieve a further performance improvement.

Availability and implementation: https://github.com/yourh/DeepGraphGO.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • We propose DeepGraphGO, a semi-supervised deep learning method that takes advantage of both protein sequence and network information through a graph neural network (GNN) (Kipf and Welling, 2016)

  • We have four main findings: (i) DeepGraphGO achieved the best performance in both Fmax and AUPR in all three domains, especially for the biological process ontology (BPO) and the cellular component ontology (CCO)
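
The Fmax metric named in the highlights is not defined in this excerpt. As a rough illustration, the sketch below computes a protein-centric Fmax in the way it is commonly defined for function-prediction benchmarks: at each score threshold, precision is averaged over proteins with at least one prediction, recall is averaged over all annotated proteins, and the best F-measure over all thresholds is reported. The function name `fmax`, the threshold grid and the toy term labels are assumptions made for this example, and the evaluation protocol used in the paper may differ in its details.

```python
def fmax(y_true, y_score, n_thresholds=100):
    """Protein-centric Fmax over an evenly spaced grid of score thresholds.

    y_true:  list of sets, the true terms of each protein.
    y_score: list of dicts mapping term -> predicted score in [0, 1].
    (Both layouts are assumptions chosen for this sketch.)
    """
    n_with_truth = sum(1 for terms in y_true if terms)
    if n_with_truth == 0:
        return 0.0

    best = 0.0
    for step in range(1, n_thresholds + 1):
        t = step / n_thresholds
        precisions = []
        recall_sum = 0.0
        for truth, scores in zip(y_true, y_score):
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & truth)
            if predicted:
                # Precision only counts proteins with at least one prediction.
                precisions.append(hits / len(predicted))
            if truth:
                # Recall counts every protein that has true annotations.
                recall_sum += hits / len(truth)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / n_with_truth
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data: two proteins, three hypothetical terms.
toy_truth = [{"t1", "t3"}, {"t2"}]
toy_scores = [
    {"t1": 0.905, "t2": 0.205, "t3": 0.405},
    {"t1": 0.105, "t2": 0.805, "t3": 0.705},
]
print(round(fmax(toy_truth, toy_scores), 3))  # 0.857
```

On the toy data, the best trade-off is reached at thresholds where precision is 0.75 and recall is 1.0, giving an Fmax of 6/7.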

Summary

Introduction

Proteins are the building blocks of life and play many crucial roles within organisms, such as catalyzing chemical reactions, coordinating signaling pathways and providing structural support to cells (Weaver, 2011). To elucidate the mechanisms of life, it is important to identify the functions of proteins and genes, which are standardized by the Gene Ontology (GO) (Ashburner et al., 2000). The number of known protein sequences is increasing rapidly owing to advances in gene sequencing technologies, yet fewer than 0.1% of proteins have experimental GO annotations because biochemical experiments are costly. To narrow this gap, developing an effective and efficient automated protein function prediction (AFP) method is of great significance (Radivojac et al., 2013).
