Abstract

Recently, human life around the world has been endangered by the spread of pneumonia-causing viruses, such as coronaviruses (including the one behind COVID-19) and H1N1. To develop effective drugs against coronavirus, knowledge of protein subcellular localization is indispensable. In 2019, a predictor called “pLoc_bal-mHum” was developed for identifying the subcellular localization of human proteins. Its predicted results are significantly better than those of its counterparts, particularly for proteins that may simultaneously occur in, or move between, two or more subcellular location sites. However, further effort is needed to improve its power, since pLoc_bal-mHum was not trained with deep learning, a very powerful technique developed recently. The present study was devoted to incorporating the deep-learning technique and developing a new predictor called “pLoc_Deep-mHum”. The global absolute true rate achieved by the new predictor is over 81% and its local accuracy is over 90%. Both are substantially higher than those of its counterparts. Moreover, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_Deep-mHum/, which is expected to be a useful tool in fighting the coronavirus pandemic.

Highlights

  • The strong point of this model is that it allows extracting the maximum amount of information from human protein features using CNN convolution layers

  • The new predictor developed via the above procedures is called “pLoc_Deep-mHum”, where “pLoc_Deep” stands for “predict subcellular localization by deep learning”, and “mHum” for “multi-label human proteins”

  • The newly proposed predictor pLoc_Deep-mHum is remarkably superior to the existing state-of-the-art predictor pLoc_bal-mHum in all five metrics. It can be seen from the table that the absolute true rate achieved by the new predictor is over 81%, which is far beyond the reach of any other existing method
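
To make the "absolute true rate" in the last highlight concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes the usual multi-label reading of the term: a prediction counts as correct only when the predicted set of locations matches the true set exactly. The per-location accuracy function is one plausible reading of "local accuracy" and is likewise an assumption. The function names and the four toy proteins are hypothetical and are not taken from the benchmark dataset.

```python
from typing import List, Set


def absolute_true_rate(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Fraction of proteins whose predicted location set equals the true set exactly."""
    if len(true_sets) != len(pred_sets):
        raise ValueError("true_sets and pred_sets must have the same length")
    if not true_sets:
        raise ValueError("at least one sample is required")
    exact_matches = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return exact_matches / len(true_sets)


def per_location_accuracy(
    true_sets: List[Set[str]], pred_sets: List[Set[str]], location: str
) -> float:
    """Fraction of proteins for which membership in one location is predicted correctly.

    Assumption: this is only one possible reading of "local accuracy".
    """
    if len(true_sets) != len(pred_sets):
        raise ValueError("true_sets and pred_sets must have the same length")
    if not true_sets:
        raise ValueError("at least one sample is required")
    correct = sum(
        1 for t, p in zip(true_sets, pred_sets) if (location in t) == (location in p)
    )
    return correct / len(true_sets)


# Hypothetical toy data: four proteins, one of which has two true locations.
TRUE_SETS = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"mitochondrion"}, {"cytoplasm"}]
PRED_SETS = [{"nucleus"}, {"cytoplasm"}, {"mitochondrion"}, {"cytoplasm"}]

if __name__ == "__main__":
    print(absolute_true_rate(TRUE_SETS, PRED_SETS))
    print(per_location_accuracy(TRUE_SETS, PRED_SETS, "nucleus"))
```

In the toy data the second protein is predicted in only one of its two locations, so it counts as a miss for the absolute true rate even though half of its labels are right. This strictness is why the exact-match figure is typically much lower than any per-location figure.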

Summary

INTRODUCTION

Knowledge of the subcellular localization of proteins is crucially important for two goals: 1) revealing the intricate pathways that regulate biological processes at the cellular level [1, 2], and 2) selecting the right targets [3] for developing new drugs. In 2011, by extracting the GO (Gene Ontology) information of the proteins [6], the same predictor could be used to deal with proteins that have multiple locations, achieving 76% accuracy. It is through these and follow-up procedures that the capacity for dealing with multi-site systems, and the accuracy, were further improved. As in pLoc_bal-mHum [12] and many other recent publications on developing new prediction methods (see, e.g., [9-11, 13-50]), the guidelines of the 5-step rule [51] are followed. These cover the detailed procedures for 1) benchmark dataset, 2) sample formulation, 3) operation engine or algorithm, 4) cross-validation, and 5) web-server. Here our attention is focused on the procedures that differ significantly from those used in developing the predictor pLoc_bal-mHum [12].

Benchmark Dataset
Proteins Sample Formulation
Architecture for the Novel CNN-BiLSTM Network
RESULTS AND DISCUSSION
A Set of Five Metrics for Multi-Label Systems
Comparison with the State-of-the-Art Predictor
Comparison with Several Classic Machine Learning Methods
CONCLUSION
