Abstract
Recently, human life around the world has been endangered by the spread of pneumonia-causing viruses, such as the coronavirus responsible for COVID-19 and H1N1. To develop effective drugs against coronavirus, knowledge of protein subcellular localization is indispensable. In 2019, a predictor called “pLoc_bal-mHum” was developed for identifying the subcellular localization of human proteins. Its predictions are significantly better than those of its counterparts, particularly for proteins that may simultaneously occur at, or move between, two or more subcellular location sites. However, further effort is needed to improve its power, since pLoc_bal-mHum was not trained with deep learning, a very powerful technique developed recently. The present study was devoted to incorporating the deep-learning technique and developing a new predictor called “pLoc_Deep-mHum”. The global absolute true rate achieved by the new predictor is over 81% and its local accuracy is over 90%. Both figures are substantially higher than those of its counterparts. Moreover, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_Deep-mHum/, which is expected to become a very useful tool in fighting the coronavirus pandemic.
Highlights
The strength of this model is that its CNN convolution layers allow it to extract the maximum amount of information from human protein features.
The new predictor is called “pLoc_Deep-mHum”, where “pLoc_Deep” stands for “predict subcellular localization by deep learning” and “mHum” for “multi-label human proteins”.
The newly proposed predictor pLoc_Deep-mHum is remarkably superior to the existing state-of-the-art predictor pLoc_bal-mHum on all five metrics. It can be seen from the table that the absolute true rate achieved by the new predictor is over 81%, which is far beyond the reach of any other existing method.
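The "absolute true rate" quoted above is a strict measure for multi-label prediction. The following is a minimal sketch, assuming the set-based definitions commonly used in the multi-label literature; the metrics are not defined in this text, so the function names, the Jaccard-style accuracy, and the toy location labels are illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Set


def absolute_true(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Fraction of proteins whose predicted location set matches the true set exactly."""
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("inputs must be non-empty and of equal length")
    exact = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return exact / len(true_sets)


def jaccard_accuracy(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Mean of |intersection| / |union| per protein (assumed definition of 'accuracy')."""
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(true_sets, pred_sets):
        union = t | p
        # Two empty sets agree completely, so score them as 1.0.
        total += len(t & p) / len(union) if union else 1.0
    return total / len(true_sets)


# Hypothetical toy data: four proteins, some with two location sites.
true_locs = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"mitochondrion"}, {"cytoplasm"}]
pred_locs = [{"nucleus"}, {"cytoplasm"}, {"mitochondrion"}, {"cytoplasm", "nucleus"}]

abs_true = absolute_true(true_locs, pred_locs)
jaccard = jaccard_accuracy(true_locs, pred_locs)
print(abs_true, jaccard)
```

In this toy case two of the four proteins are predicted with exactly the right set of sites, so the absolute true rate is 0.5, while the partially correct predictions still earn partial credit under the Jaccard-style score. This is why an absolute true rate above 81% is a demanding target for proteins with multiple location sites.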
Summary
Knowledge of the subcellular localization of proteins is crucially important for two goals: 1) revealing the intricate pathways that regulate biological processes at the cellular level [1, 2], and 2) selecting the right targets [3] for developing new drugs. In 2011, by extracting the GO (Gene Ontology) information of the proteins [6], the same predictor could be used to deal with multi-location proteins, achieving 76% accuracy. It is through these and follow-up procedures that the capacity to deal with multi-site systems, and the accuracy achieved, have been further improved. As in pLoc_bal-mHum [12] and many other recent publications on new prediction methods (see, e.g., [9-11, 13-50]), the guidelines of the 5-step rule [51] are followed. These cover the detailed procedures for 1) the benchmark dataset, 2) sample formulation, 3) the operation engine or algorithm, 4) cross-validation, and 5) the web-server. Here our attention is focused on the procedures that differ significantly from those used in developing the predictor pLoc_bal-mHum [12].