Abstract
The basic unit of life is the cell. It contains many protein molecules located in its different organelles. The growth and reproduction of a cell, as well as most of its other biological functions, are carried out by these proteins, and proteins in different organelles or subcellular locations have different functions. Facing the avalanche of protein sequences generated in the postgenomic age, we are challenged to develop high-throughput tools for identifying the subcellular localization of proteins based on their sequence information alone. Although considerable efforts have been made in this regard, the problem is still far from solved. Most existing methods can deal with single-location proteins only, yet proteins with multiple locations may have special biological functions that are particularly important for drug targets. Using the ML-GKR (Multi-Label Gaussian Kernel Regression) method, we developed a new predictor called “pLoc-mGpos” by extracting in depth the key information from GO (Gene Ontology) and incorporating it into Chou’s general PseAAC (Pseudo Amino Acid Composition), for predicting the subcellular localization of Gram-positive bacterial proteins with both single and multiple location sites. Rigorous cross-validation on the same stringent benchmark dataset indicated that the proposed pLoc-mGpos predictor is remarkably superior to “iLoc-Gpos”, the state-of-the-art predictor for the same purpose. For the convenience of most experimental scientists, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mGpos/, by which users can easily get their desired results without going through the complicated mathematics involved.
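To make the idea of multi-label Gaussian kernel regression more concrete, the following is a minimal sketch of a generic kernel-weighted (Nadaraya-Watson style) multi-label predictor. It is not the ML-GKR formulation used in pLoc-mGpos, which the abstract does not spell out. The function names, the toy two-dimensional feature vectors (standing in for GO-based PseAAC features), the four-location label layout, the bandwidth `sigma`, and the 0.5 threshold are all assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) similarity between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def predict_label_scores(query, train_X, train_Y, sigma=1.0):
    """Kernel-weighted average of the training label vectors (one score per location)."""
    weights = [gaussian_kernel(query, x, sigma) for x in train_X]
    total = sum(weights)
    n_labels = len(train_Y[0])
    if total == 0.0:
        # Query is too far from every training sample for the kernel to register.
        return [0.0] * n_labels
    return [
        sum(w * y[j] for w, y in zip(weights, train_Y)) / total
        for j in range(n_labels)
    ]

def predict_labels(query, train_X, train_Y, sigma=1.0, threshold=0.5):
    """Return every location whose score reaches the threshold (at least one)."""
    scores = predict_label_scores(query, train_X, train_Y, sigma)
    chosen = [j for j, s in enumerate(scores) if s >= threshold]
    if not chosen:
        # Fall back to the single best-scoring location.
        chosen = [max(range(len(scores)), key=scores.__getitem__)]
    return chosen

# Hypothetical toy data: 2-D features, 4 illustrative location labels (indices 0..3).
train_X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
train_Y = [
    [1, 0, 0, 0],  # single-location protein
    [1, 1, 0, 0],  # multi-location protein
    [0, 0, 1, 0],  # single-location protein
]

near_first_cluster = predict_labels([0.1, 0.0], train_X, train_Y)
near_third_sample = predict_labels([5.0, 5.0], train_X, train_Y)
```

In this sketch a query inherits the labels of its nearest neighbours in feature space, so a protein close to a multi-location training sample can itself receive more than one location.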
Highlights
As the most basic unit of life, a cell undergoes the three most important processes of any living thing: growth, reproduction, and death [1]
Using the ML-GKR (Multi-Label Gaussian Kernel Regression) method, we developed a new predictor called “pLoc-mGpos” by extracting in depth the key information from GO (Gene Ontology) and incorporating it into Chou’s general PseAAC (Pseudo Amino Acid Composition), for predicting the subcellular localization of Gram-positive bacterial proteins with both single and multiple location sites
Compared with iLoc-Gpos [27], the most powerful existing predictor able to deal with the multiple locations of Gram-positive bacterial proteins, the new predictor achieves far better success scores on the metrics widely used to measure the quality of multi-label predictors
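The comparison with iLoc-Gpos refers to metrics widely used for multi-label predictors without naming them here. As an illustration, the sketch below computes five set-based metrics common in the multi-label localization literature (aiming, coverage, accuracy, absolute true, absolute false). Treating these as the metrics meant in this paper is an assumption, and the function name and toy label sets are hypothetical.

```python
def multilabel_metrics(true_sets, pred_sets, n_labels):
    """Average five set-based multi-label metrics over all samples.

    true_sets / pred_sets: one collection of label indices per protein.
    n_labels: total number of possible locations in the system.
    """
    n = len(true_sets)
    totals = {"aiming": 0.0, "coverage": 0.0, "accuracy": 0.0,
              "absolute_true": 0.0, "absolute_false": 0.0}
    for t, p in zip(true_sets, pred_sets):
        t, p = set(t), set(p)
        inter = len(t & p)
        union = len(t | p)
        # Aiming (precision-like): fraction of predicted labels that are correct.
        totals["aiming"] += inter / len(p) if p else 0.0
        # Coverage (recall-like): fraction of true labels that were predicted.
        totals["coverage"] += inter / len(t) if t else 0.0
        # Accuracy: Jaccard overlap between true and predicted label sets.
        totals["accuracy"] += inter / union if union else 1.0
        # Absolute true: predicted set must match the true set exactly.
        totals["absolute_true"] += 1.0 if t == p else 0.0
        # Absolute false (Hamming-loss-like): wrong or missed labels per possible label.
        totals["absolute_false"] += (union - inter) / n_labels
    return {name: value / n for name, value in totals.items()}

# Hypothetical example: three proteins, four possible locations (indices 0..3).
true_sets = [{0}, {0, 1}, {2}]
pred_sets = [{0}, {0}, {2, 3}]
metrics = multilabel_metrics(true_sets, pred_sets, n_labels=4)
```

For the first four metrics a higher value is better; for absolute false a lower value is better, which is why the five numbers are usually reported together rather than collapsed into one score.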
Summary
As the most basic unit of life, a cell undergoes the three most important processes of any living thing: growth, reproduction, and death [1]. As more experimental data emerge, it has become clear that the localization of proteins in a cell is a multi-label system, in which some proteins may simultaneously occur at two or more different location sites. Such multiplex proteins often have exceptional biological functions [4,5,6] and deserve special attention [7,8,9,10,11,12] from the viewpoint of selecting multiple targets [13,14,15] or key targets [16,17,18,19] for drug development.