Abstract

Predicting protein subcellular localization is an essential step toward elucidating the in vivo functions of proteins and identifying drug targets, and it has been studied extensively in previous decades. Rather than determining the subcellular localization of single-label proteins only, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard the relationships between them. This paper proposes a multi-label subcellular-localization predictor, HybridGO-Loc, that leverages not only GO term occurrences but also inter-term relationships. This is achieved by hybridizing the frequencies of occurrence of GO terms with the semantic similarity between GO terms. Given a protein, a set of GO terms is retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via a BLAST search as the keys. The frequencies of GO occurrences and the semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. A multi-label support vector machine (SVM) classifier with an adaptive decision scheme is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which predicts the subcellular localization of virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.
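The hybridization step described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the three-term GO vocabulary, the made-up pairwise similarity value, the helper names, and in particular the "best-match" definition of the semantic similarity vector and the use of plain concatenation as the fusion rule. The paper's actual formulation may differ.

```python
from collections import Counter

def frequency_vector(go_terms, vocabulary):
    """Count how often each vocabulary term occurs among the retrieved GO terms."""
    counts = Counter(go_terms)
    return [float(counts[term]) for term in vocabulary]

def similarity_vector(go_terms, vocabulary, pair_similarity):
    """For each vocabulary term, keep the best similarity to any retrieved term.

    Hypothetical definition: identical terms score 1.0, and unknown pairs
    score 0.0. `pair_similarity` maps frozenset({a, b}) to a value in [0, 1].
    """
    distinct = set(go_terms)
    vector = []
    for vocab_term in vocabulary:
        best = 0.0
        for term in distinct:
            if term == vocab_term:
                score = 1.0
            else:
                score = pair_similarity.get(frozenset((term, vocab_term)), 0.0)
            best = max(best, score)
        vector.append(best)
    return vector

def fusion_vector(go_terms, vocabulary, pair_similarity):
    """Concatenate the frequency and similarity vectors (assumed fusion rule)."""
    return (frequency_vector(go_terms, vocabulary)
            + similarity_vector(go_terms, vocabulary, pair_similarity))

# Toy data: nucleus, cytoplasm, plasma membrane; the 0.4 is an invented value.
vocabulary = ["GO:0005634", "GO:0005737", "GO:0005886"]
pair_similarity = {frozenset(("GO:0005634", "GO:0005737")): 0.4}
retrieved = ["GO:0005634", "GO:0005634"]  # same term found via two homologs

fused = fusion_vector(retrieved, vocabulary, pair_similarity)
print(fused)
```

In this toy case the frequency half records that the nucleus term occurred twice, while the similarity half also gives partial credit to the cytoplasm term, which never occurred. That extra signal is the kind of inter-term information that an occurrence-only feature cannot carry.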

Highlights

  • Proteins must be located in the appropriate physiological contexts within a cell to exert their biological functions

  • Compared to existing multi-label subcellular-localization predictors, our proposed predictor has the following advantages: (1) it formulates the feature vectors by hybridizing Gene Ontology (GO) frequency-of-occurrence features and GO semantic similarity features, which together contain richer information than GO term frequencies alone; (2) it adopts a new strategy to incorporate richer and more useful homologous information from more distant homologs rather than using the top homologs only; (3) it adopts an adaptive decision strategy for multi-label support vector machine (SVM) classifiers so that it can effectively deal with datasets containing both single-label and multi-label proteins

  • This paper proposes a new multi-label predictor by hybridizing GO frequency features and semantic similarity features to predict the subcellular locations of multi-label proteins
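The adaptive decision strategy mentioned in the highlights is not specified in this summary. As a hedged illustration only, the sketch below shows one plausible rule of this kind: each location has its own SVM score, and the cutoff for accepting a location scales with the highest score instead of being fixed. The function name, the parameter `theta`, and the exact thresholding formula are assumptions for illustration and should not be read as the authors' definition.

```python
def adaptive_decision(scores, theta=0.5):
    """Pick predicted location indices from per-class SVM scores.

    Assumed rule (illustrative, not confirmed by the text):
      * if no score is positive, return only the top-scoring class, so every
        protein receives at least one location;
      * otherwise accept every class whose score reaches
        min(1.0, theta * max_score), with theta in [0, 1].
    """
    if not scores:
        raise ValueError("scores must contain at least one class score")
    top = max(scores)
    if top <= 0:
        return [scores.index(top)]
    threshold = min(1.0, theta * top)
    return [index for index, score in enumerate(scores) if score >= threshold]

# Two strong scores: both locations are kept (multi-label protein).
multi = adaptive_decision([1.2, 0.7, -0.3], theta=0.5)
# All scores negative: fall back to the single best location.
single = adaptive_decision([-0.5, -0.1, -0.9], theta=0.5)
print(multi, single)
```

Under this assumed rule, a small `theta` behaves like a plain "every positive score wins" decision, while `theta` close to 1 keeps only locations scoring near the top one. That trade-off is what lets a single classifier handle a mix of single-label and multi-label proteins.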


Summary

Introduction

Proteins must be located in the appropriate physiological contexts within a cell to exert their biological functions. Subcellular localization is essential to protein function and has been suggested as a means to maximize functional diversity and economize on protein design and synthesis [1]. Aberrant protein subcellular localization is closely correlated with a broad range of human diseases, such as Alzheimer’s disease [2], kidney stones [3], primary human liver tumors [4], breast cancer [5], pre-eclampsia [6] and Bartter syndrome [7]. Wet-lab experiments such as fluorescence microscopy imaging, cell fractionation and electron microscopy are the gold standard for validating subcellular localization and are essential for building high-quality localization databases such as The Human Protein Atlas (http://www.proteinatlas.org/). With the avalanche of newly discovered protein sequences in the post-genomic era, computational methods are required to help biologists handle large-scale proteomic data and determine the subcellular localization of proteins.

