Abstract

As one of the essential topics in proteomics and molecular biology, protein subcellular localization has been extensively studied in previous decades. However, most existing methods are limited to the prediction of single-location proteins. In many studies, multi-location proteins are either not considered or assumed not to exist. This paper proposes a novel multi-label subcellular-localization predictor based on the semantic similarity between Gene Ontology (GO) terms. Given a protein, the accession numbers of its homologs are obtained via a BLAST search. These homologous accession numbers are then used as keys to search the Gene Ontology annotation database and obtain a set of GO terms. The semantic similarity between GO terms is used to formulate semantic similarity vectors for classification. A support vector machine (SVM) classifier with a new decision scheme is proposed to classify the multi-label GO semantic similarity vectors. Experimental results show that the proposed multi-label predictor significantly outperforms state-of-the-art predictors such as iLoc-Plant and Plant-mPLoc.
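The step from GO term sets to semantic similarity vectors can be illustrated with a minimal sketch. The abstract does not say which term-level similarity measure or which set-level aggregation the paper uses, so everything below is an assumption made for illustration: the toy ontology `ANCESTORS`, the Jaccard overlap of ancestor sets used as a stand-in term similarity, the best-match average used to compare two term sets, and all function names are hypothetical. The only idea taken from the text is that a query protein's GO terms are compared with the GO terms of reference proteins to produce one similarity value per reference protein.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical toy ontology: each term maps to its ancestor set (itself included).
# Real GO identifiers and the real GO graph are NOT used here.
ANCESTORS: Dict[str, FrozenSet[str]] = {
    "GO:A": frozenset({"GO:A", "GO:root"}),
    "GO:B": frozenset({"GO:B", "GO:A", "GO:root"}),
    "GO:C": frozenset({"GO:C", "GO:root"}),
}


def term_similarity(t1: str, t2: str) -> float:
    """Stand-in term similarity: Jaccard overlap of ancestor sets (an assumption)."""
    a, b = ANCESTORS[t1], ANCESTORS[t2]
    return len(a & b) / len(a | b)


def set_similarity(terms_x: Set[str], terms_y: Set[str]) -> float:
    """Best-match average between two GO term sets (an assumed aggregation)."""
    if not terms_x or not terms_y:
        return 0.0  # no annotation on one side: treat as no similarity
    best_x = [max(term_similarity(x, y) for y in terms_y) for x in terms_x]
    best_y = [max(term_similarity(x, y) for x in terms_x) for y in terms_y]
    return (sum(best_x) + sum(best_y)) / (len(best_x) + len(best_y))


def similarity_vector(query_terms: Set[str],
                      reference_term_sets: List[Set[str]]) -> List[float]:
    """One similarity value per reference protein, in reference order."""
    return [set_similarity(query_terms, ref) for ref in reference_term_sets]


if __name__ == "__main__":
    references = [{"GO:B"}, {"GO:C"}, set()]
    print(similarity_vector({"GO:B"}, references))
```

In this sketch the vector length equals the number of reference proteins, so a classifier trained on such vectors sees each protein as a profile of similarities to the reference set.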

Highlights

  • In recent years, protein subcellular localization has gained tremendous attention due to its important roles in elucidating protein functions, identifying drug targets, and so on [1]

  • The plant dataset used in Plant-mPLoc [14], iLoc-Plant [16] and mGOASVM [27] was used to evaluate the performance of the proposed predictor

  • This paper proposes a new multi-label predictor based on Gene Ontology semantic similarity to predict the subcellular locations of multi-label proteins


Summary

Introduction

Protein subcellular localization has gained tremendous attention due to its important roles in elucidating protein functions, identifying drug targets, and so on [1]. Several multi-label predictors have been proposed, including Plant-mPLoc [14], Virus-mPLoc [15], iLoc-Plant [16] and iLoc-Virus [17]. These predictors use GO information and have demonstrated superiority over other methods. Semantic similarity over the Gene Ontology has been extensively studied and has been applied to many biological problems, including protein function prediction [18], subnuclear localization prediction [19], protein-protein interaction inference [20] and microarray clustering [21]. The performance of these predictors depends on whether the similarity measure is relevant to the biological problem at hand. This paper proposes a novel predictor based on GO semantic similarity for multi-label protein subcellular localization prediction. Results on a recent benchmark dataset demonstrate that three properties of the proposed predictor enable it to accurately predict multi-location proteins and to outperform three state-of-the-art predictors.
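To make the multi-label aspect concrete, the sketch below shows one common way to turn per-location classifier scores into a set of predicted locations. This summary does not describe the decision scheme the paper actually proposes, so the rule here is an assumption: every location with a positive score is predicted, and if no score is positive the single highest-scoring location is returned so that each protein receives at least one label. The function name `decide_locations` and the example scores are hypothetical.

```python
from typing import Dict, Set


def decide_locations(scores: Dict[str, float]) -> Set[str]:
    """Map one score per location (e.g. from one-vs-rest SVMs) to a label set.

    Assumed rule, not taken from the paper: keep all locations with a
    positive score; fall back to the top-scoring location otherwise.
    """
    if not scores:
        raise ValueError("at least one location score is required")
    positive = {loc for loc, score in scores.items() if score > 0.0}
    if positive:
        return positive
    # No classifier fired: return the least-negative location.
    return {max(scores, key=lambda loc: scores[loc])}


if __name__ == "__main__":
    # Hypothetical scores for a protein that resides in two compartments.
    print(decide_locations({"nucleus": 0.8, "cytoplasm": 0.1, "plastid": -0.5}))
    # Hypothetical scores where no classifier gives a positive output.
    print(decide_locations({"nucleus": -0.3, "cytoplasm": -0.9}))
```

A rule of this kind lets the number of predicted locations vary per protein, which is what distinguishes a multi-label predictor from one that always returns the single best class.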

Retrieval of GO Terms
Semantic Similarity Measure
Multi-Label Multi-Class SVM Classification
Dataset and Performance Metrics
Comparing with State-of-the-Art Predictors
Conclusions and Future Works
