Use of Chou's 5-steps rule to predict the subcellular localization of gram-negative and gram-positive bacterial proteins by multi-label learning based on gene ontology annotation and profile alignment.

Hafida Bouziane, Abdallah Chouarfia

doi:10.1515/jib-2019-0091

Abstract

To date, many proteins generated by large-scale genome sequencing projects are still uncharacterized and subject to intensive investigation by both experimental and computational means. Knowledge of protein subcellular localization (SCL) is of key importance for protein function elucidation. However, predicting SCL remains a challenging task, especially for multi-site proteins, which shuttle between cell compartments to perform their biological functions, and for proteins that lack significant homology to proteins of known subcellular location. Owing to their low cost and reasonable accuracy, machine learning-based methods have gained much attention in this context, helped by the availability of a plethora of biological databases and annotated proteins for analysis and benchmarking. Various predictive models have been proposed to tackle the SCL problem using different protein sequence features pertaining to subcellular localization; however, the overwhelming majority of them focus on single localization and cover very few cellular locations. Prediction was originally based on sorting signals, amino acid composition, and homology. To improve prediction quality, the focus is now on knowledge extracted from annotation databases, such as protein–protein interactions and Gene Ontology (GO) functional domain annotation, which has recently become widely adopted and essential information for learning systems. In the present study, we treated SCL prediction as a multi-label learning problem and sought to label both single-site and multi-site unannotated bacterial protein sequences by mining protein homology relationships using both the GO terms of protein homologs and PSI-BLAST profiles.
Experiments using 5-fold cross-validation on the benchmark datasets showed a significant improvement in the results obtained by the proposed consensus multi-label prediction model, which discriminates six compartments for Gram-negative and five compartments for Gram-positive bacterial proteins.

Highlights

Proteins are key players in cell survival and damage, and their presence in specific cell sites reflects the nature of their biological function.
Earlier machine learning systems were specialized for specific organisms and certain localization sites, but no significant improvements over the k-Nearest Neighbor (k-NN) algorithm were reported until the burst of new-generation methods based on hybrid models and fusion approaches [26,27,28,29,30,31,32,33], which take into account both protein sequence and structure characteristics.
First, we extracted from the learning datasets three sets of distinct Gene Ontology (GO) terms, one for each of the three sub-ontologies: molecular function (MF), biological process (BP), and cellular component (CC). These are the top-ranked GO terms provided by PANNZER2, with repeated terms removed.
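The GO term extraction step described in the last highlight can be sketched as follows. This is only an illustration under stated assumptions: the input structure, the protein names, and the helper functions `build_vocabularies` and `encode` are hypothetical, and the snippet does not reproduce how PANNZER2 output is actually parsed or how the authors encoded their features.

```python
from collections import defaultdict

# Hypothetical input: protein ID -> list of (GO term, sub-ontology) pairs.
# The protein names and the choice of terms are illustrative only.
annotations = {
    "protA": [("GO:0005886", "CC"), ("GO:0005215", "MF"), ("GO:0006810", "BP")],
    "protB": [("GO:0005886", "CC"), ("GO:0003677", "MF")],
}

def build_vocabularies(annotations):
    """Collect the distinct GO terms seen for each sub-ontology (MF, BP, CC)."""
    vocab = defaultdict(set)
    for terms in annotations.values():
        for go_id, aspect in terms:
            vocab[aspect].add(go_id)  # a set silently drops repeated terms
    # Sort so that each term gets a stable position in the feature vector.
    return {aspect: sorted(vocab[aspect]) for aspect in ("MF", "BP", "CC")}

def encode(protein_terms, vocabulary):
    """Binary vector: 1 if the protein carries the term, 0 otherwise."""
    present = {go_id for go_id, _ in protein_terms}
    return [1 if go_id in present else 0 for go_id in vocabulary]

vocabs = build_vocabularies(annotations)
cc_vector_b = encode(annotations["protB"], vocabs["CC"])
mf_vector_b = encode(annotations["protB"], vocabs["MF"])
```

Keeping one vocabulary per sub-ontology means each protein can be described by three separate vectors, which is one plausible way to feed MF, BP, and CC evidence to separate learners before combining them.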

Summary

Introduction

Proteins are key players in cell survival and damage, and their presence in specific cell sites reflects the nature of their biological function. Many systems using a variety of machine learning techniques have been proposed, achieving varying degrees of success. They were specialized for specific organisms and certain localization sites, but no significant improvements over the k-NN algorithm were reported until the burst of new-generation methods based on hybrid models and fusion approaches [26,27,28,29,30,31,32,33], which take into account both protein sequence and structure characteristics. These methods are categorized as sorting signal-based, composition-based, and homology-based.
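To make the k-NN baseline mentioned above concrete in a multi-label setting, here is a minimal sketch. It is not the consensus model proposed in the paper: the Jaccard similarity over GO term sets, the voting threshold, the toy term identifiers, and the compartment names are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of GO terms."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def predict_locations(query_terms, training, k=3, threshold=0.5):
    """Multi-label k-NN: keep every location that at least `threshold`
    of the k most similar training proteins are annotated with.

    training: list of (GO-term set, location set) pairs.
    """
    ranked = sorted(training,
                    key=lambda item: jaccard(query_terms, item[0]),
                    reverse=True)
    neighbours = ranked[:k]
    votes = {}
    for _, locations in neighbours:
        for loc in locations:
            votes[loc] = votes.get(loc, 0) + 1
    return {loc for loc, n in votes.items() if n / len(neighbours) >= threshold}

# Toy data: term identifiers and compartment names are placeholders.
training = [
    ({"GO:1", "GO:2"}, {"cytoplasm"}),
    ({"GO:2", "GO:3"}, {"cytoplasm", "inner membrane"}),
    ({"GO:3", "GO:4"}, {"inner membrane"}),
    ({"GO:5"}, {"extracellular"}),
]
predicted = predict_locations({"GO:2", "GO:3"}, training, k=3)
```

Because each neighbour votes for all of its locations, a query protein can receive more than one label, which is what distinguishes this from a single-label k-NN that returns only the majority class.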



Journal: Journal of Integrative Bioinformatics
Publication Date: Jun 29, 2020
Citations: 7
License type: CC BY 4.0
