ENZYMAP: Exploiting Protein Annotation for Modeling and Predicting EC Number Changes in UniProt/Swiss-Prot

Sabrina De Azevedo Silveira, Marcelo Matos Santoro, Raquel Cardoso De Melo-Minardi, Wagner Meira Jr, Carlos Henrique Da Silveira

doi:10.1371/journal.pone.0089162


Open Access

https://doi.org/10.1371/journal.pone.0089162


Abstract

The volume and diversity of biological data are increasing at very high rates. Vast amounts of protein sequences and structures, protein and genetic interactions and phenotype studies have been produced. The majority of data generated by high-throughput devices is automatically annotated because manually annotating them is not possible. Thus, efficient and precise automatic annotation methods are required to ensure the quality and reliability of both the biological data and associated annotations. We proposed ENZYMatic Annotation Predictor (ENZYMAP), a technique to characterize and predict EC number changes based on annotations from UniProt/Swiss-Prot using a supervised learning approach. We evaluated ENZYMAP experimentally, using test data sets from both UniProt/Swiss-Prot and UniProt/TrEMBL, and showed that predicting EC changes using selected types of annotation is possible. Finally, we compared ENZYMAP and DETECT with respect to their predictions and checked both against the UniProt/Swiss-Prot annotations. ENZYMAP was shown to be more accurate than DETECT, coming closer to the actual changes in UniProt/Swiss-Prot. Our proposal is intended to be an automatic complementary method (that can be used together with other techniques like the ones based on protein sequence and structure) that helps to improve the quality and reliability of enzyme annotations over time, suggesting possible corrections, anticipating annotation changes and propagating the implicit knowledge for the whole dataset.

Highlights

In recent decades there has been a surge in the amount of biological data available.
The Descriptive Multiclass Experiment presents the results of the descriptive step. This experiment aimed to verify whether the line types Organism Classification (OC), Reference Position (RP) and Keywords (KW) are able to discriminate entries that experienced a specific change in their EC number from those that remained the same.
We proposed ENZYMatic Annotation Predictor (ENZYMAP), a technique based on supervised learning to characterize and predict annotation changes in temporal data from UniProt/Swiss-Prot using entry line types that are already available in the database.
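
To make the idea of using entry line types as features more concrete, the following is a minimal sketch of how the OC, RP and KW lines of a UniProt flat-file entry could be collected into token sets that a supervised learner might consume. It is not the ENZYMAP implementation: the function name `extract_line_tokens`, the tokenization on semicolons, and the sample entry are all assumptions made for illustration. The only format detail relied on is that each flat-file line starts with a two-letter line code followed by three spaces.

```python
from collections import defaultdict

# Line types named in the highlights above. How ENZYMAP actually encodes
# them is not described here, so this choice of tokens is an assumption.
LINE_TYPES = ("OC", "RP", "KW")


def extract_line_tokens(entry_text, line_types=LINE_TYPES):
    """Collect lower-cased tokens per line type from one flat-file entry."""
    tokens = defaultdict(set)
    for line in entry_text.splitlines():
        # Flat-file lines look like "KW   Glycoprotein; Hemagglutinin."
        code, _, content = line.partition("   ")
        if code not in line_types:
            continue
        # Drop the trailing period, then split the items on semicolons.
        for item in content.rstrip().rstrip(".").split(";"):
            item = item.strip().lower()
            if item:
                tokens[code].add(item)
    return {code: sorted(tokens[code]) for code in line_types}


# Hypothetical, heavily shortened entry used only to exercise the function.
sample_entry = "\n".join([
    "ID   EXAMPLE_ENTRY",
    "OC   Viruses; Mononegavirales;",
    "OC   Paramyxoviridae.",
    "RP   NUCLEOTIDE SEQUENCE [GENOMIC RNA].",
    "KW   Glycoprotein; Hemagglutinin.",
    "//",
])

features = extract_line_tokens(sample_entry)
print(features["KW"])
```

Token sets of this kind could be compared across successive database releases, or turned into binary feature vectors, before training a classifier. Which encoding and which classifier to use are design choices that this sketch leaves open.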

Summary

Introduction

In recent decades there has been a surge in the amount of biological data available. According to [1], new DNA sequencing technologies allowed a 1000-fold drop in sequencing costs since 1990 and made an increasing number of large data collection projects economically possible, leading to an exponential increase in the DNA sequence data available.

Glycoprotein G of the Nipah virus (entry Q9IH62 in Swiss-Prot) illustrates the drawbacks of this approach. Up to release 14 (July 2008) of UniProt/Swiss-Prot [9], entry Q9IH62 was considered an enzyme. Despite all these sequence and structural similarities, Henipavirus Glycoproteins G are known not to be enzymes and to have only hemagglutinin activity, performing protein-protein interactions with host receptors [7].

The scientific community still has concerns regarding the quality and reliability of the data and annotations from the large, publicly available databases.



Journal: PLoS ONE	Publication Date: Feb 19, 2014
Citations: 29	License type: CC BY 4.0
