HMMER Cut-off Threshold Tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold.

Inti Anabela Pagnuco, Arjen Ten Have, Hernán Gabriel Bondino, Marcel Brun, María Victoria Revuelta, Andy T Y Lau

doi:10.1371/journal.pone.0193757

Abstract

Background
Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignment. Typically there are no clear subfamily hallmarks that would allow pattern-based function assignment, so this task is mostly achieved using the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific.

Results
HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high-quality superfamily phylogeny, it clusters a set of training sequences such that each cluster-specific HMMER profile detects the members of its cluster or subfamily with 100% precision and recall (P&R), thereby generating a specific threshold to use as the inclusion cut-off. The profiles and thresholds are then used as classifiers to screen a target dataset. Iteratively adding novel sequences to the groups and updating the corresponding HMMER profiles yields high sensitivity, while specificity is maintained by imposing 100% P&R self-detection. In three case studies of protein superfamilies, large datasets were classified with 100% precision and over 95% recall. Limits and caveats are presented and explained.

Conclusions
HMMERCTTER is a promising protein superfamily sequence classifier, provided that high-quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignment in the twilight zone of sequence similarity. All relevant data and source code are available from the GitHub repository at https://github.com/BBCMdP/HMMERCTTER.
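The two core ideas in the abstract, a per-cluster inclusion cut-off that gives 100% precision and recall on the training sequences, and iterative inclusion of new sequences while that self-detection is preserved, can be sketched as below. This is a minimal illustration under stated assumptions, not the HMMERCTTER implementation: `score_fn` stands in for building a profile from a set of member sequences and scoring every sequence against it (in practice something done with HMMER programs such as `hmmbuild` and `hmmsearch`), and the cut-off is assumed to be the lowest member score. The function names, the stopping rule (stop as soon as an enlargement breaks 100% P&R) and the `max_rounds` limit are hypothetical.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical sketch, not the HMMERCTTER source.
Scores = Dict[str, float]  # sequence ID -> bit score against one profile


def p_r_100_threshold(scores: Scores, members: Set[str], labeled: Set[str]) -> Optional[float]:
    """Cut-off at which a profile detects exactly the cluster members among the
    labeled sequences (100% precision and recall), or None if none exists."""
    if not members or not all(s in scores for s in members):
        return None  # empty cluster, or a member got no hit (recall < 100%)
    lowest_member = min(scores[s] for s in members)
    others = [scores[s] for s in labeled - members if s in scores]
    if others and max(others) >= lowest_member:
        return None  # a non-member scores as high as a member (precision < 100%)
    return lowest_member  # include everything scoring at or above this value


def iterative_inclusion(
    members: Set[str],
    training: Set[str],
    targets: Set[str],
    score_fn: Callable[[Set[str]], Scores],  # assumed: "build profile, score all"
    max_rounds: int = 10,
) -> Tuple[Set[str], float]:
    """Grow one cluster with targets that pass its cut-off. The profile is
    rebuilt each round; an enlargement is kept only if 100% P&R self-detection
    still holds on the labeled sequences."""
    members = set(members)
    threshold = p_r_100_threshold(score_fn(members), members, training)
    if threshold is None:
        raise ValueError("training cluster does not reach 100% P&R")
    for _ in range(max_rounds):
        scores = score_fn(members)
        new = {t for t in targets - members if scores.get(t, float("-inf")) >= threshold}
        if not new:
            break  # converged: no further targets pass the cut-off
        candidate = members | new
        new_threshold = p_r_100_threshold(score_fn(candidate), candidate, training | candidate)
        if new_threshold is None:
            break  # enlargement would break self-detection; keep previous state
        members, threshold = candidate, new_threshold
    return members, threshold


# Toy stand-in for profile building and scoring: each sequence is reduced to a
# single number and scores fall with distance from the members' mean.
VALUES = {"m1": 1.0, "m2": 1.2, "m3": 0.9, "n1": 5.0, "n2": 6.0,
          "t1": 1.1, "t2": 1.5, "t3": 4.8}


def toy_score_fn(profile_members: Set[str]) -> Scores:
    centre = sum(VALUES[s] for s in profile_members) / len(profile_members)
    return {s: 100.0 - 10.0 * abs(v - centre) for s, v in VALUES.items()}


members, threshold = iterative_inclusion(
    {"m1", "m2", "m3"}, {"m1", "m2", "m3", "n1", "n2"}, {"t1", "t2", "t3"}, toy_score_fn)
print(sorted(members), round(threshold, 2))  # ['m1', 'm2', 'm3', 't1'] 98.5
```

Taking the lowest member score as the cut-off is the most permissive value that still excludes every training non-member; any value above the highest non-member score and at or below the lowest member score would give the same 100% P&R on the training data.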

Highlights

Protein sequence function annotation is one of the major tasks of computational genomics
Using a high-quality superfamily phylogeny, HMMERCTTER clusters a set of training sequences such that each cluster-specific HMMER profile detects its cluster or subfamily members with 100% precision and recall (P&R), thereby generating a specific threshold to use as the inclusion cut-off
HMMERCTTER provides a decision support system that aids in the difficult task of sequence function assignment in the twilight zone of sequence similarity
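Using the cluster profiles and their thresholds as classifiers amounts to comparing each target's score against every cluster's cut-off. The sketch below is hypothetical: it assumes bit scores (for instance from searching the target dataset with each cluster's HMMER profile) have already been parsed into a nested dictionary, and the function name `screen_targets` and the choice to set aside targets that pass more than one cut-off for expert review are illustrative, not documented HMMERCTTER behavior.

```python
from typing import Dict, List, Tuple


def screen_targets(
    target_scores: Dict[str, Dict[str, float]],
    thresholds: Dict[str, float],
) -> Tuple[Dict[str, str], Dict[str, List[str]], List[str]]:
    """Screen target sequences against every cluster profile (hypothetical helper).

    target_scores -- cluster name -> {target ID: bit score against that cluster's profile}
    thresholds    -- cluster name -> inclusion cut-off derived from the training data
    Returns (unique assignments, conflicts, unclassified targets).
    """
    passed: Dict[str, List[str]] = {}
    all_targets = set()
    for cluster, scores in target_scores.items():
        cutoff = thresholds[cluster]
        for target, score in scores.items():
            all_targets.add(target)
            if score >= cutoff:
                passed.setdefault(target, []).append(cluster)
    assigned = {t: cs[0] for t, cs in passed.items() if len(cs) == 1}
    # Targets above more than one cut-off are not resolved automatically here;
    # they are set aside for expert review (an assumption of this sketch).
    conflicts = {t: sorted(cs) for t, cs in passed.items() if len(cs) > 1}
    unclassified = sorted(all_targets - set(passed))
    return assigned, conflicts, unclassified


# Hypothetical scores for two clusters, A and B, and three targets.
SCORES = {"A": {"t1": 120.0, "t2": 80.0, "t3": 40.0},
          "B": {"t1": 30.0, "t2": 95.0, "t3": 20.0}}
CUTOFFS = {"A": 75.0, "B": 90.0}
assigned, conflicts, unclassified = screen_targets(SCORES, CUTOFFS)
print(assigned)      # {'t1': 'A'}
print(conflicts)     # {'t2': ['A', 'B']}
print(unclassified)  # ['t3']
```

Returning conflicts and unclassified targets separately, rather than forcing every target into a cluster, keeps the final decision with the user, which fits the "decision support" framing.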

Summary

Introduction

Protein sequence function annotation is one of the major tasks of computational genomics. Paralogs can acquire novel functions due to relaxed functional constraints, often while maintaining their original function. Altogether, this results in intricate superfamilies in which function annotation by similarity scoring is hampered by problems of sensitivity and specificity, combined with imperfect annotation of reference sequences. The problem grows because, in the post-genome era, biologists want annotations at the subfamily level rather than the superfamily level. There are no clear subfamily hallmarks that would allow pattern-based function assignment, so the task is mostly achieved using the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific.
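The cut-off problem can be made concrete with a few lines of code. The sketch computes precision and recall when every hit at or above a single score cut-off is called a subfamily member; the scores and labels in `HITS` are invented for illustration. Because one paralog (score 190) outscores a true member (score 170), no single cut-off reaches both 100% precision and 100% recall.

```python
from typing import List, Tuple

# Hypothetical (score, is_true_subfamily_member) pairs from one similarity search.
HITS: List[Tuple[float, bool]] = [
    (310.0, True), (250.0, True), (190.0, False),
    (170.0, True), (120.0, False), (90.0, True),
]


def precision_recall(hits: List[Tuple[float, bool]], cutoff: float) -> Tuple[float, float]:
    """Precision and recall when every hit scoring >= cutoff is called a member."""
    tp = sum(1 for score, member in hits if score >= cutoff and member)
    fp = sum(1 for score, member in hits if score >= cutoff and not member)
    fn = sum(1 for score, member in hits if score < cutoff and member)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention: no calls, no false calls
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


for cutoff in (200.0, 100.0, 80.0):
    p, r = precision_recall(HITS, cutoff)
    print(f"cut-off {cutoff:.0f}: precision {p:.2f}, recall {r:.2f}")
# cut-off 200: precision 1.00, recall 0.50
# cut-off 100: precision 0.60, recall 0.75
# cut-off 80: precision 0.67, recall 1.00
```

Raising the cut-off buys precision at the cost of recall and vice versa. As the abstract describes, HMMERCTTER avoids a single global cut-off by deriving one per cluster, using only clusters whose profile separates members from non-members perfectly on the training data.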

Journal: PLOS ONE
Publication Date: Mar 26, 2018
Citations: 18
License type: CC BY 4.0

Similar Papers

Searching, selecting, and synthesizing source code components. 01 Jan 2012.

Semi-supervised image classification in large datasets by using random forest and fuzzy quantification of the salient object. Hager Merdassi ... Ezzeddine Zagrouba. 01 Nov 2014.

Application of distributed SVM architectures in classifying forest data cover types. Mira Trebar ... Nigel Steele. Computers and Electronics in Agriculture, Vol. 63. 24 Mar 2008.

Incremental classification of process data for anomaly detection based on similarity analysis. Stefan Byttner ... Gancho Vachkov. 01 Jan 2010.