HMM-ModE: implementation, benchmarking and validation with HMMER3.

Swati Sinha, Andrew Lynn

doi:10.1186/1756-0500-7-483

Abstract

Background

HMM-ModE is a computational method that generates family-specific profile HMMs using negative training sequences. The method optimizes the discrimination threshold using 10-fold cross validation and modifies the emission probabilities of profiles to reduce common fold-based signals shared with other sub-families. The protocol depends on the program HMMER for HMM profile building and sequence database searching. The recent release of HMMER3 has improved database search speed by several orders of magnitude, allowing for the large-scale deployment of the method in sequence annotation projects. We have rewritten our existing scripts, both for parsing the HMM profiles and for modifying emission probabilities, to upgrade HMM-ModE to HMMER3 and take advantage of its probabilistic inference and high computational speed. The method is benchmarked and tested on a GPCR dataset as an accurate and fast method for functional annotation.

Results

The implementation of this method, which now works with HMMER3, is benchmarked against the earlier version of HMMER to show that the effect of local-local alignments is marked only in the case of profiles containing a large number of discontinuous match states. The method is tested on a gold standard set of families, and we report a significant reduction in the number of false positive hits over the default HMM profiles. When applied to GPCR sequences, the results showed an improvement in the accuracy of classification compared with other methods used to classify the family at different levels of its classification hierarchy.

Conclusions

The present findings show that the new version of HMM-ModE is a highly specific method for differentiating between fold (superfamily) and function (family) specific signals, which helps in the functional annotation of protein sequences. The use of modified profile HMMs of GPCR sequences provides a simple yet highly specific method for classification of the family, predicting the sub-family specific sequences with high accuracy even though sequences share common physicochemical characteristics between sub-families.

Highlights

Hidden Markov Model (HMM)-ModE is a computational method that generates family specific profile HMMs using negative training sequences
Benchmarking of the method using HMMER2 and HMMER3: an immediate concern in the implementation of the HMM-ModE protocol with HMMER3 is that this version has only local-local alignments
HMM-ModE can improve signals normally associated with substrate specificity which are differentially conserved in protein superfamilies, and should implicitly benefit from global or “glocal” alignments

Summary

Introduction

HMM-ModE is a computational method that generates family-specific profile HMMs using negative training sequences. The recent release of HMMER3 has improved database search speed by several orders of magnitude, allowing for the large-scale deployment of the method in sequence annotation projects. HMM-ModE [9] generates these family-specific profile HMMs through HMMER by optimizing the discrimination threshold using the mode of the average MCC (Matthews correlation coefficient) distribution from 10-fold cross validation and by modifying the emission probabilities using negative training sequences. The protocol is much faster in training because only the sequences selected as false positives by the subfamily HMM are used to modify model parameters and optimize the discrimination threshold. It provides a significant improvement over the existing methods for classification of fold and function specific signals. We have compared all the results reported in this manuscript with the earlier version of the method.
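To make the threshold optimization step more concrete, the following is a minimal Python sketch of choosing a score cutoff from cross-validation folds by MCC. It is an illustration under stated assumptions, not the authors' implementation: the function names (`mcc`, `mcc_at_threshold`, `best_threshold`) and the example scores are hypothetical, the scores stand in for search scores of positive (family) and negative (other sub-family) sequences, and the sketch simply picks the candidate cutoff with the highest average MCC across folds rather than reproducing the mode-based selection described above.

```python
import math
from typing import Sequence, Tuple

def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def mcc_at_threshold(pos_scores: Sequence[float],
                     neg_scores: Sequence[float],
                     threshold: float) -> float:
    """MCC when every sequence scoring >= threshold is called a family member."""
    tp = sum(s >= threshold for s in pos_scores)
    fn = len(pos_scores) - tp
    fp = sum(s >= threshold for s in neg_scores)
    tn = len(neg_scores) - fp
    return mcc(tp, fp, tn, fn)

Fold = Tuple[Sequence[float], Sequence[float]]  # (positive scores, negative scores)

def best_threshold(folds: Sequence[Fold], candidates: Sequence[float]) -> float:
    """Candidate cutoff with the highest MCC averaged over all folds.

    Simplifying assumption: the maximum of the average MCC is used here,
    not the mode of the average MCC distribution.
    """
    def average_mcc(t: float) -> float:
        return sum(mcc_at_threshold(p, n, t) for p, n in folds) / len(folds)
    return max(candidates, key=average_mcc)

# Hypothetical held-out scores for two folds (a real run would use ten).
example_folds = [([50, 60, 70], [10, 20, 55]), ([45, 65], [5, 30])]
print(best_threshold(example_folds, candidates=[0, 40, 58, 100]))
```

In this toy example the cutoff of 40 admits one false positive in the first fold but recovers every true member in both folds, so it yields a higher average MCC than the stricter cutoff of 58.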



Journal: BMC Research Notes
Publication Date: Jan 1, 2014
Citations: 35
License type: CC BY


