xHMMER3x2: Utilizing HMMER3's speed and HMMER2's sensitivity and specificity in the glocal alignment mode for improved large-scale protein domain annotation

Choon-Kong Yap, Birgit Eisenhaber, Wing-Cheong Wong, Frank Eisenhaber

doi:10.1186/s13062-016-0163-0

Open Access

Abstract

Background: While the local-mode HMMER3 is notable for its massive speed improvement, the slower glocal-mode HMMER2 is more exact for domain annotation because it enforces full domain-to-sequence alignments. Since a unit of domain necessarily implies a unit of function, local-mode HMMER3 alone remains insufficient for precise function annotation tasks. In addition, the E-values that different HMMER builds report for the same domain model are not comparable, which makes it difficult to check domain annotation consistency on a large scale.

Results: In this work, the speed of HMMER3 and the glocal-mode alignment of HMMER2 are combined within the xHMMER3x2 framework to tackle the large-scale domain annotation task. Briefly, HMMER3 is used for initial domain detection so that HMMER2 can subsequently perform the glocal-mode, sequence-to-full-domain alignments for the detected HMMER3 hits. An E-value calibration procedure is required to ensure that the search space of HMMER2 is sufficiently replicated by HMMER3. We find that this is straightforwardly possible for ~80% of the models in the Pfam domain library (release 29). However, for the remaining ~20% of HMMER3 domain models, the respective HMMER2 counterparts are more sensitive. Thus, HMMER3 searches alone are insufficient to ensure sensitivity for these models, and a HMMER2-based search needs to be initiated. When tested on the set of UniProt human sequences, xHMMER3x2 can be configured to be between 7× and 201× faster than HMMER2, with domain detection sensitivity descending from 99.8% to 95.7% with respect to HMMER2 alone; HMMER3's sensitivity was 95.7%. At the extremes, xHMMER3x2 is either the slow glocal-mode HMMER2 or the fast HMMER3 with glocal-mode alignment. Finally, the E-value-to-false-positive-rate (FPR) mapping provided by xHMMER3x2 allows E-values of different model builds to be compared, so that any annotation discrepancies in a large-scale annotation exercise can be flagged for further examination by dissectHMMER.

Conclusion: The xHMMER3x2 workflow allows large-scale domain annotation to be drastically faster than with HMMER2 without compromising domain-detection sensitivity or the completeness of sequence-to-domain alignments. The xHMMER3x2 code and its webserver (for Pfam releases 27, 28 and 29) are freely available at http://xhmmer3x2.bii.a-star.edu.sg/.

Reviewers: Reviewed by Thomas Dandekar, L. Aravind, Oliviero Carugo and Shamil Sunyaev. For the full reviews, please go to the Reviewers' comments section.

Electronic supplementary material: The online version of this article (doi:10.1186/s13062-016-0163-0) contains supplementary material, which is available to authorized users.
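The two-stage idea described in the Results can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `annotate`, `run_hmmer3`, `run_hmmer2_glocal` and the set `hmmer3_sufficient` are hypothetical names, the two search functions are passed in as callables rather than wrapping the real HMMER programs, and the toy stand-ins at the bottom merely mimic a permissive prefilter followed by a stricter full-domain check.

```python
from typing import Callable, Dict, List, Set, Tuple

Hit = Tuple[str, str, float]  # (sequence id, model id, E-value)
Search = Callable[[str, Dict[str, str]], List[Hit]]


def annotate(
    sequences: Dict[str, str],
    models: List[str],
    hmmer3_sufficient: Set[str],   # hypothetical: models where HMMER3 covers HMMER2's search space
    run_hmmer3: Search,            # hypothetical wrapper around a fast local-mode search
    run_hmmer2_glocal: Search,     # hypothetical wrapper around a slow glocal-mode search
) -> List[Hit]:
    """Use the fast search as a prefilter where that is considered safe."""
    hits: List[Hit] = []
    for model in models:
        if model in hmmer3_sufficient:
            # Fast path: only sequences detected by HMMER3 are realigned in glocal mode.
            candidates = {seq_id for seq_id, _, _ in run_hmmer3(model, sequences)}
            subset = {seq_id: sequences[seq_id] for seq_id in candidates}
        else:
            # Slow path: HMMER2 is more sensitive for this model, so search everything.
            subset = sequences
        if subset:
            hits.extend(run_hmmer2_glocal(model, subset))
    return hits


# Toy stand-ins for the two search programs (made-up behaviour, for illustration only).
def fake_hmmer3(model: str, seqs: Dict[str, str]) -> List[Hit]:
    return [(sid, model, 1e-5) for sid, s in seqs.items() if "W" in s]


def fake_hmmer2(model: str, seqs: Dict[str, str]) -> List[Hit]:
    return [(sid, model, 1e-3) for sid, s in seqs.items() if "WW" in s]


toy_sequences = {"a": "MWWK", "b": "MWK", "c": "MKK"}
result = annotate(toy_sequences, ["PF_fast", "PF_slow"], {"PF_fast"}, fake_hmmer3, fake_hmmer2)
print(result)
```

In this sketch, putting every model into `hmmer3_sufficient` corresponds to the fast extreme mentioned in the abstract, and leaving the set empty corresponds to running glocal-mode HMMER2 on everything.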

Highlights

While the local-mode HMMER3 is notable for its massive speed improvement, the slower glocal-mode HMMER2 is more exact for domain annotation by enforcing full domain-to-sequence alignments
Post-calibration finding 1: On average, HMMER3 E-values need to be more stringent than HMMER2 glocal-mode E-values to exhibit the same false-positive rates (FPR)
The function annotation task via the HMMER algorithm is innately coupled to the domain libraries
This entails the objective comparison of the receiver operating characteristic (ROC) curves generated by the two HMMER builds (i.e., HMMER2 and HMMER3) for each domain model
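To make the idea of matching E-value cutoffs by false-positive rate concrete, here is a minimal Python sketch. It assumes that, for one domain model, the E-values each HMMER build assigns to a set of known non-member (negative) sequences are available; the function names `fpr_at` and `matching_cutoff` and all numbers are made up for illustration, and the paper's actual calibration procedure may differ.

```python
from typing import List, Optional, Sequence


def fpr_at(cutoff: float, negative_evalues: Sequence[float]) -> float:
    """Empirical FPR: fraction of negative sequences reported at or below the cutoff."""
    if not negative_evalues:
        raise ValueError("need at least one negative E-value")
    false_positives = sum(1 for e in negative_evalues if e <= cutoff)
    return false_positives / len(negative_evalues)


def matching_cutoff(
    target_fpr: float,
    negative_evalues: Sequence[float],
    candidate_cutoffs: Sequence[float],
) -> Optional[float]:
    """Most permissive candidate cutoff whose empirical FPR does not exceed the target."""
    acceptable: List[float] = [
        c for c in candidate_cutoffs if fpr_at(c, negative_evalues) <= target_fpr
    ]
    return max(acceptable) if acceptable else None


# Made-up E-values assigned to the same four negative sequences by each build.
hmmer2_negatives = [0.5, 2.0, 5.0, 20.0]
hmmer3_negatives = [0.01, 0.05, 0.8, 5.0]

hmmer2_fpr = fpr_at(1.0, hmmer2_negatives)
hmmer3_cutoff = matching_cutoff(hmmer2_fpr, hmmer3_negatives, [1.0, 0.1, 0.03, 0.001])
print(hmmer2_fpr, hmmer3_cutoff)
```

With these toy numbers, a HMMER2 cutoff of 1.0 gives an FPR of 0.25, and the HMMER3 cutoff has to drop to 0.03 to reach the same rate, which mirrors the direction of the post-calibration finding above.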

Summary

Introduction

While the local-mode HMMER3 is notable for its massive speed improvement, the slower glocal-mode HMMER2 is more exact for domain annotation by enforcing full domain-to-sequence alignments. Since a unit of domain necessarily implies a unit of function, local-mode HMMER3 alone remains insufficient for precise function annotation tasks. The incomparable E-values for the same domain model by different HMMER builds create difficulty when checking for domain annotation consistency on a large-scale basis. The value of biomedical and biotechnological applications from biomolecular sequence information is generally limited by the degree of functional annotation of non-coding genomic regions, protein-coding genes and the proteins themselves [1, 2]. Functional annotation is a challenging and non-trivial task [9,10,11,12,13,14,15].

Methods

Results

Conclusion