HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors

Yuan Zhang, Yanni Sun

doi:10.1186/1471-2105-12-198

Abstract

Background: Protein domain classification is an important step in metagenomic annotation. The state-of-the-art method for protein domain classification is profile HMM-based alignment. However, the relatively high rates of insertions and deletions in homopolymer regions of pyrosequencing reads create frameshifts, causing conventional profile HMM alignment tools to generate alignments with marginal scores. This makes error-containing gene fragments unclassifiable with conventional tools. Thus, there is a need for an accurate domain classification tool that can detect and correct sequencing errors.

Results: We introduce HMM-FRAME, a protein domain classification tool based on an augmented Viterbi algorithm that can incorporate error models from different sequencing platforms. HMM-FRAME corrects sequencing errors and classifies putative gene fragments into domain families. It achieved high error detection sensitivity and specificity in a data set with annotated errors. We applied HMM-FRAME in Targeted Metagenomics and a published metagenomic data set. The results showed that our tool can correct frameshifts in error-containing sequences, generate much longer alignments with significantly smaller E-values, and classify more sequences into their native families.

Conclusions: HMM-FRAME provides a complementary protein domain classification tool to conventional profile HMM-based methods for data sets containing frameshifts. Its current implementation is best used for small-scale metagenomic data sets. The source code of HMM-FRAME can be downloaded at http://www.cse.msu.edu/~zhangy72/hmmframe/ and at https://sourceforge.net/projects/hmm-frame/.

Highlights

Protein domain classification is an important step in metagenomic annotation
Metagenomic annotation focuses on phylogenetic complexity and protein composition analysis
HMM-FRAME differs from HMMER in three ways: 1) it directly accepts a DNA sequence as input; 2) it accepts a sequencing error model as input; 3) it can detect and fix frameshifts caused by sequencing errors in the DNA sequence
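
The idea of detecting frameshifts while aligning DNA against a protein model can be illustrated with a toy dynamic program. This is a minimal sketch and not HMM-FRAME's augmented Viterbi algorithm: a single protein string stands in for the profile HMM, the match, mismatch and `shift_penalty` values are arbitrary assumptions rather than a real sequencing error model, and the function name `frameshift_align` is hypothetical. Each amino acid may consume 3 bases (in frame), 4 bases (one inserted base) or 2 bases (one deleted base).

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_score(codon, aa, match=2.0, mismatch=-1.0):
    """Toy score: reward a codon that encodes the expected amino acid."""
    return match if CODON_TABLE.get(codon) == aa else mismatch


def frameshift_align(dna, protein, shift_penalty=3.0):
    """Globally align dna to protein; return (score, frameshift events).

    Each event is (dna_offset, kind), where kind is "insertion" (4 bases
    consumed for one residue) or "deletion" (2 bases consumed).
    """
    n, m = len(dna), len(protein)
    neg = float("-inf")
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(1, m + 1):
            aa = protein[j - 1]
            cands = []
            if i >= 3 and best[i - 3][j - 1] > neg:  # in-frame codon
                s = codon_score(dna[i - 3:i], aa)
                cands.append((best[i - 3][j - 1] + s, 3))
            if i >= 4 and best[i - 4][j - 1] > neg:  # one spurious base
                chunk = dna[i - 4:i]
                s = max(codon_score(chunk[:k] + chunk[k + 1:], aa) for k in range(4))
                cands.append((best[i - 4][j - 1] + s - shift_penalty, 4))
            if i >= 2 and best[i - 2][j - 1] > neg:  # one missing base
                cands.append((best[i - 2][j - 1] - shift_penalty, 2))
            if cands:
                best[i][j], back[i][j] = max(cands)
    if best[n][m] == neg:
        raise ValueError("no alignment consumes the whole read and protein")
    # Trace back to recover where the frame was repaired.
    events, i, j = [], n, m
    while j > 0:
        step = back[i][j]
        if step == 4:
            events.append((i - 4, "insertion"))
        elif step == 2:
            events.append((i - 2, "deletion"))
        i, j = i - step, j - 1
    return best[n][m], events[::-1]


clean_read = "ATGGCCCATGGTCTG"      # encodes M A H G L
error_read = "ATGGCCCAATGGTCTG"     # one extra A inside the CAT codon
print(frameshift_align(clean_read, "MAHGL"))
print(frameshift_align(error_read, "MAHGL"))
```

In this toy setting the read with the extra base still aligns across its full length, and the traceback reports where the frame was repaired. A real tool would replace the fixed penalty with platform-specific error probabilities and the protein string with profile HMM states.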

Summary

Introduction

Protein domain classification is an important step in metagenomic annotation and an important component of protein composition analysis: it classifies a putative protein sequence into annotated domain families and aids in functional analysis. Profile HMM-based alignment is the state-of-the-art method for protein domain classification because of its high sensitivity in classifying remote homologs [1]. The latest version of HMMER can achieve speed comparable to BLAST, making it applicable to large-scale metagenomic data sets. However, the relatively high rates of insertions and deletions in homopolymer regions of pyrosequencing reads create frameshifts, causing conventional profile HMM alignment tools to generate alignments with marginal scores. This makes error-containing gene fragments unclassifiable with conventional tools.
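
To make the frameshift problem concrete, the sketch below translates a short read before and after a single extra base is added to a homopolymer run. The sequences are invented for illustration and the helper `translate` is a hypothetical name; only the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate dna in the given forward frame (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


clean_read = "ATGGCCAAAGGTCTGGAACGT"                 # ATG GCC AAA GGT CTG GAA CGT
error_read = clean_read[:9] + "A" + clean_read[9:]   # AAA run overcalled as AAAA

print(translate(clean_read))           # intended peptide
print(translate(error_read))           # frame 0: correct only up to the error
print(translate(error_read, frame=1))  # frame 1: correct only after the error
```

After the inserted base, frame 0 no longer encodes the original residues, and the downstream part of the peptide appears only in frame 1. No single translated frame contains the whole fragment, so a protein-level alignment of either frame can cover only part of the domain, which is consistent with the short, marginal-scoring alignments described above.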

Journal: BMC Bioinformatics	Publication Date: May 24, 2011
Citations: 88	License type: CC BY 2.0
