Abstract

Background: A crucial task in metagenomic analysis is to annotate the function and taxonomy of the sequencing reads generated from a microbiome sample. In general, the reads can either be assembled into contigs and searched against reference databases, or searched individually without assembly. The first approach may suffer from fragmentary and incomplete assembly, while the second is hampered by the reduced functional signal contained in short reads. To tackle these issues, we previously developed GRASP (Guided Reference-based Assembly of Short Peptides), which accepts a reference protein sequence as input and aims to assemble its homologs from a database containing fragmentary protein sequences. In addition to being a gene-centric assembly tool, GRASP also serves as a homolog search tool when the assembled protein sequences are used as templates to recruit reads. GRASP achieves a significantly higher recall rate (60–80% vs. 30–40%) than other homolog search tools such as BLAST. However, GRASP is both time- and space-consuming. Subsequently, we developed GRASPx, which is 30X faster than GRASP. Here, we present a completely redesigned algorithm, GRASP2, for this computational problem.

Results: GRASP2 utilizes Burrows-Wheeler Transformation (BWT) and FM-index to perform assembly graph generation, and reduces the search space by employing a fast ungapped alignment strategy as a filter. GRASP2 also explicitly generates candidate paths prior to alignment, which effectively uncouples the iterative access of the assembly graph and the alignment matrix. This strategy makes execution more efficient on current computer architectures and contributes to GRASP2's speedup. GRASP2 is 8-fold faster than GRASPx (and 250-fold faster than GRASP) and uses 8-fold less memory while maintaining the original high recall rate of GRASP. GRASP2 reaches ~80% recall compared to ~40% for BLAST, both at a high precision level (> 95%). At this level of performance, GRASP2 is only ~3X slower than BLASTP.

Conclusion: GRASP2 is a high-performance gene-centric assembly and homolog search tool with significant speedup compared to its predecessors, which makes it a useful tool for metagenomics data analysis. GRASP2 is implemented in C++ and is freely available from http://www.sourceforge.net/projects/grasp2.
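
To illustrate the kind of lookup an FM-index supports, the following is a minimal Python sketch of backward search over the BWT of a toy peptide string. It is not GRASP2's implementation (which is written in C++): the function names `build_fm_index` and `backward_search`, the `$` sentinel, and the naive suffix-array construction are all assumptions made for illustration.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index for `text` (hypothetical helper, not GRASP2 code)."""
    # Assumption: "$" does not occur in the input and sorts before every symbol.
    text += "$"
    # Naive suffix array: fine for a demo, far too slow for real read sets.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (index -1 wraps to "$").
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[c] = number of characters in text strictly smaller than c.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][k] = number of occurrences of c in bwt[:k].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for k, b in enumerate(bwt):
        for ch in occ:
            occ[ch][k + 1] = occ[ch][k] + (1 if ch == b else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return the sorted start positions of `pattern` in the indexed text."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, c_table, occ = build_fm_index("MKVLAMKVT")
print(backward_search("MKV", sa, c_table, occ))
```

Each pattern character narrows the suffix-array interval with two table lookups, so the cost of a query depends on the pattern length rather than on the size of the indexed text.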

Highlights

  • Metagenomics is a culture-independent approach for studying the genomic content of a given microbial community

  • The resulting software GRASP2 has great application potential for its high performance and significantly improved computational efficiency

  • The extension links are built by a linear scan of the suffix array constructed from the entire read set to identify intervals that share a common prefix of length longer than l
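
The linear scan described in the last highlight can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the GRASP2 code: the suffix array is built naively by sorting every suffix of every read, and the names `lcp`, `shared_prefix_intervals`, and `min_len` (standing in for l) are hypothetical.

```python
def lcp(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = min(len(a), len(b))
    k = 0
    while k < n and a[k] == b[k]:
        k += 1
    return k


def shared_prefix_intervals(reads, min_len):
    """Group suffixes that share a common prefix longer than `min_len`.

    Returns a list of intervals; each interval is a list of
    (read index, offset) pairs taken from the sorted suffix list.
    """
    # Naive generalized suffix array over the whole read set.
    suffixes = sorted(
        ((r, i) for r, read in enumerate(reads) for i in range(len(read))),
        key=lambda p: reads[p[0]][p[1]:],
    )
    intervals, start = [], 0
    # One linear pass: an interval ends when the LCP of two adjacent
    # suffixes drops to min_len or below (or when the array ends).
    for k in range(1, len(suffixes) + 1):
        if k < len(suffixes):
            (r1, i1), (r2, i2) = suffixes[k - 1], suffixes[k]
            if lcp(reads[r1][i1:], reads[r2][i2:]) > min_len:
                continue
        if k - start > 1:
            intervals.append(suffixes[start:k])
        start = k
    return intervals


reads = ["MKVLA", "KVLAT", "GGKVL"]
for interval in shared_prefix_intervals(reads, 2):
    print(interval)
```

Because suffixes that share a prefix are adjacent after sorting, a single pass over neighbouring pairs is enough to find every group; the suffixes inside one interval are the candidates between which extension links could be drawn.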


Summary

Introduction

Metagenomics is a culture-independent approach for studying the genomic content of a given microbial community. Assembly-independent analysis methods directly annotate individual reads by searching them against available databases. These databases may contain fully sequenced genomes, proteins and protein domains, as well as marker genes with annotated taxonomy. Significant hits against the databases suggest homology between the reads and the database sequences, allowing us to infer the function and taxonomy of the individual reads and subsequently predict the function of the entire community. This approach relies heavily on the completeness of the existing databases. Complete databases are rarely available, except for simple and well-studied communities, because the microbial diversity in most environments has not been sufficiently well characterized or sequenced. In this case, most database searches involve moderate- or remote-homology detection, which is more challenging than close-homology detection. We present a completely redesigned algorithm, GRASP2, for this computational problem.

