InteMAP: Integrated metagenomic assembly pipeline for NGS short reads.

Binbin Lai, Xiaoqi Wang, Huaiqiu Zhu, Fumeng Wang, Liping Duan

doi:10.1186/s12859-015-0686-x

Open Access

https://doi.org/10.1186/s12859-015-0686-x

Abstract

Background: Next-generation sequencing (NGS) has greatly facilitated metagenomic analysis but also raised new challenges for metagenomic DNA sequence assembly, owing to its high-throughput nature and the extremely short reads generated by sequencers such as Illumina. To date, how to generate a high-quality draft assembly for metagenomic sequencing projects has not been fully addressed.

Results: We conducted a comprehensive assessment of state-of-the-art de novo assemblers and found that the performance of each assembler depends critically on the sequencing depth. To address this problem, we developed a pipeline named InteMAP to integrate three assemblers, ABySS, IDBA-UD and CABOG, which were found to complement each other in assembling metagenomic sequences. By choosing which assembly approach to use according to a sequencing coverage estimate for each short read, the pipeline provides an automatic platform suited to assembling real metagenomic NGS data with uneven sequencing depth. By comparing the performance of InteMAP with current assemblers on both synthetic and real NGS metagenomic data, we demonstrated that InteMAP achieves better performance, with a longer total contig length and higher contiguity, and that its assemblies contain more genes than those of the other assemblers.

Conclusions: We developed a de novo pipeline, named InteMAP, that integrates existing tools for metagenomic assembly. The pipeline outperforms previous assembly methods on metagenomic assembly by providing a longer total contig length, higher contiguity and coverage of more genes. InteMAP could therefore be a useful tool for the metagenomics research community.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0686-x) contains supplementary material, which is available to authorized users.

Highlights

Next-generation sequencing (NGS) has greatly facilitated metagenomic analysis and raised new challenges for metagenomic DNA sequence assembly, owing to its high-throughput nature and the extremely short reads generated by sequencers such as Illumina.
These challenges stem from the high throughput and extremely short reads produced by sequencers such as Illumina, as well as from intrinsic complications of metagenomic data caused by the microbial communities.
In contrast to previous metagenomic assembly evaluation efforts that treated the community as a whole [29, 30], we evaluated the assemblers by investigating how they perform on individual species in terms of factors such as sequencing depth of coverage and/or genomic similarity within the microbial community.

Summary

Introduction

Next-generation sequencing (NGS) has greatly facilitated metagenomic analysis and raised new challenges for metagenomic DNA sequence assembly, owing to its high-throughput nature and the extremely short reads generated by sequencers such as Illumina. Metagenomics, the study of sequence data obtained directly from microbial communities in their natural habitats, requires no prior laboratory cultivation and has shown great power in investigating the ubiquitous microorganisms that have intimate relationships with human beings and all other living organisms [1,2,3]. Despite early assemblers designed for reads from Sanger and 454 sequencers, such as Genovo [9], Xgenovo [10] and MAP [11], high-throughput NGS short reads raise new challenges for metagenomic assembly. These challenges stem from the high throughput and extremely short reads produced by sequencers such as Illumina, as well as from intrinsic complications of metagenomic data caused by the microbial communities.

Methods
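
The abstract describes choosing an assembly approach according to a sequencing coverage estimate for each short read. The text here does not specify how that estimate is computed or which assembler handles which depth range, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the median multiplicity of a read's k-mers can stand in for its depth. The function names, the value of `k`, and the two cutoffs are all hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Return all overlapping k-mers of a read (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across the whole read set."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def estimate_read_depth(read, counts, k):
    """Use the median k-mer multiplicity as a rough proxy for the read's depth (an assumption)."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return float(median(counts[km] for km in read_kmers))


def partition_by_depth(reads, k=5, low_cutoff=3.0, high_cutoff=10.0):
    """Split reads into depth bins; the cutoffs are illustrative, not taken from the paper."""
    counts = count_kmers(reads, k)
    bins = {"low": [], "medium": [], "high": []}
    for read in reads:
        depth = estimate_read_depth(read, counts, k)
        if depth < low_cutoff:
            bins["low"].append(read)
        elif depth < high_cutoff:
            bins["medium"].append(read)
        else:
            bins["high"].append(read)
    return bins
```

In a pipeline of this shape, each bin would then be passed to whichever assembler performs best in that depth range. A call such as `partition_by_depth(reads, k=21)` would be more realistic for Illumina-length reads; the small default `k` only keeps toy inputs workable.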

Results

Conclusion