Abstract
Background: Metagenomics is the study of microbial genomes for pathogen detection and discovery in human clinical, animal, and environmental samples via Next-Generation Sequencing (NGS). Metagenome de novo sequence assembly is a crucial analytical step in which longer contigs, ideally whole chromosomes/genomes, are formed from shorter NGS reads. However, the contigs generated from the de novo assembly are often very fragmented and rarely longer than a few kilobase pairs (kb). Therefore, a time-consuming extension process is routinely performed on the de novo assembled contigs.
Results: To facilitate this process, we propose a new tool for metagenome contig extension after de novo assembly. ContigExtender employs a novel recursive extending strategy that explores multiple extending paths to achieve highly accurate longer contigs. We demonstrate that ContigExtender outperforms existing tools in synthetic, animal, and human metagenomics datasets.
Conclusions: A novel software tool, ContigExtender, has been developed to assist and enhance the performance of metagenome de novo assembly. ContigExtender effectively extends contigs from a variety of sources and can be incorporated in most viral metagenomics analysis pipelines for a wide variety of applications, including pathogen detection and viral discovery.
Highlights
Metagenomics is the study of microbial genomes for pathogen detection and discovery in human clinical, animal, and environmental samples via Next-Generation Sequencing (NGS).
The performance of ContigExtender on simulated and real datasets is benchmarked against the existing contig extension tools PRICE [32], Kollector [48], and GenSeed-HMM [49].
From randomly chosen 1 kb seed contigs, ContigExtender was able to reconstruct nearly perfect genomes for all three viral genomes in all cases except for two challenging situations: (1) low sequencing depth (10×) coupled with short reads (100 bp) and (2) low sequencing depth (10×) coupled with a high error rate (0.05).
Summary
The performance of ContigExtender on simulated and real datasets is benchmarked against the existing contig extension tools PRICE [32], Kollector [48], and GenSeed-HMM [49]. ContigExtender generally performed better than PRICE on low-depth (10×) and high-error-rate datasets (Table 2 and Additional file 1: Table S1). Both tools reconstructed nearly the entire reference genome when given higher-depth sequencing data. By using alignment rather than the k-mer search utilized in most de novo assemblers, ContigExtender trades speed for accuracy, allowing for better performance in regions with high sequencing error. These features may explain the favorable performance of ContigExtender relative to other contig extension tools. GenSeed-HMM, in a process similar to that of ContigExtender, iteratively finds similar reads and extends contigs through assembly software. These tools have a common element in that they all utilize de Bruijn assemblers to generate a consensus sequence. We speculate that our current version may not work well for other genomes for two reasons: (1) viral genomes contain considerably fewer repeats than other genomes; and (2) the sequencing dataset sizes for non-viral genomes are usually considerably larger, so the running time may require further optimization.
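To make the idea of a recursive extension that explores multiple paths more concrete, the following is a minimal toy sketch in Python. It is not ContigExtender's implementation: it assumes error-free reads, uses exact suffix/prefix string matches as a stand-in for read alignment, extends only to the right, and skips the assembler-based consensus step. The function name `extend_right`, its parameters, and the example sequences are all hypothetical and chosen purely for illustration.

```python
def extend_right(contig, reads, min_overlap=4, max_depth=50):
    """Toy recursive extension (illustrative only, not ContigExtender's code).

    Tries every read whose prefix exactly matches the 3' end of `contig`,
    recurses on each resulting extension, and returns the longest contig found.
    """
    if max_depth == 0:
        return contig
    best = contig
    for i, read in enumerate(reads):
        # Try overlap lengths from longest to shortest; the read must add
        # at least one new base, so the overlap is shorter than the read.
        for k in range(min(len(read) - 1, len(contig)), min_overlap - 1, -1):
            if contig.endswith(read[:k]):
                remaining = reads[:i] + reads[i + 1:]  # use each read once
                candidate = extend_right(
                    contig + read[k:], remaining, min_overlap, max_depth - 1
                )
                if len(candidate) > len(best):
                    best = candidate
                break  # longest overlap for this read has been explored
    return best


# Hypothetical 24 bp "genome", an 8 bp seed contig, and four reads.
genome = "ACGTTGCAAGGCTTACCGATAGTC"
seed = genome[:8]
reads = [
    "TGCATT",        # decoy: overlaps the seed but leads to a dead end
    genome[4:14],
    genome[10:20],
    genome[16:24],
]

extended = extend_right(seed, reads)
print(extended == genome)  # True
```

In this example, a purely greedy strategy that committed to the first overlapping read would stop at the dead-end decoy, whereas exploring each candidate path and keeping the longest result recovers the full sequence. The exhaustive search is exponential in the worst case, so the sketch is only suitable for small inputs; a real tool would need alignment that tolerates sequencing errors and some way of limiting the number of branches.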