Abstract

PacBio long-read sequencing offers several potential advantages for DNA assembly, including more complete gene profiling of metagenomic samples. However, its lower single-pass accuracy can make gene discovery and assembly difficult for low-abundance organisms. To evaluate the application and performance of PacBio long reads and Illumina HiSeq short reads in metagenomic analyses, we directly compared various assemblies built from PacBio and Illumina sequencing reads of two anaerobic digestion microbiome samples from a biogas fermenter. The PacBio platform produced 1.58 million long reads (19.6 Gb) with an average length of 7,604 bp. The Illumina HiSeq platform produced 151.2 million read pairs (45.4 Gb). Hybrid assemblies using PacBio long reads and HiSeq contigs improved assembly statistics, including increases in average contig length, contig N50 size, and the number of large contigs. Interestingly, depth-based hybrid assemblies generated a higher percentage of complete genes (98.86%) than assemblies based on HiSeq contigs only (40.29%), because the PacBio reads were long enough to span many short repetitive elements and to capture multiple genes in a single read. Additionally, incorporating PacBio long reads considerably reduced contig numbers and increased the completeness of the genome reconstruction, which was poorly assembled and binned when using HiSeq data alone. From this comparison of PacBio long reads with Illumina HiSeq short reads on complex microbiome samples, we conclude that PacBio long reads can produce longer contigs, more complete genes, and better genome binning, thereby offering more information about metagenomic samples.

Highlights

  • Metagenome sequencing is a powerful approach that allows identification of unknown, non-culturable microbial organisms, discovery of unknown functional genes, and insights into the functional processes of specific ecosystems (Parks et al., 2017; Xia et al., 2018; Hua et al., 2019).

  • Previous research indicates that PacBio circular consensus sequencing (CCS) provides high-quality long reads that are suitable for metagenomic applications; for example, hybrid assembly using PacBio CCS reads and HiSeq contigs improved assembly statistics, including increases in average contig length and the number of large contigs (Frank et al., 2016).

  • We evaluated a short-read metagenomic assembler (MEGAHIT), two hybrid metagenomic assemblers (DBG2OLC and OPERA-MS), and a depth-based hybrid assembly on metagenomic data obtained with two sequencing approaches (Illumina HiSeq and PacBio).

Introduction

Metagenome sequencing is a powerful approach that allows identification of unknown, non-culturable microbial organisms, discovery of unknown functional genes, and insights into the functional processes of specific ecosystems (Parks et al., 2017; Xia et al., 2018; Hua et al., 2019). Previous research indicates that PacBio circular consensus sequencing (CCS) provides high-quality long reads that are suitable for metagenomic applications; for example, hybrid assembly using PacBio CCS reads and HiSeq contigs improved assembly statistics, including increases in average contig length and the number of large contigs (Frank et al., 2016). Another study demonstrated that finished metagenome-assembled genomes (MAGs) can be assembled de novo from low-complexity metagenome samples using third-generation sequencing data (Somerville et al., 2019). Those studies focused on assemblies and MAGs produced with a single assembly method; few have examined gene catalogs or compared multiple assembly methods.
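
The assembly statistics referred to here (average contig length, contig N50, and the number of large contigs) can be computed directly from a list of contig lengths. The following is a minimal sketch for illustration only, not the pipeline used in the study: the function name `assembly_stats` and the `large_cutoff` threshold are hypothetical, since this section does not state what length counts as a "large" contig.

```python
from typing import Dict, List


def assembly_stats(contig_lengths: List[int], large_cutoff: int = 10_000) -> Dict[str, float]:
    """Summarise an assembly from its contig lengths (in bp).

    large_cutoff is an illustrative assumption, not a value from the paper.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")

    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # N50: walking from the longest contig to the shortest, the length of
    # the contig at which the running sum first reaches half the total.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    return {
        "num_contigs": len(lengths),
        "total_bp": total,
        "mean_length": total / len(lengths),
        "n50": n50,
        "num_large": sum(1 for x in lengths if x >= large_cutoff),
    }


if __name__ == "__main__":
    # Toy lengths, invented purely to exercise the function.
    toy = [500, 800, 1200, 1500, 2000]
    print(assembly_stats(toy, large_cutoff=1000))
```

Comparing these values between a short-read-only assembly and a hybrid assembly of the same sample is one simple way to quantify the kind of improvement described above; N50 is less sensitive than the mean to a large number of very short contigs.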

