Abstract

Motivation: To assess the potential of different types of sequence data combined with de novo and hybrid assembly approaches to improve existing draft genome sequences.

Results: Illumina, 454 and PacBio sequencing technologies were used to generate de novo and hybrid genome assemblies for four different bacteria, which were assessed for quality using summary statistics (e.g. number of contigs, N50) and in silico evaluation tools. Differences in predictions of multiple copies of rDNA operons for each respective bacterium were evaluated by PCR and Sanger sequencing, and then the validated results were applied as an additional criterion to rank assemblies. In general, assemblies using longer PacBio reads were better able to resolve repetitive regions. In this study, the combination of Illumina and PacBio sequence data assembled through the ALLPATHS-LG algorithm gave the best summary statistics and most accurate rDNA operon number predictions. This study will aid others looking to improve existing draft genome assemblies.

Availability and implementation: All assembly tools except CLC Genomics Workbench are freely available under GNU General Public License.

Contact: brownsd@ornl.gov

Supplementary information: Supplementary data are available at Bioinformatics online.
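The summary statistics named in the abstract (number of contigs, N50) can be illustrated with a short sketch. It assumes the common definition of N50: the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. The function name `assembly_summary` and the contig lengths in the usage note are hypothetical and are not taken from the study.

```python
def assembly_summary(contig_lengths):
    """Return (number of contigs, total length, N50) for a list of contig lengths.

    Illustrative sketch only; assumes N50 is the length of the contig at which
    the cumulative length of contigs, sorted longest first, reaches at least
    half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")

    # Sort longest first so the cumulative sum starts with the largest contigs.
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids floating-point rounding at the halfway point.
        if 2 * running >= total:
            return len(lengths), total, length
```

For example, with hypothetical contig lengths `[2, 2, 2, 3, 3, 4, 8, 8]`, the total length is 32 and the two longest contigs (8 + 8) already cover half of it, so the function reports 8 contigs and an N50 of 8. Fewer contigs and a larger N50 generally indicate a more contiguous assembly, though neither measures correctness on its own.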

Highlights

  • The development and evolution of next-generation sequencing (NGS) platforms have dramatically changed biological studies in recent years (Mavromatis et al., 2012).

  • Pacific Biosciences (PacBio) sequencing data were generated at the Genome Sequencing and Analysis Core Resource at Duke University using the PacBio RS-I instrument, C2 chemistry and one SMRT cell per genome.

  • Published draft genome assemblies generated from Illumina PE reads (Brown et al., 2012b) were improved using combined data from the different sequencing platforms and hybrid assembly protocols.


Summary

Introduction

The development and evolution of next-generation sequencing (NGS) platforms have dramatically changed biological studies in recent years (Mavromatis et al., 2012). Most sequenced genomes are incomplete owing to technical difficulties, time and expense, which has led to an increasing disparity in quality and usefulness between finished and draft genomes in databases (Chain et al., 2009). Because of their low cost, accuracy and high throughput, Illumina platforms have dominated the sequencing industry (Mavromatis et al., 2012). The so-called ‘third generation’ single-molecule sequencing technology developed by Pacific Biosciences (PacBio) has been compared with several NGS platforms (Quail et al., 2012). Read lengths up to 14 kb have been reported for PacBio RS I chemistry (Nagarajan and Pop, 2013) and nearly 27 kb for RS II chemistry (Brown et al., 2014).

