Abstract

Background: Recent advances in next-generation sequencing technologies have drastically increased throughput and significantly reduced sequencing costs. However, the average read lengths of next-generation sequencing technologies are short compared with those of traditional Sanger sequencing. These short reads pose great challenges for de novo sequence assembly. As a pilot project for whole genome sequencing of the catfish genome, we attempt here to determine the proper sequence coverage, the proper assembly software, and the parameters to use for assembling a BAC physical map contig spanning approximately one million base pairs.

Results: A combination of low-coverage 454 sequencing and Illumina sequencing appeared to provide an effective assembly, as reflected by a high N50 value. Using 454 sequencing alone, a sequencing depth of 18X was sufficient to obtain a good-quality assembly, whereas with Illumina sequencing alone a depth of 70X appeared to be sufficient. Additional coverage beyond 18X of 454 or 70X of Illumina sequencing did not significantly improve the assembly. Considering the cost of sequencing, 2X 454 sequencing coupled with 70X Illumina sequencing provided an assembly of reasonably good quality. Of the software packages tested, Newbler with a seed length of 16 and ABySS with a K-value of 60 appeared to be appropriate for assembling 454 reads alone and Illumina paired-end reads alone, respectively. Using both 454 and Illumina paired-end reads, a hybrid strategy that used Newbler for the initial 454 assembly and Velvet for the initial Illumina assembly, followed by a second-step assembly with MIRA, provided the best assembly of the physical map contig, resulting in 193 contigs with an N50 value of 13,123 bp.

Conclusions: A hybrid sequencing strategy combining a low sequencing depth of 454 with a high sequencing depth of Illumina provided a good-quality assembly with a high N50 value at relatively low cost. A combination of Newbler, Velvet, and MIRA can be used to assemble the 454 and Illumina reads effectively. The assembled sequence can serve as a resource for comparative genome analysis. Additional long reads from third-generation sequencing platforms are needed to sequence through repetitive genome regions, which should further enhance the sequence assembly.
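The abstract judges assemblies by their N50 value and by sequencing depth (for example 18X or 70X). As a minimal sketch of how these two quantities are commonly computed, the Python below uses the standard definition of N50 (the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembled bases) and the usual approximation of depth as total sequenced bases divided by target length. The function names and the example numbers are hypothetical and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must contain at least one contig")


def sequencing_depth(num_reads, mean_read_length, target_length):
    """Approximate depth (X) as total sequenced bases over target length."""
    return num_reads * mean_read_length / target_length


# Hypothetical illustration: eight contigs totalling 32 bp.
example_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_contigs)

# Hypothetical illustration: 180,000 reads of 100 bp over a 1 Mb target.
example_depth = sequencing_depth(180_000, 100, 1_000_000)
```

A higher N50 means that more of the assembled sequence sits in long contigs, which is why it is used above as the main indicator of assembly quality.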

Highlights

  • A number of genomic tools and resources have been developed in catfish, including bacterial artificial chromosome (BAC) libraries [9,10], BAC-based physical maps [11,12], genetic linkage maps [13,14,15], a large number of ESTs [2,16], over 1700 unique full length cDNAs [17], over 60,000 BAC end sequences [3,7], and a large number of identified molecular markers such as microsatellites and single nucleotide polymorphisms [2,18]

  • Generation of short sequencing reads from pooled catfish BAC clones: to assess the de novo assembly strategy for the catfish whole genome sequencing project, twenty-four BAC clones were selected from the largest contig of the BAC-based catfish physical map


Summary

Introduction

Recent advances in next-generation sequencing technologies have drastically increased throughput and significantly reduced sequencing costs. A common feature of these sequencers is their relatively short sequencing reads, which makes subsequent sequence assembly a great challenge. Such challenges become even more significant when dealing with large and complex eukaryotic genomes. A genome known to have gone through a third round of whole genome duplication [19] poses an additional challenge when coupled with short sequencing reads. In consideration of such complexities, Quinn et al. [20] conducted a pilot study of eight pooled BAC clones covering approximately 1 Mb of the Atlantic salmon genome using 454 GS FLX pyrosequencing, and concluded that it was difficult to achieve good levels of genome sequence assembly for the tetraploid genome with 454 sequencing alone. Together with the existing BAC end sequences, we aim to take full advantage of multiple sequencing technologies and existing genetic resources for the upcoming catfish whole genome sequencing project.
