Abstract

Background: The relatively short read lengths from next generation sequencing (NGS) technologies still pose a challenge for de novo assembly of complex mammalian genomes. One important solution is to use paired-end (PE) sequence information experimentally obtained from long-range DNA fragments (>1 kb). Here, we characterize and extend a long-range PE library construction method based on direct intra-molecule ligation (or molecular linker-free circularization) for NGS.

Results: We found that the method performs stably for PE sequencing of 2- to 5-kb DNA fragments, and can be extended to 10–20 kb (and, in extreme cases, up to ∼35 kb). We also characterized the impact of low-quality input DNA on the method, and developed a whole-genome amplification (WGA) based protocol using limited input DNA (<1 µg). Using this PE dataset, we accurately assembled the YanHuang (YH) genome, the first sequenced Asian genome, into a scaffold N50 size of >2 Mb, which is over 100 times greater than the initial size produced with only small-insert PE reads (17 kb). In addition, we mapped two 7- to 8-kb insertions in the YH genome using the larger insert sizes of the long-range PE data.

Conclusions: We demonstrate here the effectiveness of this long-range PE sequencing method and its use for the de novo assembly of a large, complex genome using NGS short reads.
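The scaffold N50 quoted above is the length of the scaffold at which the cumulative length of scaffolds, taken from longest to shortest, first reaches half of the total assembly length. The sketch below illustrates that standard definition only; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper or its assembly pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    Standard definition: sort lengths from longest to shortest and return
    the length at which the running total first reaches half of the sum.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up lengths (not data from the study).
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))
```

Because N50 is weighted by length, joining many short scaffolds into a few long ones with long-range PE links raises it sharply even when the total assembled sequence barely changes.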

Highlights

  • The availability of genome sequence information can greatly aid and underpin the biological research of a given species

  • These performance parameters included: the contamination of small insert reads (<500 bp) originating either from undigested linear DNA fragments or from fragments with damaged sites within circularized molecules that were labeled by biotin-dNTPs; the introduction of chimerically ligated DNA fragments during DNA circularization; the introduction of PCR duplicates due to reduced library complexity; and the library complexity itself, measured by the final number of PE molecules with distinct origins that have proper insert sizes and orientation relationships when mapped to the human genome

  • We generated 7–10 million high-quality PE reads on Illumina sequencing platforms, which provides sufficient physical coverage (at least 9-fold over the human genome) for the performance analyses
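Physical coverage, as used in the last highlight, counts the genome span covered by whole inserts rather than by sequenced bases. A minimal sketch of the usual arithmetic follows; the function name, the 4 kb mean insert size, and the 3 Gb genome size are illustrative assumptions, not values reported in this excerpt, and this is not necessarily how the authors computed their figure.

```python
def physical_coverage(num_pairs, mean_insert_bp, genome_size_bp):
    """Fold physical coverage: total span of all inserts / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return num_pairs * mean_insert_bp / genome_size_bp


# Hypothetical inputs: 7 million read pairs, 4 kb mean insert, 3 Gb genome.
fold = physical_coverage(7_000_000, 4_000, 3_000_000_000)
print(f"{fold:.1f}-fold physical coverage")
```

Under these assumed inputs, a few million long-insert pairs already span the genome several times over, which is why a modest number of reads can be enough to evaluate library performance.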


Summary

Introduction

The availability of genome sequence information can greatly aid and underpin the biological research of a given species. This is mainly due to the prohibitive cost required for de novo sequencing and assembly of large, complex genomes using traditional Sanger sequencing. Efforts to assemble NGS short reads de novo, especially for mammalian genomes that include complex repeat sequences, have been greatly limited by read length [8,9,10,11]. One potential solution to this issue is to perform hierarchical assembly using paired-end (PE) sequence from different classes of long-range DNA fragments. The relatively short read lengths from next generation sequencing (NGS) technologies still pose a challenge for de novo assembly of complex mammalian genomes. One important solution is to use paired-end (PE) sequence information experimentally obtained from long-range DNA fragments (>1 kb). We characterize and extend a long-range PE library construction method based on direct intra-molecule ligation (or molecular linker-free circularization) for NGS.
