Abstract

Background: Next generation sequencing (NGS) technology offers ultra-high throughput, but its reads are remarkably short compared to conventional Sanger sequencing. Paired-end NGS can computationally extend the read length, but the inherent gaps between the two ends cause considerable practical inconvenience. Now that Illumina paired-end sequencing can read both ends of 600 bp or even 800 bp DNA fragments, filling in the gaps between paired ends to produce accurate long reads is an intriguing but challenging problem.

Results: We have developed a new technology, referred to as pseudo-Sanger (PS) sequencing. It fills in the gaps between paired ends and can generate near error-free sequences equivalent in length to conventional Sanger reads, but with the high throughput of NGS. The major novelty of the PS method is that gap filling is based on local assembly of paired-end reads that have overlaps at either end. As a result, gaps in repetitive genomic regions can be filled correctly. PS sequencing starts with short reads from NGS platforms, using a series of paired-end libraries with stepwise decreasing insert sizes. A computational method transforms these special paired-end reads into long, near error-free PS sequences whose lengths correspond to the largest insert sizes. The PS construction has three advantages over untransformed reads: gap filling, error correction and heterozygote tolerance. Among the many applications of the PS construction is de novo genome assembly, which we tested in this study. Assembly of PS reads from a non-isogenic strain of Drosophila melanogaster yields a contig N50 of 190 kb, a 5-fold improvement over existing de novo assembly methods and a 3-fold advantage over the assembly of long reads from 454 sequencing.

Conclusions: Our method generates near error-free long reads from NGS paired-end sequencing. We demonstrate that de novo assembly benefits substantially from these Sanger-like reads. In addition, the long reads could be applied to structural variation detection and metagenomics.

Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-14-711) contains supplementary material, which is available to authorized users.
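
The abstract reports assembly quality as a contig N50. For readers unfamiliar with the metric, the sketch below computes it under the common definition: the contig length L such that contigs of length L or longer cover at least half of the total assembled bases. The function name `n50` and the example lengths are illustrative assumptions, not taken from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Assumes the common definition: walk the contigs from longest to
    shortest and return the length at which the running total first
    reaches half of the summed contig length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (in bp), for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

Because N50 depends on the total assembled length, it is most meaningful when comparing assemblies of the same genome, as in the comparison the abstract describes.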

Highlights

  • Next generation sequencing (NGS) technology offers ultra-high throughput, but its reads are remarkably short compared to conventional Sanger sequencing

  • A major advance in next generation sequencing (NGS) is the development of paired-end (PE) library construction, which generates two short reads from a single DNA fragment separated by an insert of a known size

  • Several attempts have been made to extend the length of short reads by merging the paired-end reads from small fragments into longer single-end reads [5,6,7], demonstrating the advantages of longer reads in metagenomics and genome assembly


Summary

Introduction

Next generation sequencing (NGS) technology offers ultra-high throughput, but its reads are remarkably short compared to conventional Sanger sequencing. A major advance in NGS is the development of paired-end (PE) library construction, which generates two short reads from a single DNA fragment, separated by an insert of known size. ALLPATHS [9] is a standalone genome assembler. It used paired-end information efficiently by filling the inner gaps through extension, but it suffered from having to extend from one end of a read pair to the other in the global graph of read overlaps. Successive multiple libraries were used in the long march [10] and SubAssembly [11]. These methods used a paradigm of clustering and local assembly to avoid repetitive sequences and the computational complexity of overlap extension. The goal of this study is to fill in the gap between paired-end reads from large DNA fragments (600 or 800 bp) and to produce sequences like Sanger reads, even when the gap sequence is repetitive.
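
To make the idea of filling the gap between a read pair by local overlap extension concrete, here is a deliberately simplified sketch. It is not the authors' PS algorithm: the greedy strategy, the function names `overlap` and `fill_gap`, the `min_overlap` and `max_len` parameters, and the toy sequences are all assumptions made for illustration. It extends the left read with the locally collected reads until the right read can be attached.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def fill_gap(left, right, inner_reads, min_overlap=4, max_len=1000):
    """Greedily extend `left` with `inner_reads` until `right` attaches.

    Returns the filled sequence, or None if the gap cannot be closed.
    Toy assumptions: all reads are on the same strand and error-free.
    """
    seq = left
    unused = list(inner_reads)
    while len(seq) <= max_len:
        # Stop as soon as the right-hand read overlaps the growing sequence.
        k = overlap(seq, right, min_overlap)
        if k:
            return seq + right[k:]
        # Otherwise pick the unused read with the largest overlap
        # that still adds at least one new base.
        best, best_k = None, 0
        for read in unused:
            k = overlap(seq, read, min_overlap)
            if k > best_k and k < len(read):
                best, best_k = read, k
        if best is None:
            return None
        seq += best[best_k:]
        unused.remove(best)
    return None


# Hypothetical 28 bp fragment and reads drawn from it, for illustration only.
fragment = "ATGCGTACCTGAAGTCCATGGTTCAAGC"
left, right = fragment[:8], fragment[-8:]
inner = [fragment[i:i + 8] for i in (4, 8, 12, 16)]
print(fill_gap(left, right, inner) == fragment)
```

A greedy extension like this can take a wrong turn inside a repeat, which is the weakness of end-to-end extension noted above. Restricting the candidate reads to those collected locally for one read pair is one way to limit that ambiguity.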

