Abstract

Background: Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massive parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality.

Results: In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package, GAM-NGS.

Conclusions: By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process.

We have made public the input data (FASTQ format) for the set of pools used in this study: ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/ (alternatively accessible via http://congenie.org/downloads). The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/.

Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-439) contains supplementary material, which is available to authorized users.

Highlights

  • Preparation of high molecular weight (HMW) genomic DNA and ~40 kb inserts: Fresh needles from shoots that had over-wintered since the previous year were collected from the reference tree [7] near Umeå, Sweden, during late spring (25th of March, 2010) and immediately frozen at -80°C.

  • Optimization of the Fosmid pools (FP) strategy: In genomes with a high prevalence of repeats, such as that of spruce, one would expect repeats to be the major reason for assembly termination.

Introduction

Sampling genomes with Fosmid vectors and sequencing pooled Fosmid libraries on the Illumina platform for massively parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality. Massively parallel (second-generation) sequencing technologies that use short reads have created new challenges for downstream bioinformatics analyses. One of these challenges is that the assembly task is much more complex as a result of the very short read lengths. Bacteriophage lambda packaging restricts the fragment length to ~40 kbp. Fosmid ends can produce mate-pair (jump) libraries that facilitate the assembly of shotgun genome sequences in the absence of large-scale bacterial cloning [3,4]. The FP approach reduces the complexity of downstream bioinformatics analyses in a number of ways: 1) each sampled genomic fragment is haploid within a Fosmid [5,6], so assembly of the fragment is not hindered by allelic differences; 2) in repeat-rich genomes, repeats are the major reason for breaks in assembly contiguity, and the repeat assembly problem is greatly reduced when each pool is small: in the assembly of the Norway spruce (Picea abies) genome [7], each pool of 1,000 Fosmids contained ~40 Mbp of genomic sequence in total, compared with the challenge of assembling all WGS reads from the entire 20 Gbp genome; 3) it is not necessary to use large-memory computers to solve the assembly problem (whereas up to 1 TB of memory is needed to assemble a WGS read set); and
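
The scale reduction in point 2 can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope illustration, not part of the published pipeline. It uses the figures quoted in the text (~40 kbp inserts, 1,000 Fosmids per pool, a 20 Gbp genome, around two-fold coverage) and assumes idealized, uniform, non-overlapping sampling. The function names are hypothetical, and the derived Fosmid and pool counts are estimates under that assumption, not numbers reported by the authors.

```python
# Back-of-the-envelope arithmetic for the Fosmid pool (FP) strategy.
# Input figures come from the text; everything derived from them assumes
# idealized uniform sampling and is an illustration only.

GENOME_SIZE_BP = 20_000_000_000   # ~20 Gbp Norway spruce genome
FOSMID_INSERT_BP = 40_000         # ~40 kbp insert per Fosmid
FOSMIDS_PER_POOL = 1_000          # Fosmids combined into one pool
TARGET_COVERAGE = 2               # around two-fold genome coverage


def pool_span_bp(fosmids_per_pool: int, insert_bp: int) -> int:
    """Total genomic sequence represented in a single pool."""
    return fosmids_per_pool * insert_bp


def fosmids_needed(genome_bp: int, insert_bp: int, coverage: int) -> int:
    """Fosmids required for the target coverage (ceiling division)."""
    return -(-coverage * genome_bp // insert_bp)


def pools_needed(total_fosmids: int, fosmids_per_pool: int) -> int:
    """Pools required to hold all Fosmids (ceiling division)."""
    return -(-total_fosmids // fosmids_per_pool)


span = pool_span_bp(FOSMIDS_PER_POOL, FOSMID_INSERT_BP)
total = fosmids_needed(GENOME_SIZE_BP, FOSMID_INSERT_BP, TARGET_COVERAGE)
pools = pools_needed(total, FOSMIDS_PER_POOL)
# How much smaller one pool's assembly target is than the whole genome.
reduction = GENOME_SIZE_BP // span

print(f"Sequence per pool: {span / 1e6:.0f} Mbp")
print(f"Fosmids for {TARGET_COVERAGE}x coverage: {total:,}")
print(f"Pools of {FOSMIDS_PER_POOL:,} Fosmids: {pools:,}")
print(f"Per-pool target is {reduction}x smaller than the genome")
```

Under these assumptions, each pool's assembly target is 500 times smaller than the whole genome, which is consistent with the reduced repeat burden and memory requirements described in points 2 and 3.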
