Abstract

Background: Numerous completed or ongoing whole genome sequencing projects have highlighted the fact that obtaining a high quality genome sequence is necessary to address comparative genomics questions such as structural variations among genotypes and gain or loss of specific function. Despite the spectacular progress that has been made in sequencing technologies, obtaining accurate and reliable data is still a challenge, both at the whole genome scale and when targeting specific genomic regions. These problems are even more noticeable for complex plant genomes, which are known to be particularly challenging due to their size, high density of repetitive elements and various levels of ploidy. To overcome these problems, we have developed a strategy to reduce genome complexity by using large-insert BAC libraries combined with next-generation sequencing technologies.

Results: We compared two different technologies (Roche-454 and Pacific Biosciences PacBio RS II) for sequencing pools of BAC clones in order to obtain the best quality sequence. We targeted nine BAC clones from different species (maize, wheat, strawberry, barley, sugarcane and sunflower) known to be complex in terms of sequence assembly. We sequenced the pools of the nine BAC clones with both technologies, compared the assembly results and highlighted differences due to the sequencing technology used.

Conclusions: We demonstrated that the long reads obtained with the PacBio RS II technology yield a better and more reliable assembly, notably by preventing errors due to duplicated or repetitive sequences in the same region.

Highlights

  • During the last decade, we have observed remarkable advances in sequencing technology and bioinformatics analysis

  • In the context of various Bacterial Artificial Chromosome (BAC) library screening projects, we sequenced the BAC clones that possess the region of interest using Roche-454 technology

  • We found varying results that ranged from 1 contig per BAC to more than 20, mainly due to the mis-assembly of repeated sequences

Summary

Introduction

We have observed remarkable advances in sequencing technology and bioinformatics analysis. The sequence length is critical to overcome assembly problems linked to particular features of genomes such as large genome size, high repetitive DNA ratio and various ploidy levels. These features are frequently found to be combined in complex genomes of plants [3]. Despite the spectacular progress that has been made in sequencing technologies, obtaining accurate and reliable data is still a challenge, both at the whole genome scale and when targeting specific genomic regions. These problems are even more noticeable for complex plant genomes. Most plant genomes are known to be challenging due to their size, high density of repetitive elements and various levels of ploidy. To overcome these problems, we have developed a strategy to reduce genome complexity by using large-insert BAC libraries combined with next-generation sequencing technologies.
