Abstract

Background

Genomics studies are being revolutionized by next-generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived specifically to address the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them.

Methodology/Principal Findings

For Plantagora, a platform was created for generating simulated reads from several plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired-end spacing. Thousands of read datasets were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, ABySS, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by aligning the assemblies to the original genome source for the reads. The results were presented on a website, which includes a data graphing tool, all created to help the user rapidly compare the feasibility and effectiveness of different sequencing and assembly strategies before testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website.

Conclusions/Significance

Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, along with conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for further study of the sequencing and assembly process.

Highlights

  • Since the completion of the mapping of the human genome in the 1990s [1,2], genomics has rapidly matured, and so has the technology that supplies its fundamental sequences: whole genome sequencing

  • The main value in Plantagora rests in its ability to provide tangible information about the sequencing and assembly process that can be used by a researcher to guide their approach to sequencing a plant genome, or other types of genomes

  • The data produced by Plantagora provides guidance for researchers who are planning a whole genome sequencing and assembly project


Introduction

Since the completion of the mapping of the human genome in the 1990s [1,2], genomics has rapidly matured, and so has the technology that supplies its fundamental sequences: whole genome sequencing. The sequencing and annotation of the Arabidopsis genome [3,4], completed in 2000, provided an improved genetic landscape for studying all plants. The new generation of sequencing technologies offers faster sequencing but shorter read lengths. Whole genome sequencing with these technologies is a developing art that, despite the large volumes of data it can produce, may still fail to provide a clear and thorough map of a genome.
