Abstract

Background

With the completion of genome sequencing projects for more than 30 plant species, large volumes of genome sequences have been produced and stored in online databases. Advances in sequencing technologies have reduced the cost and time of whole genome sequencing, enabling more and more plants to be sequenced. Despite this, the genome sequence quality of multiple plants has not been evaluated.

Methodology/Principal Finding

Integrity and accuracy were calculated to evaluate the genome sequence quality of 32 plants. The integrity of a genome sequence is expressed as the ratio of chromosome size to genome size (or of scaffold size to genome size), and it ranged from 55.31% to nearly 100%. The accuracy of a genome sequence is expressed as the ratio of matched ESTs to selected ESTs: 52.93%–98.28% and 89.02%–98.85% of the randomly selected clean ESTs could be mapped to chromosome and scaffold sequences, respectively. Based on integrity, accuracy, and other analyses of each plant species, thirteen plant species were divided into four quality levels. Arabidopsis thaliana, Oryza sativa and Zea mays had the highest quality, followed by Brachypodium distachyon, Populus trichocarpa, Vitis vinifera and Glycine max; then Sorghum bicolor, Solanum lycopersicum and Fragaria vesca; and finally Lotus japonicus, Medicago truncatula and Malus × domestica. For the remaining nineteen species, assembling the scaffold sequences into chromosome sequences should be the primary task. Low GC content and repetitive DNA influence genome sequence assembly.

Conclusion

The quality of plant genome sequences was found to be lower than expected. Alongside the rapid development of genome sequencing projects, research on bioinformatics tools and genome assembly algorithms should support further processing and correction of the genome sequences that have already been published.
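
The two quality measures above reduce to simple percentages. The Python sketch below shows one way to compute them, plus GC content, which the abstract links to assembly difficulty. It is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, sizes are assumed to be in base pairs, an EST is assumed to count as "matched" when some external aligner (not shown) reported a hit for it, and ambiguous bases such as N are assumed to be excluded from the GC denominator. All example numbers are invented.

```python
"""Minimal sketch of the genome-quality ratios described in the abstract.

Assumptions (not taken from the study): sizes are in base pairs, and an EST is
"matched" if an external aligner (not shown) reported a hit for it.
Function names are hypothetical.
"""
from typing import Iterable, Set


def integrity(assembled_size_bp: int, genome_size_bp: int) -> float:
    """Percent integrity: chromosome (or scaffold) size divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return 100.0 * assembled_size_bp / genome_size_bp


def accuracy(selected_ests: Iterable[str], matched_ests: Set[str]) -> float:
    """Percent accuracy: selected ESTs that mapped to the assembly / all selected ESTs."""
    selected = list(selected_ests)
    if not selected:
        raise ValueError("no ESTs were selected")
    hits = sum(1 for est_id in selected if est_id in matched_ests)
    return 100.0 * hits / len(selected)


def gc_content(seq: str) -> float:
    """Percent G+C among A/C/G/T bases; N and other IUPAC codes are ignored (assumption)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    # Invented example values for illustration; these are not results from the study.
    print(f"integrity: {integrity(110_000_000, 125_000_000):.2f}%")  # integrity: 88.00%
    print(f"accuracy: {accuracy(['e1', 'e2', 'e3', 'e4'], {'e1', 'e2', 'e4'}):.2f}%")  # accuracy: 75.00%
    print(f"GC: {gc_content('ATGCGCNNAT'):.2f}%")  # GC: 50.00%
```

Note that integrity depends directly on whichever genome-size estimate is used as the denominator, so the same assembly can score differently against different size estimates.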

Highlights

  • Whole genome sequencing is a technique that can determine the complete DNA sequence of an organism, including its chromosomal, mitochondrial, and (in plants) chloroplast DNA

  • Next-generation sequencing (NGS) platforms can produce a large number of short reads (25–70 bp), which can be assembled into contig sequences using sequence overlap information

  • When paired-end information is used to join unique contigs into scaffolds, a single scaffold or multiple scaffolds may represent an individual chromosome [52], [53]
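
The contig and scaffold steps in the highlights can be illustrated with a toy Python sketch. It greedily merges the pair of sequences with the longest suffix-prefix overlap; this is a swapped-in teaching technique, not the method of any particular assembler. Production NGS assemblers usually rely on de Bruijn or string graphs and must handle sequencing errors, repeats, and reverse complements, none of which this sketch does. The function names, the 3 bp minimum overlap, the 6 bp reads (far shorter than the 25–70 bp reads mentioned above), and the N-gap length are all hypothetical, and scaffolding is reduced to joining contigs whose order and orientation are assumed to be already known from paired-end links.

```python
"""Toy sketch of overlap-based contig assembly and N-gap scaffolding.

Greedy overlap merging is a teaching simplification, not how production NGS
assemblers work. Function names, reads, minimum overlap, and gap length are
hypothetical. Sequencing errors and reverse complements are ignored.
"""
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if below min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def remove_contained(seqs: List[str]) -> List[str]:
    """Drop duplicates and any sequence that lies entirely inside another one."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair with the longest overlap until no overlap >= min_len remains."""
    seqs = remove_contained(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return seqs  # nothing left to merge: the remaining sequences are the contigs
        merged = seqs[best_i] + seqs[best_j][best_len:]
        rest = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)]
        seqs = remove_contained(rest + [merged])


def scaffold(ordered_contigs: List[str], gap_len: int = 10) -> str:
    """Join contigs, assumed already ordered/oriented by paired-end links, with runs of N."""
    return ("N" * gap_len).join(ordered_contigs)


if __name__ == "__main__":
    reads = ["ATGCGT", "GCGTAC", "GTACCA"]  # hypothetical 6 bp reads
    contigs = greedy_assemble(reads)
    print(contigs)  # ['ATGCGTACCA']
    # Assume paired-end links placed a second (hypothetical) contig after the first.
    print(scaffold(contigs + ["TTGACCGGA"], gap_len=5))  # ATGCGTACCANNNNNTTGACCGGA
```

Greedy merging is used here only because it is short and easy to follow; it can join reads incorrectly across repeated sequences, which is one reason repetitive DNA complicates assembly.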


Summary

Introduction

Whole genome sequencing is a technique that can determine the complete DNA sequence of an organism, including its chromosomal, mitochondrial, and (in plants) chloroplast DNA. DNA sequencing technology has undergone three stages of development: Sanger, next-generation, and third-generation sequencing. Three "next-generation sequencing (NGS)" platforms, the Roche/454 Genome Sequencer (http://www.454.com) [4], the Illumina/Solexa Genome Analyzer II (http://www.Illumina.com) [5], [6], and the Applied Biosystems SOLiD System (http://www.solid.appliedbiosystems.com) [7], were introduced and made commercially available. These platforms have been widely used in many genome sequencing projects. With the completion of genome sequencing projects for more than 30 plant species, large volumes of genome sequences have been produced and stored in online databases. However, the genome sequence quality of multiple plants has not been evaluated.
