Abstract

In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequences of plants with relatively small genomes, most of them angiosperms and in particular eudicots, have been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. However, the pace of plant sequencing lags far behind that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. It also explains the impact of some confounding factors that arise from the nature of plant genomes. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, only genome drafts are produced, and advances in plant genomics are delayed.

Highlights

  • In spite of the biological and economic importance of plants, relatively few plant species have been sequenced

  • The increase in plant sequence data has prompted the development of dedicated repositories, such as the general-purpose Phytozome [13], the comparative plant genomics resource PLAZA [39], plant family databases such as TreeGenes for forest tree genome data [40], and species-specific databases (e.g., EuroPineDB for maritime pine [41], Eucawood for Eucalyptus [42], or MeloGen for Cucumis melo [43]). Also worth mentioning are the iPlant project [44], which emerged with the aim of creating an innovative, comprehensive and foundational cyberinfrastructure to support plant biology research; the VirtualPlant platform [45], which integrates genome-wide data on the known and predicted relationships among genes, proteins, and molecules so that scientists can visualize, integrate, and analyze genomic data from a systems biology perspective; and the Plantagora platform [34], which addresses the gap between having the technical tools for plant genome sequencing and knowing precisely the best way to use them

  • This high coverage will not resolve the problem of repeats, but it is required to compensate for the effectively shorter read length and the sequencing errors of next-generation sequencing (NGS) technologies, which increase assembly complexity and intensify the computational issues related to large datasets
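The relationship between read count, read length, and coverage mentioned in the last highlight can be illustrated with the standard average-depth formula C = N × L / G (N reads of length L over a genome of size G). The following is a minimal sketch, not taken from the review: the function names and the numbers in the usage note are hypothetical and chosen only for illustration.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage, C = N * L / G (all lengths in bases)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach target_coverage, rounded up."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)
```

For a hypothetical 1 Gb genome sequenced with 100-base reads, `reads_needed(50, 100, 1_000_000_000)` gives 500 million reads for 50× average coverage. Halving the read length doubles the number of reads needed for the same depth, which is one reason short reads enlarge the datasets an assembler has to handle. This average says nothing about repeats: higher depth alone does not resolve them.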

Summary

Introduction

Higher plants are the Earth's dominant vegetation in nearly all ecosystems. They sustain living beings (including humans) by providing oxygen, food, fiber, fuel, medicines, spirits, erosion defense, flooding control, soil regeneration, (bio)remediation, urban cooling, green spaces (including gardens) and CO2 lowering, and contributing to the control of global warming [1]. The publication of the first plant genome sequence of Arabidopsis thaliana [4] provided and improved the genetic landscape for studying all plants and has paved the way for sequencing several other plant genomes. It has transformed the methods and tools for plant research and crop improvement [5]. The arrival of next-generation sequencing (NGS) technologies has allowed the rapid and efficient development of genomic resources for non-model or orphan plant species [9–13]. We discuss why challenges are not overcome by the arrival of NGS.

From Sanger Technology to NGS
Challenging Features of Plant Genomes
Sampling
Genome Size and Complexity
Transposable Elements
Heterozygosity
Polyploidy
Gene Content and Gene Families
Non-Coding RNAs
Confounding Factors for Plant Genome Assembly
Repetitive Nature of Plant Genomes
DNA Contamination
Sequencing Errors
Read Length
Quality Values
Number of Reads and Coverage
Seeking for the Best Assembly
Findings
Concluding Remarks
