Abstract
The genomic sequences of many important Triticeae crop species are hard to assemble and analyse because of their large genome sizes, (in part) polyploid genomes and high repeat content. Recently, draft genomes of barley and bread wheat were reported thanks to cost-efficient and fast NGS technologies. The barley genome is estimated to be 5 Gb in size, whereas the allo-hexaploid bread wheat genome amounts to 17 Gb. Direct assembly of the sequence reads and access to the gene content are hampered by the repeat content. As a consequence, novel strategies and data analysis concepts had to be developed to provide much-needed whole genome sequence surveys and access to the gene repertoires. Here we describe some analytical strategies that now enable structuring of the massive NGS data generated and pave the way towards structured and ordered sequence data and gene order. Specifically, we report on the GenomeZipper, a synteny-driven approach to order and structure NGS survey sequences of grass genomes that lack a physical map. In addition, to access and analyse the gene repertoire of allo-hexaploid bread wheat from the raw sequence reads, a reference-guided approach was developed utilizing representative genes from rice, Brachypodium distachyon, sorghum and barley. Stringent sub-assembly on the reference genes prevented collapsing of homeologous wheat genes and allowed gene retention rates to be estimated and gene family sizes to be determined. Genomic sequences from the wheat sub-genome progenitors made it possible to assign a large number of sub-assemblies to the wheat A, B or D sub-genome using machine learning algorithms. Many of the concepts outlined here can readily be applied to other complex plant and non-plant genomes.
Highlights
The Triticeae tribe comprises some of the most economically important crops including bread wheat, barley and rye
With an estimated genome size of ~5 Gb, the barley genome is significantly larger than the human genome, and it is in turn exceeded by the bread wheat genome at ~17 Gb
It has been speculated that the bread wheat genome originated from hybridization between cultivated tetraploid emmer wheat (AABB) and diploid goat grass (DD) about 8000 years ago [5]
Summary
The Triticeae tribe comprises some of the most economically important crops including bread wheat, barley and rye. 454-like shotgun reads were simulated (5× genome coverage), re-mapped against their corresponding OG representatives and sub-assembled with varying minimum overlap identity (97% mi, 99% mi and 100% mi), and the gene copy number was predicted. Wheat sub-assemblies were generated by a stringent assembly of reads mapped to representative genes (for orthologous groups defined by OrthoMCL [16]) from the reference organisms Brachypodium distachyon [13], Hordeum vulgare, Oryza sativa [15] and Sorghum bicolor [14], as well as the genome sequences of the D genome donor species Ae. tauschii [24] and the A genome relative Triticum monococcum (NCBI archive SRP004490.3), and cDNA sequence assemblies from Ae. speltoides (Trick & Bancroft, unpublished data), a member of the Sitopsis section, to which the putative B genome donor belongs. The linearly ordered gene maps provide a valuable resource for a variety of applications: (i) for marker development and to assist positional cloning [37], (ii) for comparative analyses of the conserved gene space [4], and (iii) to resolve the structure of a genome/chromosome and to establish the colinearity between grass genomes [34,35].
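The effect of the minimum overlap identity (mi) thresholds mentioned above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy sequences and the 400 bp read length are hypothetical, and the overlap is assumed to be already aligned without gaps. It shows why a stringent threshold keeps near-identical homeologous copies apart, and how many reads a given coverage implies for a target of known length.

```python
import math


def overlap_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, gap-free aligned overlaps."""
    if len(a) != len(b) or not a:
        raise ValueError("overlaps must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def would_merge(a: str, b: str, min_identity: float) -> bool:
    """True if the two overlaps would be joined at the given mi threshold."""
    return overlap_identity(a, b) >= min_identity


def reads_for_coverage(target_length: int, read_length: int, coverage: float) -> int:
    """Number of reads needed to reach the requested average coverage."""
    return math.ceil(coverage * target_length / read_length)


# Toy example: two hypothetical homeologous copies, 100 bp, 2 mismatches.
copy_a = "ACGT" * 25
copy_b = list(copy_a)
copy_b[10] = "T"   # copy_a has 'G' here
copy_b[60] = "C"   # copy_a has 'A' here
copy_b = "".join(copy_b)

for mi in (97, 99, 100):
    print(mi, would_merge(copy_a, copy_b, mi))

# Reads needed for 5x coverage of a 1500 bp gene with assumed 400 bp reads.
print(reads_for_coverage(1500, 400, 5))
```

In this toy case the two copies share 98% identity, so they would be merged at 97% mi but kept as separate sub-assemblies at 99% mi and 100% mi. Real reads also carry sequencing errors, which is one reason a range of thresholds is worth comparing.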