Abstract
Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination. Here we present a high-quality reference genome assembly for barley (Hordeum vulgare L.). We use chromosome conformation capture mapping to derive the linear order of sequences across the pericentromeric space and to investigate the spatial organization of chromatin in the nucleus at megabase resolution. The composition of genes and repetitive elements differs between distal and proximal regions. Gene family analyses reveal lineage-specific duplications of genes involved in the transport of nutrients to developing seeds and the mobilization of carbohydrates in grains. We demonstrate the importance of the barley reference sequence for breeding by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.
Highlights
Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination.
We demonstrate the importance of the barley reference sequence for breeding by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.
We conclude that the frequency with which loci juxtapose in three-dimensional space is predominantly determined by their position in the linear genome. This is in sharp contrast to the organization of chromatin in human nuclei, where two compartments corresponding to open and closed chromatin domains are evident at megabase resolution[20], but is consistent with cytogenetic mapping of histone marks associated with heterochromatin in large, repeat-rich genomes[29].
Summary
Table 1 (values only): 4,235 (4.58 Gb); 2,123 (205 Mb); 1.9 Mb; 4.63 Gb (97%); 4.54 Gb (95%); 39,734; 65.3 Mb (1.4%); 3.70 Gb (80.8%).
Overlaps between adjacent clones[15] were detected and validated by physical map information[16], a genetic linkage map[17] and a highly contiguous optical map[18] to construct super-scaffolds composed of merged assemblies of individual BACs (Table 1 and Extended Data Table 1). This increased the contiguity, as measured by the N50 value (the scaffold size above which 50% of the total length of the sequence was included in the assembly), from 79 kb to 1.9 Mb. Scaffolds were assigned to chromosomes using a population sequencing (POPSEQ) genetic map[17]. Mapping of transcriptome data and reference protein sequences from other plant species to the assembly identified 83,105 putative gene loci, including protein-coding genes, non-coding RNAs, pseudogenes and transcribed transposons (Fig. 1, Extended Data Fig. 1, Extended Data Table 2 and Supplementary Note 3).
Figure legend (track scales): 20-mer frequency (median), 14.6–117; age of full-length LTRs (Myr), m50, 1.4–2.4; genes (number per Mb), 2.1–29.3; recombination rate (cM per Mb), 0–1.7; GC content (%), 43.9–45.0.
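The N50 definition given above can be made concrete with a small calculation. The following is a minimal sketch, not the pipeline used for the assembly: the function name `n50` and the toy scaffold lengths are hypothetical, chosen only to show why merging overlapping sequences into super-scaffolds raises the N50.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled sequence.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy data (arbitrary units), not values from the paper.
before_merging = [100, 80, 60, 40, 20]
# Same total length after joining 100 + 80 and 60 + 40 into super-scaffolds.
after_merging = [180, 100, 20]

print(n50(before_merging))  # 80
print(n50(after_merging))   # 180
```

The total sequence length is identical in both toy assemblies; only the contiguity changes, which is the effect the N50 is meant to capture.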