Abstract

In addition to single-nucleotide polymorphisms, structural variation is abundant in many plant genomes. The structural variation across a species can be represented by a ‘pan-genome’, which is essential to fully understand the genetic control of phenotypes. However, the pan-genome's complexity hinders its accurate assembly via sequence alignment. Here we demonstrate an approach to facilitate pan-genome construction in maize. By performing 18 trillion association tests, we map 26 million tags generated by reduced representation sequencing of 14,129 maize inbred lines. Using machine-learning models, we select 4.4 million accurately mapped tags as sequence anchors, 1.1 million of which are presence/absence variations. Structural variations exhibit enriched association with phenotypic traits, indicating that they are a significant source of adaptive variation in maize. The ability to efficiently map ultrahigh-density pan-genome sequence anchors enables fine characterization of structural variation and will advance both genetic research and breeding in many crops.

Highlights

  • In addition to single-nucleotide polymorphisms, structural variation is abundant in many plant genomes

  • Genome duplication[1] and transposable elements[2] (TEs) are important driving forces behind plant genome evolution, and have generated the complex genomes found in many major crop species[3,4,5,6,7]

  • Analysing an unprecedented number of inbred lines in maize, we developed effective genetic mapping approaches combined with ML algorithms to map millions of high-quality sequence anchors for the maize pan-genome


Summary

Introduction

In addition to single-nucleotide polymorphisms, structural variation is abundant in many plant genomes. Genotyping-by-sequencing (GBS)[16], a reduced representation approach, can efficiently generate abundant single-nucleotide polymorphisms (SNPs) for a large number of individuals of a species. It is also a cost-effective source of sequence tags that can be used as genetic anchors to direct contig/scaffold assembly and to map genomic fragments absent from the reference. We developed an efficient and accurate approach to genetically map ultrahigh-density sequence anchors, which will be a valuable tool for ongoing pan-genome construction. This approach is most powerful with the large sample sizes afforded by GBS. A single reference genome is woefully insufficient to represent all of the genomic content of maize.
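To make the idea of genetically mapping a sequence tag concrete, the following is a minimal sketch and not the authors' pipeline. It assumes each tag is scored as present (1) or absent (0) in every inbred line, that SNP markers with known reference positions are coded 0/1 in the same lines, and it uses squared correlation (r²) as a simple stand-in for an association test. The names `r_squared` and `map_tag` and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Position = Tuple[str, int]  # (chromosome, base-pair coordinate); hypothetical layout


def r_squared(x: List[int], y: List[int]) -> float:
    """Squared Pearson correlation between two equal-length 0/1 vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic vector carries no mapping signal
    return cov * cov / (var_x * var_y)


def map_tag(
    tag_presence: List[int],
    snp_genotypes: Dict[Position, List[int]],
) -> Tuple[Optional[Position], float]:
    """Return the SNP position most strongly associated with the tag."""
    best_pos: Optional[Position] = None
    best_r2 = -1.0
    for pos, alleles in snp_genotypes.items():
        r2 = r_squared(tag_presence, alleles)
        if r2 > best_r2:
            best_pos, best_r2 = pos, r2
    return best_pos, best_r2


# Toy data for 8 inbred lines (illustrative only, not from the study).
tag = [1, 1, 0, 0, 1, 0, 1, 0]
snps = {
    ("chr1", 1000): [1, 0, 0, 1, 1, 0, 0, 1],
    ("chr1", 5000): [1, 1, 0, 0, 1, 0, 1, 0],
    ("chr2", 2000): [0, 1, 1, 0, 0, 1, 1, 0],
}
position, r2 = map_tag(tag, snps)
print(position, r2)
```

In this toy example the tag co-segregates perfectly with the marker at `("chr1", 5000)`, so that position is returned as the tag's genetic location. Testing every tag against every marker in this way is what makes the number of association tests grow so quickly with tag and marker counts.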
