Abstract

Efficient utilization of genetic variation in plant germplasm collections is impeded by large collection sizes, uneven trait characterization, and the unpredictable apportionment of allelic diversity among heterogeneous accessions. Distributing compact subsets of a collection that contain maximal allelic diversity at functional loci of interest could streamline both conventional and precision breeding. Using heterogeneous population samples of Arabidopsis, Populus, and sorghum, we show that genomewide single nucleotide polymorphism (SNP) data allow subsets to capture 3–78-fold more haplotypic diversity than subsets chosen using geographic or environmental data, which are commonly used as surrogate predictors of genetic diversity. Using a large genomewide SNP data set from landrace sorghum, we demonstrate three bioinformatic approaches for extracting functional genetic diversity. First, in a “candidate gene” approach, we assembled subsets that maximized haplotypic diversity at 135 putative lignin biosynthetic loci, which are relevant to biomass breeding programs. Second, we applied a keyword search against the Gene Ontology to identify 1040 regulatory loci and assembled subsets capturing genomewide diversity in regulatory genes, a general source of phenotypic variation. Third, we developed a machine-learning approach that ranks the semantic similarity between Gene Ontology term definitions and the text of scientific publications on crop adaptation to climate, a complex breeding objective. We identified 505 sorghum loci whose defined functions are semantically related to climate-adaptation concepts. The assembled subsets could be used to address climatic pressures on sorghum production. To meet impending agricultural challenges and foster the rapid extraction and use of novel genetic diversity residing in heterogeneous germplasm collections, whole-genome resequencing efforts should be prioritized.
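The second and third approaches (a keyword search against Gene Ontology definitions, then ranking definitions by their textual similarity to publications) can be illustrated with a small sketch. It assumes a plain bag-of-words TF-IDF representation compared by cosine similarity, which is a simple stand-in and not necessarily the machine-learning model used in the study. The term IDs ("T1" to "T3"), their definitions, and the publication text are hypothetical toy inputs.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        if not toks:
            vectors.append({})
            continue
        tf = Counter(toks)
        # Smoothed inverse document frequency keeps every weight positive.
        vectors.append({
            term: (count / len(toks)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def keyword_filter(go_terms, keyword):
    """Return IDs of terms whose definition contains the keyword (case-insensitive)."""
    return [go_id for go_id, text in go_terms.items() if keyword.lower() in text.lower()]


def rank_by_similarity(go_terms, corpus_text):
    """Order term IDs from most to least similar to the publication text."""
    ids = list(go_terms)
    vectors = tfidf_vectors([go_terms[i] for i in ids] + [corpus_text])
    query = vectors[-1]
    scores = {go_id: cosine(vec, query) for go_id, vec in zip(ids, vectors[:-1])}
    # sorted() is stable, so ties keep their input order.
    return sorted(ids, key=lambda go_id: scores[go_id], reverse=True)


# Hypothetical toy term definitions and publication text.
go_terms = {
    "T1": "regulation of transcription",
    "T2": "response to water deprivation",
    "T3": "lignin biosynthetic process",
}
publication = "heat and drought stress trigger the plant response to water deprivation"

regulatory = keyword_filter(go_terms, "regulation")  # ["T1"]
ranking = rank_by_similarity(go_terms, publication)  # ["T2", "T1", "T3"]
```

In practice, loci annotated with the top-ranked terms would then feed the same subset-assembly step used for the candidate-gene and regulatory-gene sets; a learned text embedding could replace TF-IDF without changing the ranking logic.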

Highlights

  • Plant germplasm collections safeguarded in gene banks conserve the raw materials necessary to confront agricultural challenges

  • Whether natural or artificial, segregating DNA sequence variation consists of haplotype blocks: contiguous spans of sequence that are inherited as a unit

  • Genomewide single nucleotide polymorphism (SNP) data were acquired from published studies on wild European Arabidopsis thaliana (L.) Heynh., a model species; wild North American Populus trichocarpa Torr. & Gray, a source of wood fiber; and landrace African Sorghum bicolor (L.) Moench., a grain and feedstock crop commonly cultivated in arid areas [26,27,28]
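Because haplotypic diversity is the quantity the subsets are meant to capture, it may help to see how distinct haplotypes at a single locus can be counted from SNP data. This minimal sketch assumes each accession is represented by one phased (or effectively homozygous) sequence of biallelic calls coded as "0"/"1", and that a locus is defined by a hypothetical list of SNP column indices; real data would also need phasing and handling of missing calls.

```python
from collections import Counter


def haplotypes_at_locus(genotypes, snp_indices):
    """Map each accession to its haplotype: its alleles at the locus's SNP columns, in order."""
    return {acc: "".join(seq[i] for i in snp_indices) for acc, seq in genotypes.items()}


def haplotype_counts(genotypes, snp_indices):
    """Count how many accessions carry each distinct haplotype at the locus."""
    return Counter(haplotypes_at_locus(genotypes, snp_indices).values())


# Hypothetical accessions, each a string of 0/1 alleles at five SNP positions.
genotypes = {
    "acc1": "01101",
    "acc2": "01100",
    "acc3": "11101",
    "acc4": "01101",
}
locus = [0, 1, 4]  # hypothetical SNP columns falling within one gene

counts = haplotype_counts(genotypes, locus)
n_haplotypes = len(counts)  # 3 distinct haplotypes: "011" (x2), "010", "111"
```

Here acc1 and acc4 share a haplotype, so a subset containing both adds no haplotypic diversity at this locus beyond what either one contributes alone.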


Introduction

Plant germplasm collections safeguarded in gene banks conserve the raw materials necessary to confront agricultural challenges. Gene banks conserve DNA sequence variation packaged into reproductive propagules. This sequence variation is the material basis of the potential phenotypic variation that a collection can offer for addressing these challenges. Extracting improved or novel traits from a collection depends on mobilizing haplotype blocks that cover a desired set of genes into breeding lines and, eventually, into elite cultivars. This process has been aided by the use of “core collections”: subsets of the broader collection designed to contain maximum genetic variation in a small number of accessions [8].
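The core-collection idea, packing as much variation as possible into a fixed number of accessions, can be sketched as a greedy maximum-coverage heuristic: repeatedly add the accession that contributes the most not-yet-captured (locus, haplotype) pairs. This is an illustrative stand-in under stated assumptions, not necessarily the subset-assembly algorithm used in the study; the accession names, loci, and haplotype labels below are hypothetical.

```python
def greedy_core_subset(haplotypes, size):
    """Greedily choose up to `size` accessions that maximize distinct (locus, haplotype) pairs.

    haplotypes: dict mapping accession -> dict mapping locus -> haplotype label.
    Returns the chosen accessions (in selection order) and the set of captured pairs.
    """
    captured = set()
    subset = []
    remaining = set(haplotypes)
    while remaining and len(subset) < size:
        # Gain = number of new (locus, haplotype) pairs an accession would add.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(remaining),
                   key=lambda acc: len(set(haplotypes[acc].items()) - captured))
        gain = set(haplotypes[best].items()) - captured
        if not gain:
            break  # no remaining accession adds new diversity
        subset.append(best)
        captured |= gain
        remaining.remove(best)
    return subset, captured


# Hypothetical haplotype calls for four accessions at two loci.
haplotypes = {
    "A": {"locus1": "h1", "locus2": "h1"},
    "B": {"locus1": "h1", "locus2": "h2"},
    "C": {"locus1": "h2", "locus2": "h3"},
    "D": {"locus1": "h3", "locus2": "h1"},
}

subset, captured = greedy_core_subset(haplotypes, 2)  # ["A", "C"], 4 of 6 pairs
```

Because the number of distinct pairs captured is a coverage function, this greedy choice is guaranteed to reach at least (1 − 1/e) of the best achievable coverage for the given subset size. Restricting the loci to a candidate set (for example, lignin biosynthetic genes) would focus the subset on that function.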
