Abstract

Background: White clover (Trifolium repens L.) is an allotetraploid species possessing two highly collinear ancestral sub-genomes. The apparent existence of highly similar homeolog copies for the majority of genes in white clover is problematic for the development of genome-based resources in the species. This is especially true for the development of genetic markers based on single nucleotide polymorphisms (SNPs), since it is difficult to distinguish between homeolog-specific and allelic variants. Robust methods for categorising single nucleotide variants as allelic or homeolog-specific in large transcript datasets are required. We illustrate one potential approach in this study.

Results: We used 454 pyrosequencing to generate ~760,000 transcript sequences from an eighth-generation white clover inbred line. These were assembled and partially annotated to yield a reference transcript set comprising 71,545 sequences. We subsequently performed Illumina sequencing on three further white clover samples, generating 14 million transcript reads from a mixed sample comprising 24 divergent white clover genotypes, and 50 million reads from two further eighth-generation white clover inbred lines. Mapping these reads to the reference transcript set allowed us to develop a significant SNP resource for white clover, and to partition the SNPs from the inbred lines into categories reflecting allelic or homeolog-specific variation. The potential for using haplotype reconstruction and progenitor genome comparison to assign haplotypes to specific ancestral sub-genomes of white clover is demonstrated for sequences corresponding to genes encoding dehydration responsive element binding protein and acyl-CoA oxidase.

Conclusions: In total, 208,854 independent SNPs in 31,715 reference sequences were discovered, approximately three quarters of which were categorised as representing allelic or homeolog-specific variation using two inbred lines. This represents a significant resource for white clover genomics and genetics studies. We discuss the potential to extend the analysis to identify a “core set” of ancestrally derived homeolog-specific variants in white clover.

Highlights

  • White clover (Trifolium repens L.) is an allotetraploid species possessing two highly collinear ancestral sub-genomes

  • White clover varieties are generated via polycrosses of multiple parents and are relatively genetically heterogeneous (Grasslands Huia is a synthetic variety derived from seven individual parents [8]) leading to an expectation of multiple allelic variants for individual loci in the consensus sequences generated by clustering

  • A partially annotated reference transcript set of over 70,000 sequences was generated for white clover and used to identify over 200,000 independent single nucleotide polymorphisms (SNPs) in approximately 45% of the reference transcript sequences


Summary

Introduction

White clover (Trifolium repens L.) is an allotetraploid species possessing two highly collinear ancestral sub-genomes. The apparent existence of highly similar homeolog copies for the majority of genes in white clover is problematic for the development of genome-based resources in the species. This is especially true for the development of genetic markers based on single nucleotide polymorphisms (SNPs), since it is difficult to distinguish between homeolog-specific and allelic variants. On validating SNPs from a small subset of these loci in mapping populations, almost half of the SNP assays generated monomorphic patterns in F1 progeny despite prior validation as polymorphic markers [10]. The authors attributed this to the frequent clustering of sequences from highly similar homeolog copies of genes in the consensus sequences generated in the earlier part of the process, and a subsequent inability to efficiently distinguish between allelic and homeolog-specific variation amongst the clustered sequences when designing assays. Basing SNP assays on homeolog-specific, rather than allelic, variants results in monomorphic assays because alternative homeolog variants are ubiquitously present in segregating populations.
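The reason homeolog-specific variants yield monomorphic assays can be illustrated with a small enumeration of F1 progeny. The Python sketch below is a toy model rather than the method used in the study. It assumes disomic inheritance (each sub-genome segregates independently as a diploid locus) and an assay that reports only which bases are present across both sub-genomes, ignoring dosage. The sub-genome labels `A` and `B`, the function name `f1_assay_calls`, and the example genotypes are hypothetical.

```python
from itertools import product

def f1_assay_calls(parent1, parent2):
    """Enumerate the distinct assay results possible among F1 progeny.

    Each parent is a dict mapping a sub-genome label ('A' or 'B') to the
    pair of bases it carries at the homeologous position.
    Assumptions (not from the source): disomic inheritance, and an assay
    that cannot tell the sub-genomes apart, so a call is simply the set
    of bases present in the progeny.
    """
    calls = set()
    # One allele from each parent, independently for each sub-genome.
    for sub_a in product(parent1["A"], parent2["A"]):
        for sub_b in product(parent1["B"], parent2["B"]):
            calls.add(frozenset(sub_a + sub_b))
    return calls

# Homeolog-specific variant: sub-genome A is fixed for C, sub-genome B for T.
fixed = {"A": ("C", "C"), "B": ("T", "T")}
homeolog_calls = f1_assay_calls(fixed, fixed)

# Allelic variant: one parent is heterozygous C/T on sub-genome A only.
het_parent = {"A": ("C", "T"), "B": ("C", "C")}
hom_parent = {"A": ("C", "C"), "B": ("C", "C")}
allelic_calls = f1_assay_calls(het_parent, hom_parent)

print(len(homeolog_calls))  # one call type: every progeny shows both C and T
print(len(allelic_calls))   # two call types: the assay segregates
```

Under these assumptions every progeny of the first cross carries both bases, so the assay appears polymorphic within an individual but never segregates, whereas the allelic variant produces two distinguishable classes of progeny.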
