Abstract

Background

Standard strategies to identify genomic regions involved in a specific trait variation are often limited by time- and resource-consuming genotyping methods. Other limiting prerequisites are the phenotyping of large segregating populations or of diversity panels, and the availability and quality of a closely related reference genome. To overcome these limitations, we designed efficient Comparative Subsequence Sets Analysis (CoSSA) workflows to identify haplotype-specific SNPs linked to a trait of interest from Whole Genome Sequencing (WGS) data.

Results

As a model, we used the resistance to Synchytrium endobioticum pathotypes 2, 6 and 18 that co-segregated in a tetraploid full-sib population. Genomic DNA from both parents, pedigree genotypes, unrelated potato varieties lacking the wart resistance traits, and pools of resistant and susceptible siblings were sequenced. Set algebra and depth filtering of subsequences (k-mers) were used to delete unlinked and common SNPs and to enrich for SNPs from the haplotype(s) harboring the resistance gene(s). Using CoSSA, we identified a major and a minor effect locus. Upon comparison to the reference genome, it was inferred that the major resistance locus, referred to as Sen3, was located on the north arm of chromosome 11 between 1,259,552 and 1,519,485 bp. Furthermore, we could anchor the unanchored superscaffold DMB734 from the potato reference genome to a syntenic interval. CoSSA was also successful in identifying Sen3 in a reference-genome-independent way, thanks to the de novo assembly of paired-end reads matching haplotype-specific k-mers. The de novo assembly provided more R haplotype-specific polymorphisms than the corresponding region of the reference genome. CoSSA also offers possibilities for pedigree analysis. The origin of Sen3 was traced back to Ora. Finally, the diagnostic power of the haplotype-specific markers was shown using a panel of 56 tetraploid varieties.

Conclusions

CoSSA is an efficient, robust and versatile set of workflows for the genetic analysis of a trait of interest using WGS data. Because the WGS data are used without intermediate read mapping, CoSSA does not require the use of a reference genome. This approach allowed the identification of Sen3 and the design of haplotype-specific, diagnostic markers.
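
The set algebra and depth filtering of k-mers described in the abstract can be illustrated with a minimal Python sketch. This is not the CoSSA implementation: the function names, the toy reads, the value of k and the depth thresholds are all assumptions chosen for illustration, and a real workflow would use a dedicated k-mer counter and handle reverse complements and sequencing errors.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-length subsequence (k-mer) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def depth_filter(counts, min_depth, max_depth):
    """Keep only the k-mers whose observed depth lies in [min_depth, max_depth]."""
    return {kmer for kmer, n in counts.items() if min_depth <= n <= max_depth}


def candidate_kmers(r_bulk, s_bulk, r_parent, s_parent, k, min_depth, max_depth):
    """Hypothetical helper: k-mers seen in the resistant bulk and resistant
    parent, but absent from the susceptible bulk and susceptible parent."""
    r_bulk_set = depth_filter(count_kmers(r_bulk, k), min_depth, max_depth)
    r_parent_set = set(count_kmers(r_parent, k))
    s_bulk_set = set(count_kmers(s_bulk, k))
    s_parent_set = set(count_kmers(s_parent, k))
    # Intersection keeps k-mers inherited from the resistant parent;
    # the differences remove k-mers that are common or unlinked.
    return (r_bulk_set & r_parent_set) - s_bulk_set - s_parent_set


# Toy data (assumed): the resistant samples carry a T where the others carry an A.
linked = candidate_kmers(
    r_bulk=["ACGTAC", "ACGTAC"],
    s_bulk=["ACGAAC"],
    r_parent=["ACGTAC"],
    s_parent=["ACGAAC"],
    k=3,
    min_depth=2,
    max_depth=10,
)
print(sorted(linked))
```

In this toy case, only the k-mers spanning the variant base survive, while the shared k-mer ACG is removed by the set difference.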

Highlights

  • The pathotype 18 scores were skewed towards susceptibility (χ² test p-value < 0.001; Additional file 5C), which can be explained by a weaker resistance of Kuba to pathotype 18

  • The resistance locus was mapped to a region corresponding to a 777 kb interval of the reference genome (between 939,581 bp and 1,716,722 bp), which overlaps with the 260 kb interval we identified in this study

Summary

Introduction

Standard strategies to identify genomic regions involved in a specific trait variation are often limited by time- and resource-consuming genotyping methods. Other limiting prerequisites are the phenotyping of large segregating populations or of diversity panels, and the availability and quality of a closely related reference genome. To overcome these limitations, we designed efficient Comparative Subsequence Sets Analysis (CoSSA) workflows to identify haplotype-specific SNPs linked to a trait of interest from Whole Genome Sequencing data. The frequency of loci not linked to the trait of interest is expected to be equivalent between the resistant and susceptible pools, whereas a bias in the frequency of the loci linked to the causal genes is expected. In this way, DNA sequence variants linked to the trait of interest are identified, which allows the development of markers for genetic mapping and Marker Assisted Selection (MAS) purposes.
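
The expected frequency bias between pools can be sketched as a simple comparison of allele frequencies. The example below is a hedged illustration only: the variant names, read counts and the 0.2 threshold are invented for the example, and the function names are hypothetical rather than part of any published workflow.

```python
def allele_frequency(alt_reads, total_reads):
    """Fraction of reads supporting the alternative allele (0.0 if no coverage)."""
    return alt_reads / total_reads if total_reads else 0.0


def frequency_bias(variants, threshold):
    """Return (name, delta) for variants whose allele frequency differs
    between the resistant and susceptible pools by at least `threshold`."""
    flagged = []
    for name, (r_alt, r_total, s_alt, s_total) in variants.items():
        delta = allele_frequency(r_alt, r_total) - allele_frequency(s_alt, s_total)
        if abs(delta) >= threshold:
            flagged.append((name, delta))
    return flagged


# Assumed toy counts: (alt reads in R pool, total in R pool,
#                      alt reads in S pool, total in S pool)
variants = {
    "snp_unlinked": (25, 100, 24, 100),  # similar frequency in both pools
    "snp_linked": (26, 100, 1, 100),     # enriched in the resistant pool
}

flagged = frequency_bias(variants, threshold=0.2)
for name, delta in flagged:
    print(f"{name}: frequency difference {delta:.2f}")
```

Only the variant with a clear frequency difference between the two pools is reported; the variant with near-equal frequencies is treated as unlinked.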
