Haplotype-Based Genotyping in Polyploids.

Josh P Clevenger, Peggy Ozias-Akins, Scott Jackson, Walid Korani

doi:10.3389/fpls.2018.00564

Abstract

Accurate identification of polymorphisms from sequence data is crucial to unlocking the potential of high-throughput sequencing for genomics. Single nucleotide polymorphisms (SNPs) are difficult to identify accurately in polyploid crops because the duplicative nature of polyploid genomes leads to low confidence in the true alignment of short reads. Implementing a haplotype-based method in contrasting subgenome-specific sequences leads to higher accuracy of SNP identification in polyploids. To test this method, a large-scale 48K SNP array (Axiom Arachis2) was developed for Arachis hypogaea (peanut), an allotetraploid, in which 1,674 haplotype-based SNPs were included. Results of the array show that 74% of the haplotype-based SNP markers could be validated, which is considerably higher than previous methods used for peanut. The haplotype method has been implemented in a standalone program, HAPLOSWEEP, which takes BAM files and a VCF file as input and identifies haplotype-based markers. Haplotypes can be discovered within single reads or across paired reads, and the method can leverage long-read technology by targeting any length of haplotype. Haplotype-based genotyping is applicable in all allopolyploid genomes and provides confidence in marker identification and in silico-based genotyping for polyploid genomics.

Highlights

The identification of functional variation controlling traits of interest relies on the ability to discern all true variation between accessions with discrete genotypes.
For single nucleotide polymorphism (SNP) identification, a set of 21 Arachis hypogaea accessions was re-sequenced to 10X coverage (Clevenger et al, 2017b), and sequences from three accessions that are parents of two RIL populations [“T” (Qin et al, 2012) and “S” (Khera et al, 2016)] were used.
When utilizing sequence-based genotyping, short reads originating from either subgenome can map to the same duplicated location.
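
The mapping ambiguity noted above can be illustrated with a small toy example. The sketch below is not the HAPLOSWEEP implementation; it is a minimal illustration that assumes ungapped reads given as hypothetical (start, sequence) pairs, two variant sites at made-up coordinates, and a made-up read-support threshold. Site 3 stands in for a fixed difference between subgenomes (A versus G), and site 7 carries a true SNP between accessions on one subgenome only. Reads from both subgenomes pile up at the same location, so site 3 looks mixed in both accessions, while the alleles phased on single reads separate cleanly.

```python
from collections import Counter


def read_haplotypes(reads, positions):
    """Count allele combinations observed on single reads.

    reads: iterable of (start, sequence) pairs; ungapped, 0-based start.
    positions: 0-based reference coordinates of the variant sites.
    Only reads that span every position contribute a haplotype.
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq)
        if all(start <= p < end for p in positions):
            counts[tuple(seq[p - start] for p in positions)] += 1
    return counts


def private_haplotypes(counts_a, counts_b, min_reads=2):
    """Return haplotypes well supported in one accession and unseen in the other.

    min_reads is an illustrative threshold, not a recommended value.
    """
    only_a = {h for h, n in counts_a.items() if n >= min_reads and h not in counts_b}
    only_b = {h for h, n in counts_b.items() if n >= min_reads and h not in counts_a}
    return only_a, only_b


# Toy reads (hypothetical data). Site 3: subgenome difference (A vs G).
# Site 7: true SNP between accessions (C vs T) on the "A" subgenome only.
sites = [3, 7]
accession_1 = [
    (0, "TTTATTTCTT"), (2, "TATTTCTT"),   # subgenome with A at site 3
    (0, "TTTGTTTCTT"), (2, "TGTTTCTT"),   # subgenome with G at site 3
    (5, "TTCTT"),                          # does not span site 3, ignored
]
accession_2 = [
    (0, "TTTATTTTTT"), (2, "TATTTTTT"),   # same subgenome, now T at site 7
    (0, "TTTGTTTCTT"), (2, "TGTTTCTT"),   # other subgenome unchanged
]

haps_1 = read_haplotypes(accession_1, sites)
haps_2 = read_haplotypes(accession_2, sites)
only_1, only_2 = private_haplotypes(haps_1, haps_2)
print(only_1, only_2)
```

With these toy reads, the (G, C) haplotype shared by both accessions drops out, leaving (A, C) in the first accession against (A, T) in the second. A real workflow would read alignments from BAM files and candidate sites from a VCF file, and would need to handle indels, base quality, and paired reads, none of which this sketch attempts.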

Summary

Introduction

The identification of functional variation controlling traits of interest relies on the ability to discern all true variation between accessions with discrete genotypes. The size and complexity of polyploid genomes have led to reliance on single nucleotide polymorphism (SNP) arrays and complexity reduction sequencing strategies such as genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RADSeq; Elshire et al, 2011; Willing et al, 2011). These methodologies have allowed access to an unprecedented number of markers for genomics. However, because the SNP probes on arrays are static, rare variants or subpopulation-specific variants will not be assayed. This will cause bias in population genetics studies, and will not allow the identification of rare functional variants controlling traits of interest. A method of identifying markers straight from sequence data alleviates ascertainment bias on an experiment-wise level by providing access to all potential polymorphisms in the population of interest and does not constrain analysis to discrete markers on an array.

Methods

Results

Conclusion