Abstract
Key message
Software for high imputation accuracy in soybean was identified. The imputed dataset could significantly narrow the genomic regions controlling traits, thus greatly improving the efficiency of candidate gene identification.

Genotype imputation is a strategy to increase the marker density of existing datasets without additional genotyping. We compared the imputation performance of BEAGLE 5.0, IMPUTE 5 and AlphaPlantImpute, and tested software parameters that may help to improve imputation accuracy in soybean populations. Several factors, including marker density, extent of linkage disequilibrium (LD) and minor allele frequency (MAF), were examined for their effects on imputation accuracy across the different programs. Our results showed that AlphaPlantImpute achieved higher imputation accuracy than BEAGLE 5.0 or IMPUTE 5 in each soybean family tested, especially when the study progeny were genotyped with an extremely low number of markers. LD extent, MAF and reference panel size were positively correlated with imputation accuracy; a minimum of 50 markers per chromosome and SNP MAF > 0.2 in soybean lines were required to avoid a significant loss of imputation accuracy. Using the software, we imputed 5176 lines of the soybean nested association mapping (NAM) population with the high-density markers of the 40 parents. The dataset, containing 423,419 markers for the 5176 lines and 40 parents, was deposited in SoyBase. The imputed NAM dataset was further examined for improvement in mapping quantitative trait loci (QTL) controlling soybean seed protein content. Most QTL identified from the initial and imputed datasets were at identical or similar positions; however, QTL intervals were greatly narrowed with the imputed dataset. The resulting genotypic dataset of the NAM population will facilitate QTL mapping of traits and downstream applications. The information will also help to improve genotype imputation accuracy in self-pollinated crops.
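The abstract reports two thresholds below which accuracy dropped significantly: at least 50 markers per chromosome and SNP MAF > 0.2. The sketch below shows one way to screen a study panel against them. It is a minimal example that assumes genotypes are coded as allele dosages 0/1/2, so inbred lines are mostly 0 or 2. It also assumes one reading of the two thresholds together: count the markers with MAF > 0.2 on each chromosome. The function names are hypothetical and are not part of any of the programs compared.

```python
from collections import Counter

# Thresholds as reported in the abstract for soybean lines.
MIN_MARKERS_PER_CHROM = 50
MIN_MAF = 0.2


def minor_allele_frequency(calls):
    """MAF of one SNP from allele-dosage calls (0, 1, 2); None marks a missing call."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def check_study_panel(markers):
    """Hypothetical panel check (assumed interpretation of the thresholds).

    markers: list of (chromosome, calls) tuples, one per SNP.
    Returns (counts of markers with MAF > MIN_MAF per chromosome,
             sorted list of chromosomes with fewer than MIN_MARKERS_PER_CHROM such markers).
    """
    counts = Counter(
        chrom for chrom, calls in markers if minor_allele_frequency(calls) > MIN_MAF
    )
    chromosomes = {chrom for chrom, _ in markers}
    too_sparse = sorted(c for c in chromosomes if counts[c] < MIN_MARKERS_PER_CHROM)
    return dict(counts), too_sparse
```

Chromosomes listed in the second return value would be candidates for adding markers to the study panel before imputation.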
Highlights
In modern breeding programs, germplasm frequently needs to be genotyped with mega- or giga-sized sets of single nucleotide polymorphism (SNP) markers.
The objectives of this study were (1) to evaluate the imputation performance of three commonly used imputation programs, BEAGLE, IMPUTE and AlphaPlantImpute, in soybean populations, considering the number of markers in the study panel, the extent of linkage disequilibrium (LD), the minor allele frequency (MAF) of markers, and genetic map distance vs. physical distance; (2) to generate, with optimized software parameters, an imputed genotype dataset of the soybean nested association mapping (NAM) recombinant inbred lines (RILs) for public use; and (3) to demonstrate the improvement in quantitative trait loci (QTL) mapping based on the imputed RIL dataset vs. the original dataset in linkage mapping analysis.
When BEAGLE 5.0 imputed study panels with 5 and 160 markers per chromosome, keeping only imputed genotypes with genotype probability (GP) > 0.9 increased accuracy by 11.30% and 1.04%, respectively, compared with no GP filtering.
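GP filtering raises accuracy by discarding uncertain calls, so the number of genotypes retained matters as much as the accuracy itself. The sketch below illustrates this trade-off and is not the study's evaluation code. It scores masked genotypes by simple concordance, the fraction of imputed calls equal to the true call. It can optionally keep only calls whose highest GP exceeds a threshold. The sketch assumes that each imputed call carries a GP triple for genotypes 0/1/2. BEAGLE 5 can write such a GP FORMAT field when run with `gp=true`; check the manual for your version. The concordance metric and the function name are assumptions, and the paper's accuracy measure may differ.

```python
import math
from typing import Optional, Sequence, Tuple


def concordance(true_calls: Sequence[int],
                imputed: Sequence[Tuple[int, Sequence[float]]],
                gp_threshold: Optional[float] = None) -> Tuple[float, int]:
    """Hypothetical accuracy measure: share of masked genotypes imputed correctly.

    true_calls: genotypes (0/1/2) observed before masking.
    imputed: (best-guess genotype, GP triple for 0/1/2) at the same positions.
    gp_threshold: if given, keep only calls whose maximum GP exceeds it.
    Returns (accuracy, number of calls retained after filtering).
    """
    kept = [
        (t, g)
        for t, (g, gp) in zip(true_calls, imputed)
        if gp_threshold is None or max(gp) > gp_threshold
    ]
    if not kept:
        return math.nan, 0
    correct = sum(1 for t, g in kept if t == g)
    return correct / len(kept), len(kept)
```

Reporting the second return value next to the accuracy shows how much of the masked data each GP threshold removes.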
Summary
Germplasm frequently needs to be genotyped with mega- or giga-sized sets of single nucleotide polymorphism (SNP) markers. BEAGLE 5.0 uses the haplotype frequency model described by Li and Stephens (2003). A highly parsimonious algorithm constructs a small subset of reference haplotypes from the full reference panel for imputation, which enables the use of large reference panels at a significantly reduced computational cost (Browning et al. 2018). Although IMPUTE is a more computationally intensive imputation method, the current version, IMPUTE 5, is greatly improved in speed, accuracy and memory efficiency. These gains come from a new reference panel file format and a haplotype-selection strategy based on the Positional Burrows-Wheeler Transform (PBWT) (Rubinacci et al. 2019).

Soybean is an inbred crop with relatively low genetic diversity and long stretches of related haplotypes, especially in bi-parental populations. Although both programs have been widely used in animal and plant genetics, parameters affecting the size of haplotype clusters in the study panel of an inbred plant such as soybean need to be optimized. Other tools designed to integrate genotyping-by-sequencing (GBS) data from bi-parental populations in plants are also available, including Tassel-FSFHap (Swarts et al. 2014), LB-impute (Fragoso et al. 2016) and NOISYmputer (Lorieux et al. 2019).
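IMPUTE 5 selects haplotypes with the PBWT. The sketch below illustrates only the underlying data structure, not IMPUTE 5's implementation or its haplotype-selection rules. It builds PBWT positional prefix arrays. After each site, haplotype indices are stably partitioned by their allele at that site. Haplotypes therefore end up sorted by their reversed prefixes, and haplotypes sharing the longest match ending at that site sit next to each other. The sketch assumes biallelic haplotypes coded as equal-length 0/1 lists. `pbwt_prefix_arrays` is a hypothetical name.

```python
from typing import List, Sequence


def pbwt_prefix_arrays(haplotypes: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return prefix arrays a_0..a_n for n sites (sketch of the core PBWT update).

    a_k orders haplotype indices by their alleles at sites k-1, k-2, ..., 0
    (reversed prefix), with ties kept in the original index order.
    """
    m = len(haplotypes)
    n_sites = len(haplotypes[0]) if m else 0
    order = list(range(m))  # a_0: original order
    arrays = [order]
    for k in range(n_sites):
        zeros, ones = [], []
        # Stable partition of the current order by the allele at site k.
        for idx in order:
            (ones if haplotypes[idx][k] else zeros).append(idx)
        order = zeros + ones
        arrays.append(order)
    return arrays


haps = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
print(pbwt_prefix_arrays(haps)[-1])
```

Each site costs time linear in the number of haplotypes. In bi-parental soybean lines, which share long haplotype stretches, lines carrying the same parental segment would tend to form contiguous runs in these arrays.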