Comparing Different Statistical Models and Multiple Testing Corrections for Association Mapping in Soybean and Maize

Avjinder S Kaler, Timothy Beissinger, Jason D Gillman, Larry C Purcell

doi:10.3389/fpls.2019.01794

Abstract

Association mapping (AM) is a powerful tool for fine mapping complex trait variation down to nucleotide sequences by exploiting historical recombination events. A major problem in AM is controlling false positives that can arise from population structure and family relatedness. False positives are often controlled by incorporating covariates for structure and kinship in mixed linear models (MLM). These MLM-based methods are single-locus models and can introduce false negatives due to overfitting of the model. In this study, eight different statistical models, ranging from single-locus to multilocus, were compared for AM for three traits differing in heritability in two crop species: soybean (Glycine max L.) and maize (Zea mays L.). Soybean and maize were chosen, in part, because their rates of linkage disequilibrium (LD) decay differ markedly, which can influence false positive and false negative rates. The fixed and random model circulating probability unification (FarmCPU) performed better than other models based on an analysis of Q-Q plots and on the identification of the known number of quantitative trait loci (QTLs) in a simulated data set. These results indicate that FarmCPU controls both false positives and false negatives. Six qualitative traits in soybean with known published genomic positions were also used to compare these models, and results indicated that FarmCPU consistently identified a single highly significant SNP closest to these known published genes. Multiple comparison adjustments (Bonferroni, false discovery rate, and positive false discovery rate) were compared for these models using a simulated trait having 60% heritability and 20 QTLs. Multiple comparison adjustments were overly conservative for MLM, CMLM, ECMLM, and MLMM and did not find any significant markers; in contrast, ANOVA, GLM, and SUPER models found an excessive number of markers, far more than 20 QTLs. The FarmCPU model, using the less conservative methods (false discovery rate and positive false discovery rate), identified 10 QTLs, which was closer to the simulated number of QTLs than the number found by other models.
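To illustrate why the choice of multiple comparison adjustment matters, the following is a minimal sketch of two of the adjustments named above: Bonferroni and the false discovery rate in its Benjamini-Hochberg step-up form. It is not the authors' implementation; the function names, the 0.05 significance level, and the example P-values are assumptions made for illustration, and the positive false discovery rate is not covered.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices of tests significant after Bonferroni correction.

    Each P-value is compared against alpha divided by the number of tests.
    """
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of tests significant under the Benjamini-Hochberg FDR procedure.

    Sort the P-values, find the largest rank k such that
    p_(k) <= alpha * k / m, and declare the k smallest P-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


# Hypothetical marker P-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bonferroni(example_p))          # markers passing Bonferroni
print(benjamini_hochberg(example_p))  # markers passing BH FDR
```

On these made-up values, Bonferroni retains one marker and the Benjamini-Hochberg procedure retains two, which shows in miniature how a less conservative adjustment can recover additional markers.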

Highlights

Connecting genotype to phenotype, known as genetic mapping, is important for modern crop breeding and improvement (Mackay, 2001)
Analysis of variance (ANOVA), general linear model (GLM), and settlement of MLM under progressively exclusive relationship (SUPER) models showed inflated P-values, indicating a large number of false positives, whereas mixed linear models (MLM), compressed MLM (CMLM), enriched CMLM (ECMLM), and multiple loci MLM (MLMM) controlled false positives but not false negatives
Based on the Q-Q plots and the number of known simulated QTLs, the fixed and random model circulating probability unification (FarmCPU) was an appropriate model for controlling false positives and false negatives compared to other models
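As a rough illustration of how a Q-Q plot exposes P-value inflation, the sketch below computes the points of such a plot: observed -log10(P) values paired with the values expected if every marker were null (P-values uniform on (0, 1)). The function name and the (i - 0.5)/m plotting position are assumptions for illustration, not details taken from the study, and all P-values are assumed to be strictly greater than zero.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs for a Q-Q plot.

    Under the null hypothesis the i-th smallest of m P-values is expected
    to be about (i - 0.5) / m. Points above the diagonal indicate P-values
    smaller than expected, that is, inflation.
    """
    m = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i - 0.5) / m) for i in range(1, m + 1))
    return list(zip(expected, observed))


# Hypothetical P-values that follow the null expectation exactly,
# so every point lies on the diagonal.
null_like = [(i - 0.5) / 10 for i in range(1, 11)]
for expected, observed in qq_points(null_like):
    print(round(expected, 3), round(observed, 3))
```

A model whose points rise above the diagonal along most of the plot, rather than only in the upper tail, would show the kind of inflation described for ANOVA, GLM, and SUPER.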

Summary

Introduction

Connecting genotype to phenotype, known as genetic mapping, is important for modern crop breeding and improvement (Mackay, 2001). AM is an alternative to traditional linkage mapping (LM) of biparental populations and is now widely used in plant, animal (Goddard and Hayes, 2009), model species (Brachi et al., 2010), and human genetics (Risch and Merikangas, 1996; Nordborg and Tavaré, 2002). Most important traits in plants are complex: they are controlled by many genes and influenced by the environment. With advances in high-throughput genotyping and sequencing technologies, single nucleotide polymorphisms (SNPs) provide relatively low-cost, dense marker coverage across various genomes (Syvänen, 2005). Genotyping diverse lines provides thousands of SNPs across the genome, which enables fine mapping of complex trait variation down to nucleotide sequences by exploiting historical recombination events (Zhu et al., 2008). AM has lower overall statistical power than traditional LM to detect rare alleles and epistatic interactions, but it has several advantages, including increased mapping resolution, broader allele coverage, reduced time and cost compared with developing biparental mapping populations, and a potentially greater number of alleles evaluated (Yu et al., 2006).

Objectives

Methods

Results

Discussion

Conclusion