Abstract

Short insertions and deletions (INDELs) and larger structural variants have been increasingly employed in genetic association studies, but few improvements over SNP-based association have been reported. To understand why, we analysed two publicly available datasets and observed that 63% of INDELs called in A. thaliana populations and 64% in D. melanogaster populations are misrepresented as multiple alleles with different functional annotations, i.e. the same underlying variant is represented by inconsistent alignments, leading to different variant calls. To address this issue, we developed the software Irisas to reclassify and re-annotate these variants, which we then used for single-locus tests of association. We also integrated them to predict the functional impact of SNPs, INDELs, and structural variants for burden testing. Using both approaches, we re-analysed the genetic architecture of complex traits in A. thaliana and D. melanogaster. In heritability analysis, SNPs alone explained on average 27% and 19% of phenotypic variance for A. thaliana and D. melanogaster, respectively. Our method explained an additional 11% and 3%, respectively. We also identified novel trait loci that previous SNP-based association studies failed to map and that contain established candidate genes. Our study shows the value of testing INDELs for association and of integrating multiple types of variants in association studies in plants and animals.
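
As an illustration of how one underlying INDEL can receive several representations, the sketch below applies standard left-alignment normalization to two different calls of the same deletion in a short repeat. This is not the Irisas algorithm, which the text above does not describe in detail; the function name `left_normalize`, the toy reference sequence, and the 0-based coordinates are assumptions made for this example only.

```python
def left_normalize(ref_seq, pos, ref, alt):
    """Left-align and trim one variant (hypothetical helper, not from Irisas).

    ref_seq : reference sequence as a string
    pos     : 0-based position of the first base of `ref` in `ref_seq`
    ref/alt : reference and alternate alleles, each with an anchor base
    """
    if ref_seq[pos:pos + len(ref)] != ref:
        raise ValueError("ref allele does not match the reference sequence")
    changed = True
    while changed:
        changed = False
        # Drop a shared rightmost base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if not ref or not alt:
            if pos == 0:
                raise ValueError("cannot extend past the start of the reference")
            pos -= 1
            ref = ref_seq[pos] + ref
            alt = ref_seq[pos] + alt
            changed = True
    # Trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with a CA repeat (made-up sequence for illustration).
genome = "GGCACACAT"

# Two calls that both delete one "CA" unit but are anchored differently.
call_a = left_normalize(genome, 1, "GCA", "G")
call_b = left_normalize(genome, 5, "ACA", "A")
print(call_a, call_b)
```

Both calls describe the same haplotype (GGCACAT), so after normalization they collapse to a single representation. Without such a step, the two calls would be treated as distinct alleles and could receive different functional annotations.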

Highlights

  • Identifying the causal loci underlying phenotypic variance is a fundamental biological challenge

  • We showed by simulation that multiple independent loss-of-function common-allele single-nucleotide polymorphisms (SNPs) and insertions or deletions (INDELs) were challenging for direct association tests, but demonstrated that they could be recovered by integrated burden testing

  • By comparing with a long read (Pacific Biosciences) based de novo assembly of the A. thaliana accession Ler-0 [16], we estimated that 3.1% of variants were incorrectly called and a further 2.3% of variants were mistakenly called as reference for accessible regions [15, 17] (S1 Text)
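
The burden-testing idea in the highlights above can be sketched as follows: several independent loss-of-function variants in one gene are collapsed into a single carrier indicator per individual, which is then tested against the phenotype. This is a minimal collapsing sketch, not the integrated burden test used in the study; the helper names and the genotype and phenotype values are invented for illustration.

```python
import math


def collapse_burden(genotypes):
    """Return 1 for individuals carrying any loss-of-function allele, else 0."""
    return [1 if any(row) else 0 for row in genotypes]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: 8 individuals, 3 independent loss-of-function variants
# in one hypothetical gene (rows = individuals, columns = variants).
genotypes = [
    [1, 0, 0], [1, 0, 0],
    [0, 1, 0], [0, 1, 0],
    [0, 0, 1], [0, 0, 1],
    [0, 0, 0], [0, 0, 0],
]
# Made-up phenotype: carriers of any variant have a lower trait value.
phenotype = [1.0, 1.2, 0.9, 1.1, 1.0, 1.0, 3.0, 3.2]

burden = collapse_burden(genotypes)
r_burden = pearson(burden, phenotype)
r_single = [pearson([row[j] for row in genotypes], phenotype) for j in range(3)]
print("burden r:", round(r_burden, 3))
print("single-variant r:", [round(r, 3) for r in r_single])
```

In this toy setting each variant alone is only weakly correlated with the trait, because carriers of the other two variants sit in the non-carrier group, whereas the collapsed burden score separates functional from non-functional haplotypes cleanly.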

Summary

Introduction

Identifying the causal loci underlying phenotypic variance is a fundamental biological challenge. In A. thaliana, a 16bp insertion and a 345bp complex deletion (where 376bp have been replaced with 31bp) in the FRIGIDA (FRI) gene have both been linked to flowering time [8], an important adaptive trait in plants. However, when these two variants were genotyped using dideoxy sequencing and added to a panel of array-genotyped SNPs, neither of them reached genome-wide significance in a traditional GWAS [9].

