Abstract

In statistical data analysis, penalized regression is an attractive approach because it performs variable selection and parameter estimation simultaneously. Although penalized regression methods have shown many advantages over other approaches in variable selection and outcome prediction for high-dimensional data, there is relatively little literature on their application to hypothesis testing, e.g., in genetic association analysis. In this study, we apply several new penalized regression methods with a novel penalty, called the Truncated L1-penalty (TLP) (Shen et al., 2012), for either variable selection alone or both variable selection and parameter grouping, in a data-adaptive way, to test for association between a quantitative trait and a group of rare variants. The performance of the new methods is compared with that of some existing tests, including some recently proposed global tests and penalized regression-based methods, via simulations and an application to the real sequence data of the Genetic Analysis Workshop 17 (GAW17). Although our proposed penalized methods can improve on some existing penalized methods, they often do not outperform some existing global association tests. We discuss some possible problems with using penalized regression methods in genetic hypothesis testing. Given the capability of penalized regression to select causal variants and its sometimes promising performance, further studies are warranted.
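The abstract names the TLP without stating its form. As a minimal sketch, assuming the commonly cited definition from Shen et al. (2012), J(b) = min(|b| / tau, 1), the penalty can be written as below. The function names `tlp` and `tlp_total` and the tuning parameter name `tau` are illustrative choices, not taken from the paper.

```python
def tlp(beta, tau):
    """Truncated L1-penalty for one coefficient (assumed form: min(|beta| / tau, 1)).

    Below the threshold tau the penalty grows like a scaled L1 penalty;
    above it the penalty is capped at 1, so large coefficients are not
    penalized further.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    return min(abs(beta) / tau, 1.0)


def tlp_total(betas, tau):
    """Sum of the TLP over a list of regression coefficients."""
    return sum(tlp(b, tau) for b in betas)


if __name__ == "__main__":
    # A zero coefficient costs nothing, a small one costs proportionally,
    # and a large one hits the cap.
    for b in (0.0, 0.25, -3.0):
        print(b, tlp(b, tau=0.5))
```

The cap is what distinguishes the TLP from the ordinary L1 (Lasso) penalty: once a coefficient exceeds tau in absolute value, increasing it further adds no penalty.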

Highlights

  • Genome-wide association studies (GWAS) have uncovered many common variants (CVs) associated with complex diseases, but the proportion of variance explained by the identified CVs is often low (Maher, 2008)

  • In contrast to the usual application of penalized regression methods to variable selection or risk prediction for high-dimensional data (Kooperberg et al., 2010; Austin et al., 2013), here we focus on their application to hypothesis testing on a quantitative trait in a relatively low-dimensional setting

  • The vector is dichotomized to yield a haplotype with the minor allele frequency (MAF) of each variant randomly chosen between 0.005 and 0.01
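The dichotomization step in the last highlight can be sketched as follows. This is only an illustration under stated assumptions: the latent vector is taken to be standard normal in each coordinate, and each variant's cutoff is the normal quantile of its MAF, so that the minor allele appears with probability equal to the MAF. How the latent vector is generated (for example, its correlation structure) is not given in this excerpt, and the helper names `draw_mafs` and `dichotomize` are hypothetical.

```python
import random
from statistics import NormalDist


def draw_mafs(n_variants, rng, low=0.005, high=0.01):
    """Draw one MAF per variant uniformly between low and high."""
    return [rng.uniform(low, high) for _ in range(n_variants)]


def dichotomize(latent, mafs):
    """Turn a latent vector into a 0/1 haplotype.

    Assumes each latent value is marginally standard normal, so that
    P(latent < inv_cdf(maf)) = maf and the minor allele (coded 1)
    appears with probability maf.
    """
    std_normal = NormalDist()
    return [1 if z < std_normal.inv_cdf(maf) else 0
            for z, maf in zip(latent, mafs)]


if __name__ == "__main__":
    rng = random.Random(0)
    mafs = draw_mafs(8, rng)
    # Independent latent values are an assumption made for this sketch.
    latent = [rng.gauss(0.0, 1.0) for _ in mafs]
    print(dichotomize(latent, mafs))
```

With MAFs this small, most simulated haplotypes carry no minor allele at a given variant, which is the defining difficulty of rare-variant analysis.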


Summary

Introduction

Genome-wide association studies (GWAS) have uncovered many common variants (CVs) associated with complex diseases, but the proportion of variance explained by the identified CVs is often low (Maher, 2008). We propose applying some new penalized regression methods to test for association between a quantitative trait and multiple rare variants (RVs). In contrast to the usual application of penalized regression methods to variable selection or risk prediction for high-dimensional data (Kooperberg et al., 2010; Austin et al., 2013), here we focus on their application to hypothesis testing on a quantitative trait in a relatively low-dimensional setting. To avoid a large number of degrees of freedom (DF) and to aggregate information across multiple RVs, one common strategy is to pool or collapse multiple RVs in a region or gene (Li and Leal, 2008; Madsen and Browning, 2009). One such attempt is the Sum test (Pan, 2009), which was developed to utilize the joint effects of multiple variants while reducing the DF. The Sum test and many burden tests perform poorly if …
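To make the pooling idea concrete, here is a minimal sketch of a Sum-type test for a quantitative trait. It assumes additive genotype coding (0, 1 or 2 minor alleles per variant) and ordinary least squares with no covariates: the genotype counts in a region are summed into one score per subject, the trait is regressed on that score, and the slope is tested with a single DF. The function name `sum_test_statistic` is hypothetical, and this is not presented as the exact procedure of Pan (2009).

```python
import math


def sum_test_statistic(genotypes, trait):
    """Regress the trait on the per-subject sum of genotype counts.

    genotypes: list of rows, one per subject, each a list of 0/1/2 counts.
    trait: list of quantitative trait values, one per subject.
    Returns (slope, t_statistic); the t statistic has n - 2 DF under
    the usual linear-model assumptions.
    """
    n = len(trait)
    if n != len(genotypes) or n < 3:
        raise ValueError("need matching inputs with at least 3 subjects")
    # Collapse all variants into one burden score per subject.
    scores = [sum(row) for row in genotypes]
    mean_s = sum(scores) / n
    mean_y = sum(trait) / n
    sxx = sum((s - mean_s) ** 2 for s in scores)
    if sxx == 0:
        raise ValueError("burden score has no variation")
    sxy = sum((s - mean_s) * (y - mean_y) for s, y in zip(scores, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_s
    rss = sum((y - intercept - slope * s) ** 2 for s, y in zip(scores, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, t_stat


if __name__ == "__main__":
    genotypes = [[0, 0], [1, 0], [0, 1], [1, 1], [0, 0], [2, 0]]
    trait = [0.1, 1.0, 0.9, 2.2, -0.1, 1.9]
    print(sum_test_statistic(genotypes, trait))
```

Because every variant shares one regression coefficient, the test spends a single DF regardless of how many variants are pooled.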

