Abstract

This paper uses likelihood-based tests to detect rare variants associated with a continuous phenotype within the kernel machine learning framework. Both the likelihood ratio test (LRT) and the restricted likelihood ratio test (ReLRT) are investigated, and the relationship between kernel machine learning and the mixed effects model is discussed. Using the eigenvalue representations of the LRT and ReLRT, their exact finite-sample null distributions are obtained by simulation. Numerical studies evaluate the performance of the proposed approaches under both the standard mixed effects model and kernel machine learning. The results show that the LRT and ReLRT control the type I error at the nominal α level. The LRT and ReLRT consistently outperform SKAT regardless of the sample size and the proportion of causal rare variants with negative effects, and they lose less power than SKAT when rare variants with both positive and negative effects are present. The LRT and ReLRT performed under kernel machine learning have slightly higher power than those performed under the standard mixed effects model. We use the Genetic Analysis Workshop 17 exome sequencing SNP data as an illustrative example, and several interesting results emerge from the analysis. Finally, we discuss the findings.
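The eigenvalue-based simulation of the null distributions can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the kernel machine model y = Xβ + h + ε with h ~ N(0, τK) and ε ~ N(0, σ²I), treated as a linear mixed model with one variance component (Z = I, random-effect covariance K), and it uses the spectral representation of the LRT and ReLRT for H0: τ = 0 in the form given by Crainiceanu and Ruppert (2004). The function names `spectral_null_samples` and `observed_stat`, the log-spaced grid for λ = τ/σ², and the unweighted linear kernel K = GG' in the usage part are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical default grid for lambda = tau / sigma^2 (an assumption, not from the paper).
DEFAULT_GRID = np.concatenate(([0.0], np.logspace(-4, 4, 200)))


def spectral_null_samples(X, K, n_sim=10000, restricted=False, lam_grid=DEFAULT_GRID, seed=0):
    """Draw n_sim values from the null distribution of the LRT (or ReLRT if
    restricted=True) for H0: tau = 0, using the spectral representation."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # P0 projects onto the orthogonal complement of the column space of X.
    P0 = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    # mu_s: the n - p eigenvalues of P0 K P0 on that complement (p zeros dropped).
    mu = np.sort(np.clip(np.linalg.eigvalsh(P0 @ K @ P0), 0.0, None))[::-1][: n - p]
    # xi_s: eigenvalues of Z' K Z with Z = I, i.e. of K itself.
    xi = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    # w_s^2 are i.i.d. chi-square(1) draws; one row per simulated statistic.
    w2 = rng.standard_normal((n_sim, n - p)) ** 2
    best = np.zeros(n_sim)  # lambda = 0 always contributes the value 0
    for lam in lam_grid:
        shrink = 1.0 / (1.0 + lam * mu)
        N = w2 @ (lam * mu * shrink)  # N(lambda)
        D = w2 @ shrink               # D(lambda)
        if restricted:
            val = (n - p) * np.log1p(N / D) - np.sum(np.log1p(lam * mu))
        else:
            val = n * np.log1p(N / D) - np.sum(np.log1p(lam * xi))
        best = np.maximum(best, val)  # supremum over the grid
    return best


def observed_stat(y, X, K, restricted=False, lam_grid=DEFAULT_GRID):
    """Observed LRT (or ReLRT): sup over the grid of twice the profile
    (restricted) log-likelihood at lambda minus its value at lambda = 0."""
    n, p = X.shape

    def neg2_profile_loglik(lam):
        V = np.eye(n) + lam * K
        Vinv = np.linalg.inv(V)
        XtVinv = X.T @ Vinv
        A = XtVinv @ X  # X' V^{-1} X
        P = Vinv - XtVinv.T @ np.linalg.solve(A, XtVinv)
        q = y @ P @ y  # GLS residual quadratic form
        logdet_V = np.linalg.slogdet(V)[1]
        if restricted:
            # REML: sigma^2 profiled out with n - p degrees of freedom (constants dropped).
            return logdet_V + np.linalg.slogdet(A)[1] + (n - p) * np.log(q)
        # ML: sigma^2 profiled out with n degrees of freedom (constants dropped).
        return logdet_V + n * np.log(q)

    base = neg2_profile_loglik(0.0)
    return max(base - neg2_profile_loglik(lam) for lam in lam_grid)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, m = 200, 20
    # Simulated rare-variant genotypes (0/1/2 counts, MAF 0.5%) and a null phenotype.
    G = rng.binomial(2, 0.005, size=(n, m)).astype(float)
    K = G @ G.T          # unweighted linear kernel (assumption)
    X = np.ones((n, 1))  # intercept-only covariates
    y = rng.standard_normal(n)
    for restricted, name in [(False, "LRT"), (True, "ReLRT")]:
        null = spectral_null_samples(X, K, n_sim=5000, restricted=restricted)
        obs = observed_stat(y, X, K, restricted=restricted)
        print(f"{name}: statistic = {obs:.3f}, simulated p-value = {np.mean(null >= obs):.3f}")
```

The null draws depend only on X and K, not on y, so one set of simulations serves every phenotype tested against the same region. Under H0 both statistics put positive probability at exactly zero, and the supremum is only as accurate as the λ grid; a finer grid or a one-dimensional optimizer would refine it.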

Highlights

  • For next-generation sequencing data, identifying rare variants associated with phenotypes of interest is both practically and theoretically important [1,2,3]

  • The likelihood ratio test (LRT) and restricted likelihood ratio test (ReLRT) performed under kernel machine learning have slightly higher power than those performed under the standard mixed effects model

  • In this paper, we propose the LRT and ReLRT to detect rare variants associated with complex phenotypes under both the standard mixed effects model framework and the kernel machine learning framework



Introduction

For next-generation sequencing data, identifying rare variants associated with phenotypes of interest is both practically and theoretically important [1,2,3]. A rare variant is typically defined as an allele with a minor allele frequency (MAF) below 1%. Over the past few years, increasing evidence has shown that rare variants play an important role in many complex diseases and disorders [4,5,6,7,8,9,10,11,12,13,14,15,16]. Other findings also support the contribution of rare variants to disease. In terms of the odds ratio (OR) distribution, it has been shown that most rare variants have ORs above 2, with a mean OR of 3.74, whereas very few common variants (defined as MAF ≥ 1%) have ORs above 2, and their mean OR is 1.36 [17]. See Box 1 in Cirulli and Goldstein [2].
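The MAF threshold that separates rare from common variants can be illustrated with a short sketch. It assumes an n × m genotype matrix coded as counts (0, 1 or 2) of one allele per individual, with no missing genotypes; the helper names `minor_allele_frequency` and `rare_variant_mask` are hypothetical.

```python
import numpy as np


def minor_allele_frequency(G):
    """MAF per variant from an n x m matrix of allele counts (0, 1, 2)."""
    G = np.asarray(G, dtype=float)
    af = G.sum(axis=0) / (2 * G.shape[0])  # frequency of the counted allele
    return np.minimum(af, 1.0 - af)        # fold so that MAF <= 0.5


def rare_variant_mask(G, threshold=0.01):
    """True for variants whose MAF is below the threshold (1% by default)."""
    return minor_allele_frequency(G) < threshold


if __name__ == "__main__":
    G = np.zeros((100, 3))
    G[0, 0] = 1      # 1 copy in 200 alleles   -> MAF 0.005 (rare)
    G[:10, 1] = 1    # 10 copies in 200 alleles -> MAF 0.05 (common)
    G[:, 2] = 2
    G[0, 2] = 1      # counted allele is the major one -> folded MAF 0.005 (rare)
    print(minor_allele_frequency(G), rare_variant_mask(G))
```

Folding with `min(af, 1 - af)` makes the result independent of which allele the genotype coding happens to count.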
